Principal Component Regression in Scikit-Learn
Learn how to apply Principal Component Regression (PCR) using scikit-learn and Python, with a detailed example to illustrate the concept.
Updated June 21, 2023
How to Apply Principal Component Regression in Scikit-Learn
Definition of the Concept
Principal Component Regression (PCR) is a regression technique that combines Principal Component Analysis (PCA) with Linear Regression. Instead of regressing the target on the original variables, PCR first reduces them to a small set of principal components and uses those components as the predictors.
Step-by-Step Explanation
Here’s how to apply PCR using scikit-learn:
Step 1: Import necessary libraries
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
- We first import the necessary libraries: NumPy for numerical operations and scikit-learn’s PCA, LinearRegression, and train_test_split.
Step 2: Prepare your data
# Create some sample data (you would replace this with your own dataset)
np.random.seed(0)
X = np.random.rand(100, 10)  # Features: 100 samples, 10 variables
y = X @ np.random.rand(10) + 0.1 * np.random.randn(100)  # Continuous target for regression
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
- Next, we create some sample data and split it into a training set and a test set using train_test_split.
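PCA is sensitive to the scale of the features, so with real datasets it is common to standardize the data before reducing it. Our synthetic features are already on a comparable scale, so this is optional here, but a minimal sketch using scikit-learn’s StandardScaler (fit on the training set only) might look like this:
from sklearn.preprocessing import StandardScaler
# Standardize features using statistics computed on the training set only;
# the scaled arrays would then replace X_train and X_test in the steps below
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)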
Step 3: Apply PCA to reduce dimensionality
# Define the number of principal components to keep
n_components = 5
# Create a PCA instance with the specified n_components
pca = PCA(n_components=n_components)
# Fit the PCA model to the training data and transform both the training and testing data
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)
- Now, we apply PCA to reduce the dimensionality of our data by retaining only 5 principal components. We fit the PCA model to the training data with fit_transform, which also returns the transformed training data, and then apply the same fitted transformation to the test data with transform.
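The choice of 5 components is arbitrary for this synthetic data. In practice, a common way to decide how many components to keep is to inspect how much variance each one explains through the fitted model’s explained_variance_ratio_ attribute, for example:
# Proportion of variance explained by each retained component
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Total variance retained:", pca.explained_variance_ratio_.sum())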
Step 4: Apply Linear Regression
# Create a LinearRegression instance and fit it to the transformed training data
lr = LinearRegression()
lr.fit(X_train_pca, y_train)
- Finally, we create a Linear Regression model and fit it to the transformed training data using fit.
Step 5: Evaluate the model
# Make predictions on the transformed testing data
y_pred = lr.predict(X_test_pca)
# Print the coefficients of the linear regression model
print("Coefficients:", lr.coef_)
# Evaluate the model's performance (e.g., using mean squared error)
mse = np.mean((y_pred - y_test)**2)
print(f"Mean Squared Error: {mse:.2f}")
- We make predictions on the transformed testing data, print the coefficients of the linear regression model, and evaluate its performance using mean squared error.
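Keep in mind that lr.coef_ holds one coefficient per principal component, not per original feature. If you want to interpret the model in terms of the original 10 variables, one option is to project the coefficients back through the PCA loadings, as sketched here:
# pca.components_ has shape (n_components, n_features) and lr.coef_ has shape (n_components,),
# so this product yields one coefficient per original (centered) feature
beta_original = pca.components_.T @ lr.coef_
print("Coefficients in the original feature space:", beta_original)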
This step-by-step guide demonstrates how to apply Principal Component Regression (PCR) in scikit-learn.
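As a closing note, the same PCA-then-regression workflow can also be written more compactly with scikit-learn’s Pipeline, which bundles the dimensionality reduction and the regression into a single estimator so that the PCA transform is always applied consistently at fit and predict time. A minimal sketch, reusing the data from Step 2:
from sklearn.pipeline import make_pipeline
# PCA followed by LinearRegression, chained as one estimator
pcr = make_pipeline(PCA(n_components=5), LinearRegression())
pcr.fit(X_train, y_train)
print("Test R^2:", pcr.score(X_test, y_test))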