Training Linear Regression with scikit-learn
Learners will use sklearn.linear_model.LinearRegression to fit a model on training data, make predictions, and inspect the learned coefficients.
scikit-learn's LinearRegression API
scikit-learn's LinearRegression class provides a clean, consistent API for fitting ordinary least-squares linear regression. Like other scikit-learn estimators that make predictions, it follows the same pattern: instantiate, fit, predict.
Under the hood, it solves the least-squares problem directly. For dense input it relies on an SVD-based solver (scipy.linalg.lstsq), which is numerically more stable than forming the Normal Equation explicitly. The optimal weights come out of a single computation, so no iterative training loop is needed. This is fast for small and medium-sized problems, although the cost grows quickly as the number of features increases.
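The instantiate, fit, predict pattern and the one-shot solve can be sketched on a toy problem. The data below is made up for illustration (points lying exactly on y = 2x + 1), and the comparison with np.linalg.lstsq is only a sanity check under that assumption, not a description of scikit-learn's internal code path.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data lying exactly on the line y = 2*x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])  # shape (4, 1)
y = np.array([1.0, 3.0, 5.0, 7.0])          # shape (4,)

model = LinearRegression()                   # 1. instantiate
model.fit(X, y)                              # 2. fit (a single direct solve)
pred = model.predict(np.array([[4.0]]))      # 3. predict for x = 4

# Solve the same least-squares problem directly: append a column of ones
# so the last weight plays the role of the intercept.
A = np.column_stack([X, np.ones(len(X))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

print('sklearn slope/intercept:', model.coef_[0], model.intercept_)
print('lstsq   slope/intercept:', w[0], w[1])
print('prediction at x=4:', pred[0])
```

Both routes should agree up to floating-point error, since they minimise the same squared-error objective.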
from sklearn.linear_model import LinearRegression
import numpy as np
# Instantiate with optional parameters
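model = LinearRegression(
    fit_intercept=True,  # default: learn a bias (intercept) term
    copy_X=True,         # default: work on a copy so X is not overwritten
    n_jobs=-1,           # all CPU cores; only speeds up certain large multi-target fits
)
print(model)  # recent scikit-learn versions print only the non-default parameters
Preparing the Data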
Before training, your data must be in the correct shape. scikit-learn expects:
- X: a 2D array of shape (n_samples, n_features)
- y: a 1D array of shape (n_samples,)
The most common shape error is passing a 1D array for X when you have only one feature; the fix is to reshape it with .reshape(-1, 1). Always split the data into training and test sets before fitting: scores computed on the training data are misleadingly optimistic, because the model has already seen those samples.
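The shape error described above can be reproduced with a few made-up values. In this sketch the arrays sqft_1d and price are hypothetical, and the flag raised exists only to record whether fit rejected the 1D input.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical single-feature data: price = 150 * sqft + 50000
sqft_1d = np.array([800.0, 1200.0, 1500.0, 2000.0])           # shape (4,)
price = np.array([170000.0, 230000.0, 275000.0, 350000.0])    # shape (4,)

# Passing the 1D array as X is rejected by scikit-learn's input validation
try:
    LinearRegression().fit(sqft_1d, price)
    raised = False
except ValueError as err:
    raised = True
    print('fit failed:', str(err).splitlines()[0])

# Reshape to (n_samples, 1) and the same call succeeds
X_2d = sqft_1d.reshape(-1, 1)
model = LinearRegression().fit(X_2d, price)
print('X shape after reshape:', X_2d.shape)
```

The -1 in reshape(-1, 1) tells NumPy to infer the number of rows, so the same line works for any number of samples.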
import numpy as np
from sklearn.model_selection import train_test_split
# Simulate housing data
np.random.seed(42)
sqft = np.random.uniform(500, 3000, 200)
price = 150 * sqft + 50000 + np.random.normal(0, 25000, 200)
# Reshape X to 2D (n_samples, n_features)
X = sqft.reshape(-1, 1) # shape (200, 1)
y = price # shape (200,)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print('Train size:', X_train.shape, '| Test size:', X_test.shape)
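With the data split, the remaining steps from the lesson goal (fit on the training set, predict, inspect the learned coefficients) might look like the sketch below. It regenerates the same simulated housing data so that it runs on its own; the variable names y_pred and r2 are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Same simulated housing data as in the preparation step
np.random.seed(42)
sqft = np.random.uniform(500, 3000, 200)
price = 150 * sqft + 50000 + np.random.normal(0, 25000, 200)
X = sqft.reshape(-1, 1)
y = price
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training set only
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on held-out data
y_pred = model.predict(X_test)

# Inspect what the model learned
print('Slope (price per sqft):', model.coef_[0])
print('Intercept:', model.intercept_)

# R^2 on the test set, which the model has not seen during fitting
r2 = model.score(X_test, y_test)
print('Test R^2:', r2)
```

Because the data was simulated with a slope of 150 and an intercept of 50000, the learned values should land near those numbers, with some deviation caused by the added noise.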