Machine Learning Academy · Lesson

Training Linear Regression with scikit-learn

Learners will use sklearn.linear_model.LinearRegression to fit a model on training data, make predictions, and inspect the learned coefficients.

scikit-learn's LinearRegression API

Scikit-learn's LinearRegression class provides a clean, consistent API for fitting ordinary least-squares linear regression. Like other scikit-learn predictors, it follows the same three-step pattern: instantiate, fit, predict.

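Under the hood, it solves the least-squares problem directly rather than with an iterative training loop. For dense input it calls an SVD-based solver (scipy.linalg.lstsq), which returns the same weights as the Normal Equation when the features are not collinear but is numerically more stable. The optimal weights come from a single computation, so fitting is fast for datasets with a moderate number of features. The cost grows roughly with n_samples × n_features², so very wide feature matrices can become expensive.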

from sklearn.linear_model import LinearRegression
import numpy as np

# Instantiate with optional parameters
model = LinearRegression(
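    fit_intercept=True,   # default: learn a bias (intercept) term
    copy_X=True,          # default: copy X so the input array is not overwritten
    n_jobs=-1             # all CPU cores; only helps for large multi-target problems
)

print(model)               # recent versions print only the non-default parameters
print(model.get_params())  # dict of every parameter, including defaults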
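As a minimal sketch of the instantiate, fit, predict pattern and of the "single computation" idea, the snippet below fits LinearRegression on a small synthetic dataset and compares the learned weights with a direct least-squares solve from NumPy. The data, the true_w values, and the variable names are illustrative assumptions, not part of the lesson.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Small synthetic dataset (illustrative values, assumed for this sketch)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                  # 50 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 4.0 + rng.normal(scale=0.1, size=50)

# Instantiate, fit, predict
model = LinearRegression()
model.fit(X, y)
preds = model.predict(X[:5])                  # predictions for the first 5 rows

# Solve the same least-squares problem directly.
# A leading column of ones plays the role of the intercept.
X_aug = np.column_stack([np.ones(len(X)), X])
w_lstsq, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

# Both routes should agree up to floating-point error
print(np.allclose(w_lstsq[0], model.intercept_))
print(np.allclose(w_lstsq[1:], model.coef_))
```

The learned weights are stored on the fitted model as coef_ (one value per feature) and intercept_ (the bias term).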

Preparing the Data

Before training, your data must be in the correct shape. scikit-learn expects:

  • X — a 2D array of shape (n_samples, n_features)
  • y — a 1D array of shape (n_samples,)

The most common shape error is passing a 1D array for X when you have only one feature; the fix is to reshape it with .reshape(-1, 1). Always split the data into training and test sets before fitting: a score computed on the training data is optimistically biased, because the model has already seen those examples.
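The shape requirement can be seen in a short sketch. The toy arrays below are assumptions made for illustration; the point is only that a 1D X is rejected with a ValueError, while the reshaped 2D version fits normally.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([1.0, 2.0, 3.0, 4.0])   # one feature stored as 1D, shape (4,)
y = np.array([3.0, 5.0, 7.0, 9.0])   # toy targets following y = 2x + 1

# Passing the 1D array directly is rejected by scikit-learn
raised = False
try:
    LinearRegression().fit(x, y)
except ValueError as err:
    raised = True
    print("ValueError:", str(err).splitlines()[0])

# Reshape to (n_samples, 1): -1 lets NumPy infer the number of rows
X = x.reshape(-1, 1)
model = LinearRegression().fit(X, y)
print(X.shape)

# predict also expects a 2D array, even for a single sample
print(model.predict(np.array([[5.0]])))
```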

import numpy as np
from sklearn.model_selection import train_test_split

# Simulate housing data
np.random.seed(42)
sqft = np.random.uniform(500, 3000, 200)
price = 150 * sqft + 50000 + np.random.normal(0, 25000, 200)

# Reshape X to 2D (n_samples, n_features)
X = sqft.reshape(-1, 1)  # shape (200, 1)
y = price                  # shape (200,)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print('Train size:', X_train.shape, '| Test size:', X_test.shape)
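With the split in place, the remaining steps from the lesson objective (fit, predict, inspect the coefficients) might look like the sketch below. It repeats the data simulation so that it runs on its own. Comparing the training and test scores is an added illustration, not a step the lesson prescribes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Recreate the simulated housing data and the split from above
np.random.seed(42)
sqft = np.random.uniform(500, 3000, 200)
price = 150 * sqft + 50000 + np.random.normal(0, 25000, 200)
X = sqft.reshape(-1, 1)
y = price
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training set only
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on unseen data
y_pred = model.predict(X_test)

# Inspect the learned parameters; the data was generated with
# slope 150 and intercept 50000, so the estimates should land nearby
print('Slope (price per sqft):', model.coef_[0])
print('Intercept:', model.intercept_)

# score() returns R^2; the test score is the honest estimate
print('Train R^2:', model.score(X_train, y_train))
print('Test R^2:', model.score(X_test, y_test))
```

Because the noise term has a standard deviation of 25000, the fitted slope and intercept will not match the generating values exactly.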
