Outlier Detection and Removal
Z-score method, IQR method, isolation forest concept, handling outliers in practice.
What is an Outlier?
An outlier is a value that lies unusually far from the rest of the data. It may be a genuine extreme, a data-entry error, or a sensor glitch. Because outliers can distort means, variances, and fitted models, detect them deliberately instead of assuming the data is clean.
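To make the distortion concrete, here is a minimal NumPy sketch using illustrative readings, not real measurements. It compares the mean, standard deviation, and median of five typical values with the same values plus one extreme point.

```python
import numpy as np

# Illustrative readings: five typical values, then the same values plus one extreme point
clean = np.array([10, 12, 11, 13, 12])
with_outlier = np.append(clean, 100)

for name, arr in [("clean", clean), ("with outlier", with_outlier)]:
    # The mean and std are pulled toward the extreme value; the median barely moves
    print(f"{name:>12}: mean={arr.mean():.2f}  std={arr.std():.2f}  median={np.median(arr):.1f}")
```

On these values the mean jumps from 11.60 to 26.33 and the standard deviation from about 1.02 to about 32.96, while the median stays at 12.0. That robustness is why the median is often reported alongside the mean when outliers are suspected.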
The Z-Score Method
A z-score measures how many standard deviations a value sits from the mean: z = (x - mean) / std. A common rule flags any point with |z| > 3 as an outlier.
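```python
import numpy as np
# Five typical readings plus one suspicious value (100)
data = np.array([10, 12, 11, 13, 12, 100])
# z-score: distance from the mean in standard deviations (np.std defaults to ddof=0)
z = (data - data.mean()) / data.std()
print(np.round(z, 2))
```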
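The |z| > 3 rule needs care on small samples. The minimal sketch below reuses the same six readings and applies the cutoff: 100 is not flagged, because its z-score is only about 2.24. This follows from Samuelson's inequality: with the population standard deviation (NumPy's default, ddof=0), no value in a sample of n points can lie more than sqrt(n - 1) standard deviations from the mean, which is about 2.24 for n = 6. The extreme value also inflates the very standard deviation it is measured against, which masks it further.

```python
import numpy as np

data = np.array([10, 12, 11, 13, 12, 100])
z = (data - data.mean()) / data.std()  # population std (ddof=0), as in the example above

# Apply the |z| > 3 rule from the text
flagged = data[np.abs(z) > 3]
print(flagged)  # prints [] : 100 is not flagged

# Samuelson's inequality: with ddof=0, every |z| is at most sqrt(n - 1)
n = data.size
print(np.abs(z).max(), np.sqrt(n - 1))  # both are about 2.24
```

Using the sample standard deviation (ddof=1) tightens the bound to (n - 1)/sqrt(n), about 2.04 here, so a cutoff of 3 cannot be reached either way. The z-score rule is only informative when the sample is reasonably large.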