Pandas & NumPy Academy · Lesson

Dropping Missing Values

Remove rows or columns containing NaN with dropna(), controlling the threshold and subset of columns considered.

When Should You Drop Missing Values?

Dropping rows with missing values is the simplest way to handle missing data, but it only gives unbiased results when the data is Missing Completely At Random (MCAR): the probability that a value is missing is unrelated to the missing value itself and to every other variable. If the missingness is systematic (e.g., low-income respondents skip the salary field), dropping those rows removes mostly low earners and biases statistics such as the mean salary upward. Always investigate the missingness pattern before deciding to drop.

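import pandas as pd
import numpy as np

# Example: randomly missing salary data (MCAR-like)
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol', 'Dave'],
    'salary': [50000, np.nan, 70000, np.nan]
})
print('Before drop:', df.shape)  # (4, 2)
print(df)

# Under MCAR, the rows that remain are still a random sample of the whole
dropped = df.dropna()
print('After drop:', dropped.shape)  # (2, 2)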
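One quick, informal way to investigate the missingness pattern is to count the missing values per column and then check whether rows with a missing salary differ on a column that is fully observed. The sketch below assumes a hypothetical 'age' column with made-up values, purely for illustration; it is not a formal statistical test. A clear difference between the two groups suggests the data is not MCAR, while no difference does not prove MCAR, because missingness that depends on the unobserved salary itself cannot be detected from the observed data alone.

```python
import pandas as pd
import numpy as np

# Hypothetical survey data: 'age' is fully observed, 'salary' has gaps
df = pd.DataFrame({
    'age': [25, 31, 47, 52, 38, 29],
    'salary': [40000, np.nan, 80000, 90000, np.nan, 42000],
})

# Number of missing values in each column
print(df.isna().sum())

# Fraction of missing values in each column
print(df.isna().mean())

# Does missingness in 'salary' relate to an observed variable?
# Group rows by whether salary is missing and compare their mean age.
salary_missing = df['salary'].isna()
age_by_missing = df.groupby(salary_missing)['age'].mean()
print(age_by_missing)
```

For this toy data, rows with a missing salary have a mean age of 34.5 versus 38.25 for the rest. On real data you would look for gaps that are large relative to the spread of the variable before concluding anything.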

dropna() — Basic Usage

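By default, DataFrame.dropna() removes every row that contains at least one NaN value. It returns a new DataFrame and leaves the original unchanged, unless you pass inplace=True, in which case it modifies the DataFrame directly and returns None. The default is aggressive: a single NaN anywhere in a row discards the entire row, so it works best as a quick first cleaning pass when only a small fraction of rows have gaps. In the example below, three of the four rows are dropped.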

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'a': [1, np.nan, 3, np.nan],
    'b': [10, 20, np.nan, 40],
    'c': [100, 200, 300, 400]
})

cleaned = df.dropna()
print(cleaned)
#      a     b    c
# 0  1.0  10.0  100

print('Original shape:', df.shape)    # (4, 3)
print('Cleaned shape:', cleaned.shape) # (1, 3)
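The default can be relaxed with the other dropna() parameters named in the lesson summary. The sketch below reuses the same toy DataFrame to compare subset, thresh, how and axis, and to show what inplace=True returns; the printed row labels and shapes apply to this small frame only.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'a': [1, np.nan, 3, np.nan],
    'b': [10, 20, np.nan, 40],
    'c': [100, 200, 300, 400]
})

# Only look for NaN in column 'a' when deciding which rows to drop
by_a = df.dropna(subset=['a'])
print(by_a.index.tolist())         # [0, 2]

# Keep rows with at least 3 non-missing values (here: rows with no NaN)
at_least_3 = df.dropna(thresh=3)
print(at_least_3.index.tolist())   # [0]

# Keep rows with at least 2 non-missing values
at_least_2 = df.dropna(thresh=2)
print(at_least_2.index.tolist())   # [0, 1, 2, 3]

# Drop a row only if every value in it is NaN
all_missing = df.dropna(how='all')
print(all_missing.shape)           # (4, 3)

# Drop columns (instead of rows) that contain any NaN
cols = df.dropna(axis=1)
print(cols.columns.tolist())       # ['c']

# inplace=True modifies df itself and returns None
result = df.dropna(inplace=True)
print(result, df.shape)            # None (1, 3)
```

Note that thresh counts the non-missing values a row must have, not the missing ones: thresh=2 keeps every row with two or more real values.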
