Machine Learning Academy · Lesson

Laplace Smoothing and Zero-Probability Problem

Learners will reproduce the zero-probability failure on unseen words and see how Laplace smoothing prevents the model from assigning zero probability.

The Zero-Probability Catastrophe

Naive Bayes scores each class by multiplying the class prior by the likelihood of every feature: P(class | features) ∝ P(class) × product of P(feature_i | class). Suppose a single word has zero probability given a class because it never appeared in that class's training data. Then the whole product for that class is zero, no matter how strongly the other features point to it. One word can therefore veto a class outright. If the document contains such a word for every class, every score is zero and the classifier has no basis for choosing between them. This is the zero-probability problem. It is especially severe for text, where documents at test time routinely contain words that never appeared with some class during training.

```python
# Training: 'bitcoin' never appeared in the spam class
# Test document: 'buy bitcoin now cheap'
words_in_test = ['buy', 'bitcoin', 'now', 'cheap']

# Hypothetical likelihoods P(word | spam) estimated from training counts
P_word_given_spam = {'buy': 0.3, 'bitcoin': 0.0, 'now': 0.2, 'cheap': 0.4}

# Multiply the per-word likelihoods (the "naive" independence assumption)
product = 1.0
for word in words_in_test:
    prob = P_word_given_spam.get(word, 0.0)  # a word missing from the table also counts as 0
    product *= prob
    print(f'After {word}: product = {product}')

print('Final P(features|spam) =', product)  # 0.0: one zero wipes out the whole product
```
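The claim that one zero wipes out a class regardless of the other evidence is easiest to see with two classes side by side. The sketch below is a minimal illustration: the ham likelihoods and the equal priors are hypothetical values picked for this example, not estimates from real data.

```python
# Hypothetical likelihoods: spam has strong evidence except for the unseen 'bitcoin';
# ham has small but nonzero likelihoods for every test word.
words_in_test = ['buy', 'bitcoin', 'now', 'cheap']
likelihoods = {
    'spam': {'buy': 0.3, 'bitcoin': 0.0, 'now': 0.2, 'cheap': 0.4},
    'ham': {'buy': 0.01, 'bitcoin': 0.001, 'now': 0.05, 'cheap': 0.01},
}
priors = {'spam': 0.5, 'ham': 0.5}  # hypothetical equal priors

scores = {}
for cls, table in likelihoods.items():
    score = priors[cls]
    for word in words_in_test:
        score *= table[word]  # unnormalized posterior: prior times product of likelihoods
    scores[cls] = score
    print(f'{cls}: unnormalized posterior = {score:.3g}')

prediction = max(scores, key=scores.get)
print('Prediction:', prediction)  # ham, only because the single zero eliminated spam
```

Three of the four spam likelihoods are far larger than their ham counterparts, yet ham wins because the one zero forces the spam score to exactly 0.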

Why Zero Probability Breaks the Model

When P(features | class) = 0 for every class at once, all unnormalized posteriors are 0 and the classifier cannot distinguish the classes. This happens when, for each class, the document contains at least one word that class never saw in training. Normalizing those scores would mean dividing 0 by 0. When only some classes are zeroed out, the classifier is forced toward the remaining classes even if the rest of the evidence points elsewhere. This is not just a numerical inconvenience; it is a fundamental model failure. Moving to log-space, as practical implementations usually do to avoid floating-point underflow, does not help: log(0) = -infinity, and adding finite values to -infinity still gives -infinity. The log-score is therefore dominated by this single zero, and all other evidence is ignored.

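```python
import numpy as np

words_in_test = ['buy', 'bitcoin', 'now', 'cheap']
likelihoods = [0.3, 0.0, 0.2, 0.4]  # same hypothetical P(word | spam) values as above

# In log-space, log(0) = -inf; errstate silences NumPy's divide-by-zero warning
with np.errstate(divide='ignore'):
    log_probs = np.log(likelihoods)

for word, lp in zip(words_in_test, log_probs):
    print(f'log P({word}|spam) = {lp}')

log_likelihood_spam = log_probs.sum()  # adding any finite value to -inf stays -inf
print(f'\nlog P(doc|spam) = {log_likelihood_spam}')  # -inf (adding a log prior would not change this)
print('The score is dominated by the single zero probability!')
```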
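The remedy this lesson is named for, Laplace (add-one) smoothing, adds a pseudo-count α to every word count so that no estimate can be zero: P(w | c) = (count(w, c) + α) / (total word count in c + α·|V|), where |V| is the vocabulary size and α = 1 gives classic Laplace smoothing. The sketch below assumes small, hypothetical training counts and a uniform prior (which shifts every class score equally, so it is omitted). It scores the test document in log-space with α = 0 (no smoothing) and α = 1. The helper `log_likelihood` and the extra vocabulary word 'meeting' are illustrative only.

```python
import numpy as np

# Hypothetical training counts: how often each vocabulary word appeared in each class.
vocab = ['buy', 'bitcoin', 'now', 'cheap', 'meeting']
counts = {
    'spam': np.array([30, 0, 20, 40, 1]),  # 'bitcoin' never appeared in spam
    'ham': np.array([2, 1, 15, 0, 50]),    # 'cheap' never appeared in ham
}
words_in_test = ['buy', 'bitcoin', 'now', 'cheap']
test_idx = [vocab.index(w) for w in words_in_test]


def log_likelihood(class_counts, alpha):
    """Sum of log P(word | class) over the test words, using add-alpha smoothing."""
    # P(w | c) = (count(w, c) + alpha) / (total words in c + alpha * |V|)
    probs = (class_counts + alpha) / (class_counts.sum() + alpha * len(vocab))
    with np.errstate(divide='ignore'):  # log(0) -> -inf without a RuntimeWarning
        return float(np.log(probs[test_idx]).sum())


results = {}
for alpha in (0.0, 1.0):  # 0.0 = no smoothing, 1.0 = Laplace (add-one) smoothing
    scores = {cls: log_likelihood(c, alpha) for cls, c in counts.items()}
    results[alpha] = scores
    print(f'alpha={alpha}:', {cls: round(s, 3) for cls, s in scores.items()})
    if all(np.isneginf(s) for s in scores.values()):
        print('  every class scores -inf, so the classifier cannot choose')
    else:
        print('  prediction:', max(scores, key=scores.get))
```

Without smoothing, each class is vetoed by a different unseen word, so both scores are -inf. With α = 1, both scores are finite and the remaining evidence decides the prediction. In scikit-learn, `MultinomialNB` applies the same additive smoothing through its `alpha` parameter, which defaults to 1.0.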

All lessons in this course

Bayes' Theorem in Plain Language
Bag of Words: CountVectorizer and TfidfVectorizer
Training a Multinomial Naive Bayes Classifier
Laplace Smoothing and Zero-Probability Problem
