AI Prompt Engineering · Lesson

Harmlessness vs Helpfulness Tension

Navigating over-refusal: prompts that balance safety with utility.

The Core Tension

Every AI safety system faces a fundamental tension: making a model safer by refusing more often also makes it less helpful to legitimate users.

A model that refuses to discuss anything medical will never give dangerous health advice, but it also won't help nurses, doctors, or patients who need genuinely useful information. The goal is calibration: refuse when refusal is warranted, and help when it is not.
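One way to make calibration concrete is to track two error rates side by side: how often benign requests are refused, and how often harmful requests are answered. The sketch below is illustrative only. The `Outcome` record, the `calibration_rates` function, and the labels are hypothetical names introduced here, and it assumes you already have a set of requests labelled harmful or benign along with whether the model refused each one.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    is_harmful: bool  # hypothetical ground-truth label for the request
    refused: bool     # whether the model declined to answer


def calibration_rates(outcomes):
    """Return (over_refusal_rate, under_refusal_rate) for labelled outcomes."""
    benign = [o for o in outcomes if not o.is_harmful]
    harmful = [o for o in outcomes if o.is_harmful]
    # Over-refusal: share of benign requests that were refused.
    over = sum(o.refused for o in benign) / len(benign) if benign else 0.0
    # Under-refusal: share of harmful requests that were answered.
    under = sum(not o.refused for o in harmful) / len(harmful) if harmful else 0.0
    return over, under


# A model that refuses everything: never unsafe, never helpful.
refuse_all = [Outcome(False, True), Outcome(False, True), Outcome(True, True)]
print(calibration_rates(refuse_all))  # (1.0, 0.0)
```

A model that refuses everything scores perfectly on the second rate and as badly as possible on the first, which is why neither number is meaningful on its own.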

Over-Refusal: The Real Problem

Over-refusal occurs when a model declines to answer benign requests because they superficially resemble harmful ones. Examples:

Refusing to explain how diseases spread (sounds dangerous, is actually public health education)
Refusing to write a villain character (fiction, not endorsement)
Refusing to explain lock-picking to a locksmith apprentice (vocational training, not burglary)

Over-refusal is not just annoying: it erodes users' trust in the model and pushes them toward less safe alternatives.
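To see how surface resemblance alone produces these refusals, consider a deliberately naive filter that looks only at keywords. This is a toy sketch, not a description of how any real safety system works. The keyword set, the `naive_should_refuse` function, and the sample prompts are all assumptions made up for illustration.

```python
# Hypothetical keyword list; real systems are not this simple.
BLOCKED_KEYWORDS = {"disease", "villain", "lock-picking"}


def naive_should_refuse(prompt: str) -> bool:
    """Refuse whenever any blocked keyword appears, ignoring context."""
    text = prompt.lower()
    return any(keyword in text for keyword in BLOCKED_KEYWORDS)


# Benign prompts modelled on the examples above.
benign_prompts = [
    "Explain how a disease spreads through a population.",
    "Write a villain character for my novel.",
    "I'm a locksmith apprentice. How does lock-picking work?",
]

refused = [p for p in benign_prompts if naive_should_refuse(p)]
print(f"{len(refused)} of {len(benign_prompts)} benign prompts refused")
```

Every prompt here is benign, yet the filter refuses all three because it never considers purpose or context. That is the failure pattern a well-balanced prompt tries to avoid.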
