AI Prompt Engineering · Lesson

Input and Output Filtering

Blocking unsafe content.

Filtering as the First Line

Filtering inspects content and decides allow, block, or transform. Input filtering protects the model and your cost budget; output filtering protects the user and your reputation. Both rely on a layered mix of fast deterministic checks and slower semantic classifiers.

Prompt-Injection Detection

The signature input threat is prompt injection: text that tries to override your instructions ('ignore previous instructions and...'). Detection combines pattern heuristics with a classifier, and is reinforced by clearly delimiting untrusted content so instructions inside it are treated as data.

SUSPECT = ['ignore previous', 'disregard above', 'system prompt',
           'you are now', 'reveal your instructions']
def injection_score(text):
    t = text.lower()
    return sum(p in t for p in SUSPECT)

All lessons in this course

What Are Guardrails
Input and Output Filtering
Schema and Rule Validators
Self-Critique Validation

← Back to AI Prompt Engineering