LLM Red-Teaming Basics
Probing for failures.
What Red-Teaming Means for LLMs
Red-teaming is the disciplined practice of probing a system for failures before adversaries do. For LLMs, it means systematically attacking your prompts, guardrails, and tools to surface unsafe, incorrect, or policy-violating behavior.
It is offensive testing in service of defense, and the goal is reproducible findings, not one-off clever exploits.
Threat Model First
Before attacking, define what you are protecting and from whom:
- Assets: secrets, user data, privileged tool actions, brand safety.
- Adversaries: curious users, scammers, automated abuse, insiders.
- Capabilities: can they see system prompts, control retrieved docs, chain tool calls?
A finding only matters relative to a threat model.
All lessons in this course
- LLM Red-Teaming Basics
- Jailbreak Techniques
- Building an Attack Suite
- Measuring Robustness