Measuring Robustness
Scoring resistance to attacks.
Robustness as a Measurable Quantity
Robustness is the system's resistance to adversarial input, expressed as numbers you can track, compare, and gate on. 'It seems safe' is not measurement; an attack-success rate with a confidence interval is.
This lesson turns red-team findings into rigorous metrics.
Attack Success Rate
The headline metric is Attack Success Rate (ASR): the fraction of attack cases that defeat your defenses. Lower is better. Report it overall and broken down by category and technique so you know where you are weak.
def asr(results):
breaks = sum(1 for r in results if not r['safe'])
return breaks / len(results)
# also compute per-category ASR for diagnosisAll lessons in this course
- LLM Red-Teaming Basics
- Jailbreak Techniques
- Building an Attack Suite
- Measuring Robustness