CAI Principles and Critique Prompts
Anthropic's Constitutional AI: self-critique based on harmlessness principles.
What Is Constitutional AI?
Constitutional AI (CAI) is Anthropic's method for training AI systems to be helpful, harmless, and honest using a set of written principles, called a constitution.
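Rather than relying solely on human labelers to flag harmful outputs, CAI has the model critique and revise its own responses against the constitution. This scales better, because the feedback comes from the model rather than from per-example human labels, and it is more consistent, because every response is judged against the same written principles.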
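The critique-and-revise step described above can be sketched as a small loop. This is a minimal illustration, not Anthropic's implementation: the function name `critique_and_revise`, the `generate` callable, and the prompt wording are all hypothetical, and `generate` stands in for whatever text-generation call you have available.

```python
from typing import Callable


def critique_and_revise(
    user_prompt: str,
    principle: str,
    generate: Callable[[str], str],
    rounds: int = 1,
) -> str:
    """Draft a response, then critique and revise it against one principle.

    `generate` is a hypothetical stand-in for any function that maps a
    prompt string to a model completion.
    """
    response = generate(user_prompt)
    for _ in range(rounds):
        # Ask the model where the current draft conflicts with the principle.
        critique = generate(
            f"Prompt: {user_prompt}\n"
            f"Response: {response}\n"
            f"Principle: {principle}\n"
            "Critique the response against the principle."
        )
        # Ask the model to rewrite the draft so it addresses the critique.
        response = generate(
            f"Prompt: {user_prompt}\n"
            f"Response: {response}\n"
            f"Critique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    return response
```

Passing `generate` in as a parameter keeps the loop independent of any particular model client, so it can be exercised with a stub function before being wired to a real model.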
The Constitution: Written Principles
The CAI constitution is a list of principles the model must follow. Anthropic's published constitution includes principles derived from:
- The UN Universal Declaration of Human Rights
- Apple's Terms of Service
- Principles Anthropic found to work well in its own research on harm avoidance
Example principle: "Choose the response that is least likely to contain harmful, unethical, racist, sexist, toxic, dangerous, or illegal content."
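To show how a principle like the one above might be turned into critique and revision prompts, here is a hedged sketch. The template wording and the function names (`build_critique_prompt`, `build_revision_prompt`, `pick_principle`) are illustrative assumptions, not Anthropic's published prompts; only `EXAMPLE_PRINCIPLE` is taken from this lesson.

```python
import random

# The example principle quoted in this lesson.
EXAMPLE_PRINCIPLE = (
    "Choose the response that is least likely to contain harmful, unethical, "
    "racist, sexist, toxic, dangerous, or illegal content."
)


def build_critique_prompt(principle: str, user_prompt: str, response: str) -> str:
    """Format a critique request (hypothetical template wording)."""
    return (
        f"Human: {user_prompt}\n"
        f"Assistant: {response}\n\n"
        f"Principle: {principle}\n"
        "Critique request: Identify specific ways in which the assistant's "
        "response conflicts with the principle."
    )


def build_revision_prompt(critique_prompt: str, critique: str) -> str:
    """Extend the critique exchange with a revision request."""
    return (
        f"{critique_prompt}\n"
        f"Critique: {critique}\n"
        "Revision request: Rewrite the assistant's response so that it "
        "no longer conflicts with the principle."
    )


def pick_principle(principles: list[str], rng: random.Random) -> str:
    """Sample one principle so each critique pass focuses on a single rule."""
    return rng.choice(principles)
```

A typical use would be to call `pick_principle` on the full list of principles, pass the result to `build_critique_prompt`, send that prompt to the model, and then feed the model's critique into `build_revision_prompt`. Taking an explicit `random.Random` instance makes the sampling reproducible when a seed is supplied.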