Escalation & Metrics Anti-Patterns
Sentiment escalation, confidence scores, aggregate-only metrics.
Two Silent Failure Modes
Production agents rarely fail loudly. They fail by handing off to humans for the wrong reasons and by reporting health metrics that hide real damage.
This lesson dissects two exam-favorite anti-patterns: escalation driven by sentiment or self-rated confidence, and aggregate-only accuracy metrics. Both feel reasonable, both pass a demo, and both quietly erode trust at scale.
Architect-grade systems escalate on defensible triggers and measure performance stratified by what actually matters.
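To make the aggregate-only problem concrete, here is a minimal sketch of a stratified accuracy report. The function name `accuracy_by_stratum`, the stratum labels, and the ticket counts are all invented for illustration; nothing here comes from a real evaluation. It shows how a healthy-looking overall number can coexist with a badly failing slice.

```python
from collections import defaultdict

def accuracy_by_stratum(records):
    """Return (overall_accuracy, {stratum: accuracy}) for (stratum, correct) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for stratum, correct in records:
        totals[stratum] += 1
        hits[stratum] += int(correct)
    overall = sum(hits.values()) / sum(totals.values())
    per_stratum = {s: hits[s] / totals[s] for s in totals}
    return overall, per_stratum

# Made-up evaluation set: 95 routine tickets and 5 high-value refund tickets.
records = (
    [("routine", True)] * 94
    + [("routine", False)] * 1
    + [("high_value_refund", True)] * 1
    + [("high_value_refund", False)] * 4
)

overall, per_stratum = accuracy_by_stratum(records)
print(f"overall: {overall:.2f}")
for stratum, acc in per_stratum.items():
    print(f"{stratum}: {acc:.2f}")
```

With this made-up data the overall figure is 95 correct out of 100, while the high-value refund stratum is 1 correct out of 5. An aggregate-only dashboard would report the first number and never surface the second.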
What a GOOD Escalation Trigger Looks Like
Escalation to a human is a privileged action. It should fire on triggers you can defend in an audit:
- Explicit human request: when the customer asks for a person, escalate immediately.
- Policy gaps: no rule covers the case.
- No progress: repeated attempts have not resolved the issue.
- Threshold violations: for example, a refund exceeds an allowed limit.
Each of these maps to a concrete, observable fact in the conversation or the tool results — not to a guess about how the user feels.
REFUND_LIMIT = 500.00  # illustrative value; take the real limit from your refund policy

def should_escalate(turn):
    """Return True only when a concrete, auditable trigger fires."""
    if turn.customer_requested_human:
        return True  # explicit request -> escalate now
    if turn.no_matching_policy:
        return True  # policy gap
    if turn.attempts >= turn.max_attempts and not turn.resolved:
        return True  # no progress after real attempts
    if turn.refund_amount > REFUND_LIMIT:
        return True  # threshold violation
    return False
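Because these triggers are meant to hold up in an audit, it can help to return the reason that fired rather than a bare boolean. The sketch below is one possible shape, not a prescribed implementation: the `Turn` dataclass, the `sentiment` field, the reason strings, and the `REFUND_LIMIT` value are all hypothetical. The sentiment score is stored on the turn but deliberately never read by the escalation logic.

```python
from dataclasses import dataclass
from typing import Optional

REFUND_LIMIT = 500.00  # hypothetical limit, for illustration only

@dataclass
class Turn:
    # Hypothetical container; field names mirror the triggers listed above.
    customer_requested_human: bool = False
    no_matching_policy: bool = False
    attempts: int = 0
    max_attempts: int = 3
    resolved: bool = False
    refund_amount: float = 0.0
    sentiment: float = 0.0  # recorded, but never used as a trigger

def escalation_reason(turn: Turn) -> Optional[str]:
    """Return an auditable reason string, or None if no trigger fired."""
    if turn.customer_requested_human:
        return "explicit_human_request"
    if turn.no_matching_policy:
        return "policy_gap"
    if turn.attempts >= turn.max_attempts and not turn.resolved:
        return "no_progress"
    if turn.refund_amount > REFUND_LIMIT:
        return "threshold_violation"
    return None

# An angry customer with a still-solvable issue does not escalate on mood alone.
angry_but_solvable = Turn(sentiment=-0.9, attempts=1)
# A calm customer whose refund exceeds the limit does escalate.
calm_over_limit = Turn(sentiment=0.8, refund_amount=750.00)

print(escalation_reason(angry_but_solvable))  # None
print(escalation_reason(calm_over_limit))     # threshold_violation
```

Logging the returned string next to each handoff gives reviewers an observable fact to check, which a sentiment score or self-rated confidence value cannot provide.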