Fallback Providers and Circuit Breakers
Build a provider cascade that automatically fails over from OpenAI to Anthropic to a local model when the primary provider is slow or unavailable, using a circuit breaker pattern.
Single Provider Risk
Relying on a single LLM provider creates a single point of failure. OpenAI has experienced outages that took minutes to hours to resolve. If your entire application depends on GPT-4o being available, any provider incident immediately translates to user-facing downtime. A fallback provider strategy maintains service continuity by routing to alternative providers when the primary fails.
Defining a Provider Cascade
A provider cascade is an ordered list of providers and models tried in sequence. When the primary fails or times out, the system automatically tries the next provider. A typical cascade might be: OpenAI GPT-4o → Anthropic Claude 3.5 Sonnet → a locally deployed Llama model. Each level is a fallback for the one above it, with the local model as the last resort: it does not depend on any third-party API, so a provider outage cannot take it down, although it can still fail for local reasons such as hardware or process crashes.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Provider:
    name: str
    base_url: Optional[str]  # None = use the client's default endpoint
    api_key_env: str         # name of the environment variable holding the key
    model: str
    priority: int            # lower = higher priority


CASCADE = [
    Provider('openai', None, 'OPENAI_API_KEY', 'gpt-4o', 1),
    Provider('anthropic', 'https://api.anthropic.com/v1', 'ANTHROPIC_API_KEY', 'claude-3-5-sonnet', 2),
    Provider('local', 'http://localhost:8000/v1', 'LOCAL_KEY', 'llama-3.1-8b-inst', 3),
]
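The failover behavior described in this lesson can be sketched as a small circuit breaker plus a loop over the cascade. This is a minimal illustration under stated assumptions, not a production implementation: `CircuitBreaker`, `call_with_fallback`, the threshold of 3 failures, and the 30-second cooldown are all hypothetical choices, and `send` stands in for whatever function actually calls a provider and raises an exception on an error or timeout. Providers are duck-typed: anything with `name` and `priority` attributes works, such as the `Provider` entries in a cascade list.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Optional


@dataclass
class CircuitBreaker:
    """Per-provider failure tracker (hypothetical helper, not a library API)."""
    failure_threshold: int = 3       # assumed: consecutive failures before opening
    cooldown_seconds: float = 30.0   # assumed: how long an open circuit is skipped
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    failures: int = 0
    opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # open: skip the provider until the cooldown has elapsed,
        # then let a probe request through (half-open state)
        return self.clock() - self.opened_at >= self.cooldown_seconds

    def record_success(self) -> None:
        # a successful call closes the circuit and resets the counter
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # open (or re-open after a failed probe) and restart the cooldown
            self.opened_at = self.clock()


def call_with_fallback(
    prompt: str,
    providers: Iterable[Any],
    breakers: Dict[str, CircuitBreaker],
    send: Callable[[Any, str], str],
) -> str:
    """Try providers in priority order, skipping any whose circuit is open."""
    errors = []
    for provider in sorted(providers, key=lambda p: p.priority):
        breaker = breakers.setdefault(provider.name, CircuitBreaker())
        if not breaker.allow_request():
            errors.append(f"{provider.name}: circuit open")
            continue
        try:
            result = send(provider, prompt)  # expected to raise on error/timeout
        except Exception as exc:
            breaker.record_failure()
            errors.append(f"{provider.name}: {exc}")
            continue
        breaker.record_success()
        return result
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

The breaker is what keeps a slow or failing primary from adding latency to every request: once it opens, the loop skips that provider immediately instead of waiting for another timeout, and only probes it again after the cooldown. Keeping the `breakers` dictionary alive across calls (for example as module-level state) is what lets failures accumulate between requests.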