Resilience: Circuit Breakers and Retries
Keep systems healthy under partial failure.
Resilience Patterns
In a distributed system, partial failure is the normal state — some dependency is always slow, restarting, or overloaded. Resilience is about designing each service so one sick dependency doesn't take your service (and then the whole system) down with it.
This lesson covers the core toolkit: timeouts, retries with backoff, circuit breakers, bulkheads, and graceful degradation — all from PHP.
Timeouts First
The single most important resilience setting is the timeout. Without one, a slow downstream pins your PHP-FPM workers waiting; requests pile up; you run out of workers; your service goes down because someone else was slow. This is the textbook cascading failure.
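To put rough numbers on that cascade, here is a minimal sketch of the worker arithmetic. The pool size and latencies are hypothetical, and the model is deliberately simple: each worker handles one request at a time, and queueing effects are ignored.

```php
<?php
// Hypothetical helper: upper bound on throughput for a fixed worker pool.
// Each worker serves one request at a time, so capacity is
// workers divided by the seconds each request holds a worker.
function maxRequestsPerSecond(int $workers, float $secondsPerRequest): float
{
    return $workers / $secondsPerRequest;
}

$workers = 50; // assumed PHP-FPM pool size, for illustration only

// Healthy downstream: each request holds a worker for about 0.1s.
printf("healthy:        %.1f req/s\n", maxRequestsPerSecond($workers, 0.1));

// Hung downstream, no timeout: each request holds a worker for 30s.
printf("no timeout:     %.1f req/s\n", maxRequestsPerSecond($workers, 30.0));

// Same hung downstream, but every call is cut off after 2s.
printf("with 2s cap:    %.1f req/s\n", maxRequestsPerSecond($workers, 2.0));
```

Under these assumed numbers, capacity falls from about 500 requests per second to under 2 when calls hang for 30 seconds. A 2-second cap keeps it at 25: degraded, but still serving.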
Set both a connect timeout and a total request timeout on every outbound call. Always.
<?php
require 'vendor/autoload.php';
use GuzzleHttp\Client;
$http = new Client([
    'connect_timeout' => 0.5, // fail fast if we can't even connect
    'timeout' => 2.0, // hard cap on the whole call
]);
// A hung dependency now fails in 2s instead of holding a worker forever.
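A timeout only helps if the caller handles the resulting failure. The sketch below shows one way to do that with Guzzle, catching `TransferException` (the base class for Guzzle's connection, timeout, and bad-response exceptions) and returning a default value instead. The helper name `fetchOrFallback` and its fallback behavior are assumptions for illustration, not part of the lesson's code.

```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\ClientInterface;
use GuzzleHttp\Exception\TransferException;

/**
 * Hypothetical helper: GET a JSON document, returning $fallback when the
 * dependency is slow, unreachable, or answers with an error status.
 */
function fetchOrFallback(ClientInterface $http, string $url, array $fallback): array
{
    try {
        $response = $http->request('GET', $url);
        $decoded = json_decode((string) $response->getBody(), true);

        // Treat a non-JSON or non-array body like a failed call.
        return is_array($decoded) ? $decoded : $fallback;
    } catch (TransferException $e) {
        // Timeouts and connection failures land here, as do 4xx/5xx
        // responses when Guzzle's default http_errors option is on.
        error_log('dependency call failed: ' . $e->getMessage());

        return $fallback;
    }
}

// Same timeouts as above, so a hung dependency costs at most about 2s.
$http = new Client([
    'connect_timeout' => 0.5,
    'timeout' => 2.0,
]);
```

A call such as `fetchOrFallback($http, $url, [])` then returns an empty list rather than throwing when the downstream misbehaves, which suits optional data. For data the request cannot proceed without, rethrowing or returning an error response is usually the better choice.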