Circuit Breaker
Definition
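The Circuit Breaker design pattern limits cascading failures in distributed systems by temporarily stopping calls to a dependency once its failures cross a threshold, then letting a trial call through after a cooldown to check whether the dependency has recovered.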
Use Cases
- Netflix: Preventing cascading failures in a large microservices architecture when downstream dependencies become slow or unavailable. Netflix popularized circuit-breaker behavior in its Java stack via the Hystrix library (now in maintenance mode). Services wrapped remote calls with timeouts, failure thresholds, and fallback logic so that repeated failures would open the circuit and stop further calls for a cooldown period. Result: improved resilience during partial outages, with less resource exhaustion (threads, connection pools) and fewer knock-on failures across dependent services.
- Microsoft: Improving reliability of distributed applications and cloud services by handling transient faults and preventing repeated calls to unhealthy dependencies. Microsoft documents and promotes the Circuit Breaker pattern in the Azure Architecture Center. Teams commonly implement it in .NET using libraries such as Polly, configuring thresholds, open/half-open states, and fallback responses for downstream HTTP and database calls. Result: more stable services under dependency degradation, with faster recovery and fewer cascading outages when a downstream component is impaired.
- SoundCloud: Protecting microservices from failures in downstream services and avoiding overload during incidents. SoundCloud has described using resilience patterns in microservices (including circuit-breaker-like behavior) to isolate failures and reduce repeated calls to unhealthy services, typically combined with timeouts, retries, and bulkheads. Result: better fault isolation and a smaller blast radius during service incidents, improving overall availability.
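The use cases above share the same mechanics: a failure threshold, an open state that rejects calls for a cooldown period, and a half-open state that admits a trial call. The sketch below illustrates those mechanics in Python. It is not how Hystrix or Polly are implemented; the names CircuitBreaker, CircuitOpenError, failure_threshold, and cooldown_seconds are hypothetical, the breaker counts consecutive failures only, and it is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical minimal breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED."""

    def __init__(self, failure_threshold=3, cooldown_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock  # injectable so the cooldown can be tested without sleeping
        self.state = "CLOSED"
        self.failures = 0  # consecutive failures since the last success
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.cooldown_seconds:
                # Fail fast: the dependency is not contacted at all.
                raise CircuitOpenError("circuit is open; call rejected")
            # Cooldown elapsed: admit a trial call.
            self.state = "HALF_OPEN"
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.state = "CLOSED"

    def _on_failure(self):
        self.failures += 1
        # A failed trial call, or too many consecutive failures, opens the circuit.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
```

A caller would wrap each remote call, for example breaker.call(fetch_profile, user_id), where fetch_profile stands in for any function that raises on failure. Production libraries typically add sliding failure-rate windows, per-exception policies, and locking, which this sketch leaves out.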
Frequently Asked Questions
- What's the difference between Circuit Breaker and Retry?
- Retry keeps trying a failed request (often with backoff) because the failure might be temporary. A Circuit Breaker stops sending requests after failures cross a threshold, giving the failing service time to recover and protecting your system from wasting resources. In practice, you often use both: limited retries for transient errors, plus a circuit breaker to prevent endless pressure on an unhealthy dependency.
- When should I use a Circuit Breaker in microservices?
- Use it when your service calls another service (or database/third-party API) and failures or slow responses could tie up threads, connection pools, or CPU and cause a wider outage. It’s especially useful for synchronous HTTP/gRPC calls, high-traffic paths (checkout, login), and dependencies with variable reliability. If a dependency is optional, pair the circuit breaker with a fallback (cached data, default response, queued work) to keep your service responsive.
- How much does a Circuit Breaker cost?
- The pattern itself is usually free if implemented in application code using open-source libraries (e.g., Resilience4j for Java, Polly for .NET). Costs come from operational overhead: extra monitoring/metrics, potential service-mesh or API gateway licensing/usage, and engineering time to tune thresholds and fallbacks. If implemented via a managed gateway/mesh, pricing depends on request volume, compute, and any paid platform features.
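The FAQ answers above recommend combining limited retries, a circuit breaker, and a fallback. The Python sketch below shows one way these could fit together. It is an illustration under stated assumptions, not the API of Resilience4j or Polly: SimpleBreaker, call_with_retry_and_fallback, and every parameter name are hypothetical, and the breaker is deliberately tiny (consecutive-failure counting, no thread safety).

```python
import time


class SimpleBreaker:
    """Hypothetical tiny breaker: opens after N consecutive failures."""

    def __init__(self, failure_threshold=5, cooldown_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Once the cooldown has elapsed, let calls through again as a trial.
        return self.clock() - self.opened_at >= self.cooldown_seconds

    def record(self, success):
        if success:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial


def call_with_retry_and_fallback(breaker, func, fallback, max_attempts=3,
                                 base_delay=0.1, sleep=time.sleep):
    """Retry func a limited number of times, but stop as soon as the breaker opens."""
    for attempt in range(max_attempts):
        if not breaker.allow():
            # Circuit is open: skip the dependency entirely and degrade gracefully.
            return fallback()
        try:
            result = func()
        except Exception:
            breaker.record(False)
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # exponential backoff between retries
        else:
            breaker.record(True)
            return result
    # Retries exhausted: serve the fallback (cached data, default response, ...).
    return fallback()
```

The retry loop handles short transient errors, while the breaker caps the total pressure on the dependency: once it opens, later requests go straight to the fallback without waiting on timeouts or backoff delays.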
Category: software
Difficulty: advanced