Fun fact: the Heroku API consumes more endpoints than it serves. Our availability is heavily dependent on the availability of the services we interact with, which is the textbook definition of when to apply the circuit breaker pattern.
And so we did:
Circuit breakers really helped us keep the service stable despite third-party interruptions, as this graph of p95 HTTP queue latency shows.
Here I'll cover the benefits, challenges and lessons learned by introducing this pattern to a large scale production app.
A brief reminder that everything fails
Our API composes over 20 services – some public (S3, Twilio), some internal (run a process, map DNS record to an app) and some provided...