The Heroku Connect team ran into problems with existing task-scheduling libraries, so we wrote RedBeat, a Celery scheduler that stores scheduled tasks and runtime metadata in Redis. We’ve also open-sourced it so others can use it. Here is the story of why and how we created RedBeat.
Why We Created the RedBeat Celery Scheduler
Heroku Connect makes heavy use of Celery to synchronize data between Salesforce and Heroku Postgres. Celery is an asynchronous task queue that lets us schedule and queue jobs for execution by a background worker process. As our usage grew, we came to rely more and more heavily on the Beat scheduler to trigger frequent periodic tasks. Beat is a Celery scheduler that checks a list of tasks at regular intervals and adds each task to the queue when it is due.
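Conceptually, each Beat tick walks the schedule, enqueues whatever is due, and then sleeps until the next entry needs attention. The stdlib-only sketch below illustrates that loop under simplifying assumptions (fixed intervals, an in-memory deque standing in for the broker). It is not Celery's code: the `Entry` and `tick` names are made up for illustration, although the `(due, wait)` pair loosely mirrors the `(is_due, next)` result that Celery's schedule objects return.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """One periodic task (hypothetical): run `task` every `interval` seconds."""
    name: str
    task: str
    interval: float
    last_run_at: Optional[float] = None

    def is_due(self, now: float) -> tuple[bool, float]:
        """Return (due?, seconds until this entry should be checked again)."""
        if self.last_run_at is None:
            return True, self.interval
        remaining = self.last_run_at + self.interval - now
        if remaining <= 0:
            return True, self.interval
        return False, remaining


def tick(schedule: list[Entry], queue: deque, now: float) -> float:
    """Enqueue every due entry; return how long to sleep before the next tick."""
    next_wakeup = float("inf")
    for entry in schedule:
        due, wait = entry.is_due(now)
        if due:
            queue.append(entry.task)  # a real scheduler would publish to the broker
            entry.last_run_at = now
        next_wakeup = min(next_wakeup, wait)
    return next_wakeup
```

For example, with a 10-second `sync` entry and a 60-second `cleanup` entry, a tick at t=0 enqueues both and returns 10.0, a tick at t=5 enqueues nothing and returns 5.0, and a tick at t=10 enqueues `sync` again.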
For a while, everything ran smoothly, but as we grew, cracks started to appear. Beat, the default Celery scheduler, began to behave erratically, with intermittent pauses (yellow in the chart below) and occasional hangs (red in the chart below). Hangs required manual intervention, which meant more pages for our on-call team.
Out of the box, Beat uses a file-based persistent scheduler, which can be problematic in a cloud environment. Cloud servers are ephemeral and are often restarted or recreated, so you can’t guarantee that Beat will restart with access to the same filesystem. There are ways to solve this, but they require introducing more moving parts, such as a distributed filesystem. An immediate solution is to store the schedule in your existing SQL database, and django-celery, which we were using, makes this easy.
After digging into the code, we discovered that the hangs were caused by blocked transactions in the database and the long pauses by the periodic saving and reloading of the schedule. We could mitigate the pauses by increasing the interval between saves, but that would also increase the likelihood of losing data. In the end, it was evident that django-celery was a poor fit for this pattern of frequent schedule updates.
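To make the save-interval trade-off concrete, here is a toy model, not django-celery's actual code, of a scheduler that keeps last-run times in memory and periodically rewrites the whole schedule to durable storage. The `BufferedScheduleStore` class, the `simulate` helper, and the plain dict standing in for the database table are all hypothetical.

```python
class BufferedScheduleStore:
    """Toy model: last-run times live in memory, and the whole schedule is
    rewritten to durable storage at most once every `sync_every` seconds."""

    def __init__(self, durable: dict, sync_every: float):
        self.durable = durable          # stands in for the database table
        self.memory = dict(durable)     # full copy loaded at startup
        self.sync_every = sync_every
        self.last_sync = 0.0
        self.rows_written = 0           # cost of the periodic full saves

    def record_run(self, name: str, now: float) -> None:
        self.memory[name] = now
        if now - self.last_sync >= self.sync_every:
            # Full save: every entry is rewritten, not just the one that changed.
            self.durable.clear()
            self.durable.update(self.memory)
            self.rows_written += len(self.memory)
            self.last_sync = now

    def restart(self) -> dict:
        """Simulate a crash and restart: unsynced run times are lost."""
        return dict(self.durable)


def simulate(sync_every: float) -> tuple[float, int]:
    """Run task "a" at t=1..9, restart, and report what survived and what it cost."""
    store = BufferedScheduleStore({"a": 0.0, "b": 0.0, "c": 0.0}, sync_every)
    for t in range(1, 10):
        store.record_run("a", float(t))
    return store.restart()["a"], store.rows_written
```

With `sync_every=1.0`, the restart recovers the latest run time (t=9) but the model writes 27 rows; with `sync_every=5.0`, it writes only 3 rows but forgets the runs at t=6 through t=9, so after a restart the scheduler would believe task `a` last ran at t=5.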
We were already using the Redis key-value store as our Celery broker, so we decided to investigate moving the schedule into Redis as well. There is an existing celerybeatredis package, but it suffers from the same design issues as django-celery, requiring a pause and full reload to pick up changes.
So we decided to create a new package, RedBeat, which takes advantage of the inherent strengths of Redis. We’ve been running it in production for over a year and have not seen any recurrences of the problems we had with the django-celery scheduler.
The RedBeat Difference
How is RedBeat different from other Celery schedulers? The biggest change is that the active schedule is stored in Redis rather than within the process space of the Celery Beat daemon.
Creating or modifying a task no longer requires Beat to pause and reload. We just update a key in Redis, and Beat picks up the change on the next tick. A nice side effect is that it’s trivial to update the schedule from other languages. Unlike with Beat’s default file-based scheduler, we no longer need to worry about sharing a file across multiple machines to preserve metadata about when tasks last ran. Startup and shutdown times also improved, since we no longer suffer the load spikes caused by saving and reloading the entire schedule from the database. Instead, Redis sees a steady, predictable load.
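The sketch below illustrates why no pause or reload is needed once the schedule lives in Redis. It assumes a layout of one JSON key per entry plus a sorted set scored by next-run time; that layout, the key names, and the `MiniRedis` class (an in-memory stand-in so the example runs without a Redis server) are assumptions for illustration, and RedBeat’s real storage format may differ. Because `save_entry` uses only plain Redis commands (`SET` and `ZADD`), a process written in any language could make the same update.

```python
import json
from collections import deque


class MiniRedis:
    """In-memory stand-in for the handful of Redis commands used below."""

    def __init__(self):
        self.strings = {}
        self.zsets = {}

    def set(self, key, value):
        self.strings[key] = value

    def get(self, key):
        return self.strings.get(key)

    def zadd(self, key, mapping):
        self.zsets.setdefault(key, {}).update(mapping)

    def zrangebyscore(self, key, min_score, max_score):
        # Like Redis, order by score, then by member name.
        members = sorted(self.zsets.get(key, {}).items(), key=lambda kv: (kv[1], kv[0]))
        return [m for m, score in members if min_score <= score <= max_score]


SCHEDULE = "schedule"  # hypothetical key names, not RedBeat's actual layout


def save_entry(r, name, task, interval, next_run):
    """Create or modify one entry: one key write plus one score update."""
    r.set(f"entry:{name}", json.dumps({"task": task, "interval": interval}))
    r.zadd(SCHEDULE, {name: next_run})


def tick(r, queue, now):
    """Enqueue only entries whose next-run score is <= now; nothing is reloaded."""
    for name in r.zrangebyscore(SCHEDULE, float("-inf"), now):
        entry = json.loads(r.get(f"entry:{name}"))
        queue.append(entry["task"])
        r.zadd(SCHEDULE, {name: now + entry["interval"]})
```

In this design, an entry saved between ticks is simply found by the next `tick` call, and keeping next-run times in a sorted set means each tick fetches only the entries that are due rather than scanning or reloading the whole schedule.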
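Finally, we added a simple lock that prevents multiple RedBeat daemons from running concurrently, since each running scheduler would otherwise enqueue the same periodic tasks. Accidentally running more than one scheduler can be a problem for Heroku customers when they scale up from a single worker, or during development.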
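A common way to build such a lock on Redis is an expiring key that is set only if absent (`SET key value NX PX <ttl>`) and that the holder refreshes before it expires. The in-memory model below illustrates that general pattern; it is an assumption about the technique, not RedBeat’s source, and the `LockStore` class, lock key, and owner names are hypothetical. On a real server, the owner check and refresh must happen atomically, for example with a Lua script or redis-py’s `Lock.extend`.

```python
class LockStore:
    """In-memory model of an expiring, set-if-absent Redis lock."""

    def __init__(self):
        self.data = {}  # key -> (owner, expires_at)

    def acquire(self, key: str, owner: str, ttl: float, now: float) -> bool:
        """Take the lock if it is free or expired, or extend it if we already hold it."""
        current = self.data.get(key)
        if current is None or current[1] <= now or current[0] == owner:
            self.data[key] = (owner, now + ttl)
            return True
        return False  # another daemon holds a live lease
```

Because the lease expires on its own, a scheduler that crashes without releasing the lock only blocks a replacement for at most one TTL; the running scheduler must therefore refresh the lock more often than the TTL.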
After converting to RedBeat, we’ve had no scheduler-related incidents.
Needless to say, so far we’ve been happy with RedBeat and hope others will find it useful too.
Why not take it for a spin and let us know what you think?