Announcing Heroku Autoscaling for Web Dynos

We’re excited to announce that Heroku Autoscaling is now generally available for apps using web dynos.

We’ve always made it seamless and simple to scale apps on Heroku - just move the slider. But we want to go further, and help you in the face of unexpected demand spikes or intermittent activity. Part of our core mission is delivering a first-class operational experience that provides proactive notifications, guidance, and—where appropriate—automated responses to particular application events. Today we take another big step forward in that mission with the introduction of Autoscaling.

Autoscaling makes it effortless to meet demand by horizontally scaling your web dynos based on what’s most important to your end users: responsiveness. To measure responsiveness, Heroku Autoscaling uses your app’s 95th percentile (p95) response time, an industry-standard metric for assessing user experience. The p95 response time is the number of milliseconds that only 5% of your app’s response times exceed. You can view your app’s p95 response time in the Application Metrics response time plot. Using p95 response time as the trigger for autoscaling ensures that the vast majority of your users experience good performance, without overreacting to performance outliers.
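To make the p95 definition concrete, here is a minimal sketch that computes a 95th percentile from a list of response times. It uses the nearest-rank method, which is an assumption chosen for illustration; the post does not say how Heroku computes the percentile. The function name and sample data are hypothetical.

```python
def p95(response_times_ms):
    """Return the 95th percentile of the samples (nearest-rank method)."""
    if not response_times_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(response_times_ms)
    # Nearest rank: ceil(0.95 * n), computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical data: 95 fast requests and 5 slow outliers.
samples = [120] * 95 + [900, 1200, 1500, 2000, 5000]
print(p95(samples))  # 120
```

In this sample the five slow requests do not move the p95 at all, which is why a p95 trigger does not overreact to a handful of outliers.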

Autoscaling is easy to set up and use, and it recommends a p95 threshold based on your app’s past 24 hours of response times. Response-based autoscaling ensures that your web dyno formation is always sized for optimal efficiency, while capping your costs based on limits you set. Autoscaling is currently included at no additional cost for apps using Performance and Private web dynos.

Get Started

From the Heroku Dashboard, navigate to the Resources tab to enable autoscaling for your web dynos.

From the web dyno formation dialog, set the desired upper and lower limits for your dyno range. With Heroku Autoscaling you won’t be surprised by unexpected dyno fees. The cost estimator shows the maximum possible web dyno cost when using autoscaling, expressed either in dyno units (for Heroku Enterprise organizations) or in dollars.

Next, enter the desired p95 response time in milliseconds. To make it easy to select a meaningful p95 setting, the median p95 latency for the past 24 hours is provided as guidance. If you enable Email Notifications, we’ll let you know when scaling demand reaches your maximum dyno setting, so you won’t miss a customer request.
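As a rough illustration of that guidance value, the sketch below takes the median of a day's worth of p95 readings. The hourly sampling interval, the function name, and the numbers are all assumptions made for the example; the dashboard's actual calculation is not described here.

```python
from statistics import median


def suggested_threshold_ms(p95_readings_ms):
    """Median of p95 readings, as a starting point for the threshold."""
    return median(p95_readings_ms)


# Hypothetical p95 readings in milliseconds, one per hour for 24 hours.
hourly_p95_ms = [210, 205, 198, 202, 215, 230, 260, 310, 340, 335, 320, 300,
                 295, 305, 315, 330, 360, 900, 410, 350, 290, 250, 230, 220]
print(suggested_threshold_ms(hourly_p95_ms))
```

Because the median ignores how extreme the highest readings are, a single bad hour (the 900 ms reading above) does not inflate the suggestion.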

Monitoring Autoscaling

You can monitor your autoscaling configuration and scaling events in the Events table on Application Metrics, and view the corresponding impact on application health.

When to Use Autoscaling

Autoscaling is useful when demand on web resources is variable. However, it is not a panacea for every application health issue that results in latency. For example, lengthy response times may be caused by a downstream resource, such as a slow database query. In that case, scaling web dynos without sufficient database resources or query optimization could exacerbate the problem.

To determine whether autoscaling is appropriate for your environment, we recommend that you load test before implementing autoscaling in production, and use Threshold Alerting to monitor your p95 response times and error rates. If you plan to load test, please refer to our Load Testing Guidelines for Support notification requirements. As with manual scaling, you may need to tune your downstream components in anticipation of higher request volumes. Additional guidance on optimization is available in the Scaling documentation.

How it Works

Heroku's autoscaling model employs Little's Law to determine the optimal number of web dynos needed to maintain your current request throughput while keeping web request latency within your specified p95 response time threshold. The deficit or excess of dynos is measured as Ldiff, which takes into consideration the past hour of traffic. For example, in the following simulation, at the 80-minute mark there is a spike in response time (latency) and a corresponding dip in Ldiff, indicating a deficit in the existing number of web dynos with respect to the current throughput and response time target. The platform adds a web dyno and reassesses Ldiff. This process repeats until the p95 response time is within your specified limit or you have reached your specified upper dyno limit. A similar approach is used for scaling in.
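Little's Law states that the average number of requests in flight (L) equals the arrival rate (λ) multiplied by the average time each request spends in the system (W). The sketch below applies that relationship to a single snapshot of traffic. The `l_diff` function is an illustrative stand-in, not Heroku's actual Ldiff formula, which is not given here and also accounts for the past hour of traffic. All names and numbers are hypothetical, and only the scale-out step is shown.

```python
def in_flight_requests(throughput_rps, latency_s):
    """Little's Law: L = lambda * W."""
    return throughput_rps * latency_s


def l_diff(throughput_rps, observed_p95_s, target_p95_s):
    """Assumed stand-in for Ldiff: negative means a capacity deficit."""
    at_target = in_flight_requests(throughput_rps, target_p95_s)
    observed = in_flight_requests(throughput_rps, observed_p95_s)
    return at_target - observed


def scale_out_step(current_dynos, diff, max_dynos):
    """Add one dyno on a deficit, never exceeding the configured upper limit."""
    if diff < 0:
        return min(current_dynos + 1, max_dynos)
    return current_dynos


# Hypothetical snapshot: 40 requests/s, p95 of 0.75 s against a 0.5 s target.
diff = l_diff(40, 0.75, 0.5)
print(diff)                        # -10.0, a deficit
print(scale_out_step(3, diff, 5))  # 4
```

After each step the platform would measure again and repeat, which matches the add-and-reassess loop described above; this sketch leaves that measurement loop out because it depends on live metrics.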

[Figure: autoscaling simulation]

Find Out More

Autoscaling has been one of your top requested features when it comes to operational experience - thank you to everyone who gave us feedback during the beta and in our recent ops survey. For more details on autoscaling refer to the Dyno Scaling documentation. Learn more about Heroku's other operational features here.

If there’s an autoscaling enhancement or metrics-driven feature you would like to see, you can reach us at metrics-feedback@heroku.com.
