AWS Auto Scaling group in 5 minutes

Increased traffic can put a heavier load on the CPUs of the EC2 instances in an application stack. Luckily, AWS provides a way to scale out: it automatically launches new instances to bring the metrics back to normal. In this post, I'll write about EC2 Auto Scaling.

Suppose we have a web application hosted on AWS. The app works well, everything is good, and we are about to launch a new feature that we expect to attract a lot of interest, which will result in increased traffic.

If we are not prepared for such events in advance, our existing EC2 fleet might not be able to handle the increased load. Users might experience higher latency, which we definitely want to avoid, because we care about our users.

Auto Scaling groups can help manage the increased workload our EC2 instances are exposed to.

1. What is Auto Scaling?

Auto Scaling is an AWS service that monitors resources and automatically adjusts their capacity based on performance metrics.

AWS provides auto scaling for various services, from EC2 instances to ECS services, DynamoDB tables and Aurora replicas.

This post deals specifically with Auto Scaling groups, which are responsible for scaling EC2 (server) instances in and out.

2. Scaling EC2s

An Auto Scaling group is a collection of EC2 instances that maintains the desired number of instances so that the application keeps performing well. How this is achieved depends on the configuration of the Auto Scaling group.

For example, if metrics show that the current instances are running above a pre-set CPU threshold, the Auto Scaling group will launch a new instance to handle the increased workload. When the load decreases again, the group will remove an instance.

An Auto Scaling group has three components.

2.1. Launch configuration or launch template

This is the blueprint for the new instances to be launched when needed. It specifies the parameters the new instances should have, such as the AMI and the instance type (family and size). When the Auto Scaling group receives a signal to scale out, it provisions the new instance(s) with the parameters specified there.
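To make the blueprint idea more concrete, here is a minimal sketch that builds the keyword arguments boto3's EC2 client accepts for `create_launch_template`. The template name, AMI ID and instance type are placeholders I made up for illustration, and the helper function is hypothetical. The block only builds a dictionary and does not call AWS.

```python
def launch_template_params(name: str, image_id: str, instance_type: str) -> dict:
    """Build keyword arguments for the EC2 client's create_launch_template call."""
    return {
        "LaunchTemplateName": name,
        "LaunchTemplateData": {
            "ImageId": image_id,            # AMI the new instances boot from
            "InstanceType": instance_type,  # family and size, e.g. t3.micro
        },
    }


# Placeholder values, chosen for illustration only.
params = launch_template_params("web-app-template", "ami-0123456789abcdef0", "t3.micro")

# With boto3 installed and credentials configured, the call would be:
#   boto3.client("ec2").create_launch_template(**params)
```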

2.2. Groups

Groups are logical units of EC2 instances with a minimum, desired and maximum number of instances.

[Figure: Auto Scaling group with minimum, desired and maximum number of instances (source: aws.amazon.com)]

This image shows that two instances run in normal circumstances (desired capacity). When the load on the instances increases, the Auto Scaling group can add up to two more instances to the stack, until it reaches the maximum size. It’s also possible to remove one instance when the desired number of instances proves to be too many, which leaves the group at its minimum size.

The minimum size guarantees that at least one instance will always run, while the maximum size won’t be exceeded (except temporarily, in situations managed by AWS, such as rebalancing).
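The relationship between the three numbers can be sketched in a few lines of plain Python. This is a simplified model, not the AWS implementation: the `GroupSize` class is hypothetical, and the sizes (1, 2 and 4) are the ones I read from the figure. It mimics how a scaling adjustment stays within the bounds. As far as I know, the real service rejects a manually set desired capacity outside the bounds instead of clamping it.

```python
from dataclasses import dataclass


@dataclass
class GroupSize:
    minimum: int
    desired: int
    maximum: int

    def with_desired(self, requested: int) -> "GroupSize":
        """Return a copy whose desired capacity is clamped to [minimum, maximum]."""
        clamped = max(self.minimum, min(requested, self.maximum))
        return GroupSize(self.minimum, clamped, self.maximum)


group = GroupSize(minimum=1, desired=2, maximum=4)
print(group.with_desired(6).desired)  # 4
print(group.with_desired(0).desired)  # 1
```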

2.3. Scaling options

This component contains the settings that influence the scaling behaviour.

There are several ways to configure these settings.

The simplest way is to manually change the minimum, desired or maximum size. The Auto Scaling group reacts to this immediately by adding or removing the necessary number of instances.

Scaling can also be based on a schedule if the changes in traffic are predictable. If our application has a predictable traffic pattern (e.g. higher traffic on weekends from 8pm to 11pm), we can set a schedule, and the Auto Scaling group will provision extra instances every week for that time period.
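As a rough sketch, the weekend schedule could be expressed as two scheduled actions: one that scales out at 8pm and one that scales back in at 11pm. The helper below is hypothetical and only builds keyword arguments for boto3's `put_scheduled_update_group_action`. The group name, action names and capacities are assumptions, and I'm assuming standard cron syntax for `Recurrence`, with times in UTC unless a time zone is set.

```python
def weekend_schedule(group_name: str, normal: int, peak: int) -> list[dict]:
    """Build two scheduled actions: scale out at 20:00, scale in at 23:00 on weekends."""
    return [
        {
            "AutoScalingGroupName": group_name,
            "ScheduledActionName": "weekend-evening-scale-out",
            "Recurrence": "0 20 * * 6,0",  # cron: 20:00 on Saturday (6) and Sunday (0)
            "DesiredCapacity": peak,
        },
        {
            "AutoScalingGroupName": group_name,
            "ScheduledActionName": "weekend-evening-scale-in",
            "Recurrence": "0 23 * * 6,0",  # cron: 23:00 on the same days
            "DesiredCapacity": normal,
        },
    ]


# Placeholder group name and capacities, for illustration only.
actions = weekend_schedule("web-app-asg", normal=2, peak=4)

# With boto3 installed and credentials configured, each action would be sent with:
#   boto3.client("autoscaling").put_scheduled_update_group_action(**action)
```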

An Auto Scaling group can watch various metrics and respond to CloudWatch alarms. For example, suppose the account owner doesn’t want the CPU usage to go above 60% on any instance. They can set up an alarm on this metric, and when the alarm is triggered, the Auto Scaling group will launch a new instance. This scaling strategy is dynamic (it’s also called on-demand), and responds well to changes in traffic.
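The alarm-driven behaviour can be illustrated with a small simulation. This is only a sketch of the idea, not how CloudWatch or Auto Scaling is implemented: the function names, the CPU samples and the rule of three consecutive breaching periods are assumptions made for the example.

```python
def alarm_breached(cpu_samples: list[float], threshold: float, periods: int) -> bool:
    """True when the last `periods` samples are all above the threshold."""
    recent = cpu_samples[-periods:]
    return len(recent) == periods and all(s > threshold for s in recent)


def next_capacity(current: int, maximum: int, breached: bool) -> int:
    """Add one instance when the alarm fires, never exceeding the maximum size."""
    return min(current + 1, maximum) if breached else current


samples = [42.0, 58.0, 63.5, 71.2, 68.9]  # made-up average CPU percentages
breached = alarm_breached(samples, threshold=60.0, periods=3)
print(next_capacity(current=2, maximum=4, breached=breached))  # 3
```

Requiring several consecutive breaching periods keeps a single short spike from launching an instance that is not really needed.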

3. Rebalancing

It’s possible to enable an Auto Scaling group in multiple Availability Zones (multi-AZ).

In this case, the Auto Scaling group tries to distribute the EC2 instances evenly between the AZs. For example, if three instances are running and the region has three AZs, each AZ will host one running instance.

But what happens if circumstances change, and more instances need to be added? If the configuration allows, say, two more instances, then these two EC2s will be added in two different AZs.

If an AZ has no capacity for a new instance, or an instance is manually deleted, the Auto Scaling group will rebalance the instances and try to redistribute them evenly. On these occasions, the number of instances may go above the maximum size for a few minutes, because the group launches the new instances before it terminates the old ones. No intervention is needed; everything is managed automatically by AWS.
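The even distribution across AZs can be pictured with a toy placement function: each new instance goes into the zone that currently has the fewest instances. This is a simplified model of the behaviour described above, not the actual AWS algorithm, and the function and zone names are placeholders.

```python
def place_instances(counts: dict[str, int], to_add: int) -> dict[str, int]:
    """Add instances one at a time, each into the zone that has the fewest."""
    result = dict(counts)
    for _ in range(to_add):
        # Ties are broken alphabetically by zone name to keep the result stable.
        emptiest = min(result, key=lambda az: (result[az], az))
        result[emptiest] += 1
    return result


zones = {"eu-west-1a": 1, "eu-west-1b": 1, "eu-west-1c": 1}
print(place_instances(zones, 2))
# {'eu-west-1a': 2, 'eu-west-1b': 2, 'eu-west-1c': 1}
```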

4. Health checks

The Auto Scaling group regularly performs health checks on its EC2 instances.

If the status of an EC2 instance is anything other than running, the Auto Scaling group will mark that instance as unhealthy and terminate it.

The other case in which the Auto Scaling group can terminate an instance is when the group is connected to a load balancer, and the load balancer reports the instance as out of service. If that happens, the Auto Scaling group will terminate the instance and launch a new one instead.

Unlike with rebalancing, this time the Auto Scaling group will first remove the unhealthy instance from the group, and only then provision the new instance.
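The order of operations (terminate first, launch second) can be shown with a short simulation. Everything here is made up for illustration: the function, the instance IDs and the event strings are not AWS output.

```python
from itertools import count


def replace_unhealthy(health: dict[str, bool]) -> tuple[dict[str, bool], list[str]]:
    """Terminate each unhealthy instance first, then launch its replacement."""
    new_ids = (f"i-new{n}" for n in count(1))  # made-up instance IDs
    result, events = {}, []
    for instance_id, healthy in health.items():
        if healthy:
            result[instance_id] = True
            continue
        events.append(f"terminate {instance_id}")
        replacement = next(new_ids)
        events.append(f"launch {replacement}")
        result[replacement] = True
    return result, events


fleet, log = replace_unhealthy({"i-aaa": True, "i-bbb": False})
print(log)  # ['terminate i-bbb', 'launch i-new1']
```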

5. Auto Scaling group and load balancer

An Auto Scaling group can increase the number of instances in the EC2 fleet if it’s configured to do so, but the application must also be able to route traffic to the new instance(s).

As pointed out before, an Auto Scaling group integrates with load balancers so that the traffic is distributed evenly between the instances.

After the load balancer is attached to the Auto Scaling group, the instances are automatically registered with the load balancer. When this happens, the Elastic Load Balancer starts checking the health status of the EC2 instances.

The Auto Scaling group will only mark an instance healthy if it passes both the EC2 and the load balancer health checks.
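The "both checks must pass" rule can be written as a tiny function. This is a sketch of the logic only. The state strings ("running", "InService", "OutOfService") are labels I chose for the example, and `None` stands for a group that has no load balancer attached.

```python
from typing import Optional


def is_healthy(ec2_state: str, lb_state: Optional[str] = None) -> bool:
    """Healthy only if EC2 reports 'running' and, when a load balancer
    is attached, the load balancer reports 'InService'."""
    if ec2_state != "running":
        return False
    return lb_state is None or lb_state == "InService"


print(is_healthy("running", "InService"))     # True
print(is_healthy("running", "OutOfService"))  # False
print(is_healthy("stopped"))                  # False
```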

6. Example - What if I only want to run one instance, but that instance should always run?

This is a very common configuration for small applications. The app runs on one instance, but this instance should be up and running, no matter what.

This can be achieved by setting the minimum, desired and maximum sizes all to 1. The Auto Scaling group will then maintain the desired number of EC2s (1 in this case).

If we manually terminate the instance, the Auto Scaling group will launch a new one, because according to the configuration there must be at least one instance running.
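Such a group could be described with parameters like the ones below. The sketch builds the keyword arguments for boto3's `create_auto_scaling_group` without calling AWS. The helper function is hypothetical, and the group name, template name and subnet IDs are placeholders.

```python
def single_instance_group(group_name: str, template_name: str, subnet_ids: list[str]) -> dict:
    """Build keyword arguments for create_auto_scaling_group with all sizes set to 1."""
    return {
        "AutoScalingGroupName": group_name,
        "LaunchTemplate": {"LaunchTemplateName": template_name, "Version": "$Latest"},
        "MinSize": 1,
        "DesiredCapacity": 1,
        "MaxSize": 1,
        "VPCZoneIdentifier": ",".join(subnet_ids),  # comma-separated subnet IDs
    }


# Placeholder names and subnet IDs, for illustration only.
group_params = single_instance_group(
    "small-app-asg", "web-app-template", ["subnet-aaaa1111", "subnet-bbbb2222"]
)

# With boto3 installed and credentials configured, the call would be:
#   boto3.client("autoscaling").create_auto_scaling_group(**group_params)
```

Listing subnets in more than one AZ lets the replacement instance start in another zone if the original one has a problem.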

7. Conclusion

An Auto Scaling group is a fleet of EC2 instances in a VPC, for which the minimum, desired and maximum size of the group is configured.

Depending on the configuration, an Auto Scaling group can maintain the desired size or change it dynamically, based on metrics data.

An Auto Scaling group also works together with a load balancer, because the traffic should be evenly distributed across the instances, even if the number of EC2s changes.

Thanks for reading and see you next time.