
Using AWS Auto Scaling with an Elastic Load Balancer cluster on EC2

Back in June last year I wrote about creating a WordPress cluster on Amazon’s EC2. In this post I’ll run through a couple of the problems I’ve experienced with that cluster, and how I solved them with Amazon’s Auto Scaling service.

The problems with the cluster

A couple of things were not ideal with the cluster. I’d been putting up with them for far too long, but I finally set aside some time this afternoon to fix them.

1) The price of micro-sized spot instances sometimes spikes to crazy levels.
This has meant that although micro instances are cheap while they’re running, when the price spikes they all die off and leave the cluster vulnerable. Unfortunately, setting up auto-scaling in combination with spot-priced micro instances would require coding up a hybrid solution with shell scripts, and although I enjoy tinkering with this setup, I can’t justify that much effort when an out-of-the-box solution exists: Auto Scaling.

2) Even when using on-demand instances, they just die sometimes
I had set up a CloudWatch alarm to email me when the number of healthy instances drops below my minimum level (2 instances). This normally means the cluster is getting a bit weak, and more often than not, shortly after getting this email, I get my monitoring email to say the site is down. This was happening infrequently enough for me to tolerate, but frequently enough to be a hassle: I’d have to stop the unhealthy instance and fire up a new one once every 7-10 days.

So… with those two pain points nagging me, I present the next iteration of the micro instance WordPress cluster: the self-healing, self-scaling cluster, all thanks to Amazon’s Auto Scaling and CloudWatch services.

Setting up Auto Scaling

It’s actually not too hard to set all this up. Here are roughly 4 steps that should do it.

0) Make sure you have the right AWS tools installed and set up.

You’ll need the Auto Scaling tools and, if you haven’t got them already, the latest EC2 tools. The EC2 tools are not strictly needed, but it’s worth setting them both up.

Chuck them in some sensible location like ~/bin/AWS/ and then, if you’re a bash user, a ~/.bashrc with these lines in it will help:

export KEY_HOME=/Users/you/bin/AWS/your-aws-certs
export EC2_PRIVATE_KEY=$KEY_HOME/pk-ABCDEFG123746293642325354.pem
export EC2_CERT=$KEY_HOME/cert-ABCDEFG123746293642325354.pem
 
export EC2_HOME=/Users/you/bin/AWS/ec2-api-tools-1.5.2.4/
export JAVA_HOME=/Library/Java/Home
 
export AWS_AUTO_SCALING_HOME=/Users/you/bin/AWS/AutoScaling-1.0.49.0/
export PATH=$EC2_HOME/bin:$AWS_AUTO_SCALING_HOME/bin:$PATH

You should see your current instances when you run ec2-describe-instances. If you do, then everything appears to be in order.
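If ec2-describe-instances fails, a missing environment variable is a likely culprit. Here is a minimal sketch that reports which of the variables from the ~/.bashrc snippet above are unset or empty. The function name check_aws_env is hypothetical (it is not part of the AWS tools), and it only checks that the variables are exported, not that the paths they point to are valid.

```shell
# check_aws_env: hypothetical helper, not part of the AWS tools.
# Prints one "missing: NAME" line for every variable that is not exported
# or is empty, and returns non-zero if any are missing.
check_aws_env() {
  local missing=0
  local name
  for name in EC2_PRIVATE_KEY EC2_CERT EC2_HOME JAVA_HOME AWS_AUTO_SCALING_HOME; do
    # printenv prints nothing for a variable that is not in the environment
    if [ -z "$(printenv "$name")" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

Run it in a fresh shell with something like: check_aws_env && echo "environment looks complete"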

1) Create the Auto Scaling launch configuration and group

as-create-launch-config $YOUR_CONFIG_NAME --image-id ami-123456 --instance-type t1.micro --group $YOUR_SECURITY_GROUP --monitoring-disabled

Note: I use --monitoring-disabled because I want the basic monitoring, not the premium detailed monitoring. If you’re a real cloud high-roller, splash out on detailed monitoring by using the flag --monitoring-enabled

as-create-auto-scaling-group $YOUR_GROUP_NAME --availability-zones us-east-1a --launch-configuration $YOUR_CONFIG_NAME --desired-capacity 2 --min-size 2 --max-size 4 --load-balancers $ELB_NAME --health-check-type ELB --grace-period $MAX_TIME_IN_SECONDS_IT_TAKES_TO_BE_HEALTHY

Note: $MAX_TIME_IN_SECONDS_IT_TAKES_TO_BE_HEALTHY can be whatever suits your servers and application. I use 5 minutes (300 seconds) and it seems fine so far.

In my example group above I have a max of 4, a min of 2 and a desired capacity of 2. Auto Scaling will ensure your instance count remains within those parameters. Desired capacity means the number of instances you normally have running.

Your numbers will be different depending on your traffic requirements. I suggest you set a maximum above your desired capacity, so that you can add an alarm (see step 3 below) that increases your instance count when you get featured on reddit or slashdot…
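Since the three numbers have to fit together (min no larger than desired, desired no larger than max), it can help to check them before creating the group. The following is only a sketch: validate_capacity is a hypothetical helper, not something the Auto Scaling tools provide.

```shell
# validate_capacity MIN DESIRED MAX
# Hypothetical helper: succeeds only when MIN <= DESIRED <= MAX.
validate_capacity() {
  local min="$1" desired="$2" max="$3"
  if [ "$min" -le "$desired" ] && [ "$desired" -le "$max" ]; then
    return 0
  fi
  echo "invalid capacity: need min ($min) <= desired ($desired) <= max ($max)" >&2
  return 1
}
```

With the numbers from my group, validate_capacity 2 2 4 succeeds, while something like validate_capacity 2 5 4 prints an error and fails.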

2) Create the policy for scaling up

as-put-scaling-policy $YOUR_POLICY_NAME --auto-scaling-group $YOUR_GROUP_NAME --adjustment=1 --type ChangeInCapacity

This policy will add 1 extra instance, but there are other policy types that can do things like ensure a specific number of instances. The --help options for the command line tools will guide you well on this; they’re very useful and I like them.
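To get a feel for how a policy changes the group size, here is a toy model of two policy types: ChangeInCapacity (a relative change) and ExactCapacity (an absolute size). The apply_policy function is hypothetical and runs entirely locally. It assumes the result is clamped to the group’s min and max, which matches the behaviour described above but is not taken from AWS code.

```shell
# apply_policy TYPE ADJUSTMENT CURRENT MIN MAX
# Toy model (not AWS code): prints the capacity a policy would ask for,
# clamped to the group's MIN and MAX.
apply_policy() {
  local type="$1" adjustment="$2" current="$3" min="$4" max="$5"
  local target
  case "$type" in
    ChangeInCapacity) target=$(( $current + $adjustment )) ;;  # relative change
    ExactCapacity)    target="$adjustment" ;;                  # absolute size
    *) echo "unsupported type: $type" >&2; return 1 ;;
  esac
  # Keep the result inside the group's bounds
  if [ "$target" -lt "$min" ]; then target="$min"; fi
  if [ "$target" -gt "$max" ]; then target="$max"; fi
  echo "$target"
}
```

For example, apply_policy ChangeInCapacity 1 2 2 4 prints 3, and the same policy applied to a group that is already at 4 stays at 4.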

3) Create the actual alarms that invoke the policy in 2)

This is easiest in the web-based CloudWatch console, which will step you through a wizard. I suggest attaching both the policy and a notification, so that you get an email when the autoscaling happens; that way you’ll know if it’s running amok.

Here’s my setup with 2 alarms, one for self healing, and one for ramping up for big traffic:

Alarm: Unhealthy
Threshold: HealthyHostCount < 2 for 5 minutes
Actions:
in ALARM state –
Use policy “AddInstances (Add 1 instance)” for group “your-group”
Send message to topic “addingInstance” (you@gmail.com)

Alarm: HighTraffic
Threshold: RequestCount >= 100 for 5 minutes
Actions:
in ALARM state –
Use policy “AddInstances (Add 1 instance)” for group “your-group”
Send message to topic “HighTraffic” (you@gmail.com)
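One way to read a threshold like “HealthyHostCount < 2 for 5 minutes” is that every reading in the window has to breach it. The sketch below is a deliberately simplified local model of that idea; all_below is a hypothetical function, and real CloudWatch alarms evaluate a statistic over a period rather than individual samples.

```shell
# all_below THRESHOLD SAMPLE...
# Simplified model of a "below threshold for the whole window" check.
# Succeeds when at least one sample is given and every sample is below THRESHOLD.
all_below() {
  local threshold="$1"
  local sample
  shift
  [ "$#" -gt 0 ] || return 1
  for sample in "$@"; do
    [ "$sample" -lt "$threshold" ] || return 1
  done
  return 0
}

# Five hypothetical one-minute readings of HealthyHostCount
if all_below 2 1 1 0 1 1; then echo "ALARM"; else echo "OK"; fi
```

A single healthy reading in the window (for example 1 2 1 1 1) is enough for this model to report OK instead.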

4) Test your handiwork

Choose your favorite way of breaking a server. This will probably be sufficient on most Debian systems:

sudo /etc/init.d/apache2 stop

Within 5-10 minutes Auto Scaling should have kicked in, started up a new instance and killed your failing one.

I hope it goes without saying, don’t do this on your mission critical production server…
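Rather than refreshing the console while you wait, you can poll until the site answers again. This is a sketch only: wait_until is a hypothetical helper, and the URL in the usage comment is a placeholder for your own site.

```shell
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it succeeds, giving up once
# roughly TIMEOUT_SECONDS have been spent sleeping between attempts.
wait_until() {
  local timeout="$1" interval="$2"
  local waited=0
  shift 2
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$(( $waited + $interval ))
  done
  return 0
}

# Example (placeholder URL): allow up to 10 minutes, checking every 30 seconds.
# wait_until 600 30 curl --silent --fail --output /dev/null http://your-site.example/
```

The timeout only counts the sleep time, so slow checks will make the real wait a little longer than the number you pass in.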

There are a couple of cool side effects of doing this the AWS way, rather than rolling your own shell scripts to start/stop instances and add them to the ELB:

  1. Auto Scaling will terminate unhealthy instances and remove them from the load balancer cluster automatically.
  2. Auto Scaling will add newly created instances to the cluster automatically.

I’ll update this article if I find any problems with the setup I have described here. If you try this on your own cluster, please let me know your results.

Update 31 Jan 2012: So one obvious problem is that it increases instances, and never decreases them, doh!

You’ll want something like this. Give it a name like removeInstance:

as-put-scaling-policy $YOUR_POLICY_NAME --auto-scaling-group $YOUR_GROUP_NAME --adjustment=-1 --type ChangeInCapacity

Then set an alarm for ‘normal’ traffic and have it reduce the instances.
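The scale-up and scale-down policies are the same command with the sign of the adjustment flipped. As a sketch, the hypothetical policy_cmd function below prints (it does not run) the two command lines side by side, using only the flags shown in this post. The policy and group names are placeholders.

```shell
# policy_cmd POLICY_NAME GROUP_NAME ADJUSTMENT
# Hypothetical helper: prints an as-put-scaling-policy command line
# without executing it, so the pair can be reviewed first.
policy_cmd() {
  printf 'as-put-scaling-policy %s --auto-scaling-group %s --adjustment=%s --type ChangeInCapacity\n' "$1" "$2" "$3"
}

policy_cmd AddInstances your-group 1      # scale up by one instance
policy_cmd removeInstance your-group -1   # scale down by one instance
```

Once the printed lines look right, paste them into your shell to create the policies for real.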

Another issue I found is that the minimum number of instances in the group should also be your desired capacity (probably; your setup may vary). To do that, run as-update-auto-scaling-group like so:

as-update-auto-scaling-group $YOUR_GROUP_NAME --availability-zones us-east-1a --launch-configuration $YOUR_CONFIG_NAME --desired-capacity 2 --min-size 2 --max-size 4 --load-balancers $ELB_NAME --health-check-type ELB --grace-period $MAX_TIME_IN_SECONDS_IT_TAKES_TO_BE_HEALTHY

I have also updated the original group creation command above to reflect the minimum = desired capacity change.

17 thoughts on “Using AWS Auto Scaling with an Elastic Load Balancer cluster on EC2”

  1. Could this work for magento? I guess you would need to split the db and the web servers on to different instances? And does autoscaling require any server configuration? How does AWS know how to configure the new instance? E.g. With linux users, ftp passwords etc… This sounds a whole lot easier than paying for rightscale and building servertemplates… What about the ELB, is that set up automatically? How would it distribute traffic, round-robin? Sorry for all the questions!!!

  2. Yes, it totally can work with Magento. It uses an existing machine image (AMI) as the template for each server that stops and starts.

  3. Hi Ashley,

    I’ve been following your blog for awhile, always good stuff! So we are currently trying to load balance our Magento site but have run into problems with a good Admin solution that will allow us to propagate images to all nodes. I don’t know if you’ve actually tried this yet, but if you set up a load balancer and give it a domain name, ie load.com, and access the Admin panel through load.com/admin, then you never know which node you might randomly be using for admin access. That said, setting up a new item through a random node and uploading an image means only that node gets an image. We have other issues with server side files not propagating, but that’s a custom extension–similar problem, nonetheless. Any ideas how to put the images in one place? S3 bucket? I’ve seen One Pica, but that doesn’t solve problems with shared writeable files as well as images… and I’m not even sure it works that well for images in a load balanced or auto scaled environment. On the other hand, and this relates to another post of yours, we are looking at using MageMojo instead of AWS to hopefully reduce the monthly hosting bill and avoid the need to load balance at all. Thanks in advance for your insight!

  4. Hi, I’d consider mounting an NFS point and symlinking var/media (and related folders) so that all the servers share the same images (and even cached versions of those images).

    Once an image is loaded the first time by your CDN (if you use one) then it won’t hit the cluster again for the image.

  5. Hi Ashley,

    on which instance do I have to start the shell commands? The master instance?
    Or do the shell commands work from every linux shell. I’ve never used them before.

    Regards
    Heiko

  6. hello Ashley

    I need your help, please. I have a client who is using AWS, and I have a problem with the load balancer. I launch instances from a CloudFormation template that uses Auto Scaling. Why can’t the load balancer check my EC2 instance after it autoscales? I launched the EC2 instance in eu-west-1c, and after autoscaling it moved to eu-west-1b. Why is that? The load balancer can’t check it there.
    Please give me a solution.

    thanks before ashley

  7. An AWS CloudFormation template for all this would have been killer ;P

  8. Why not override Image.php and disable image file check at the website level? So using Pica, you can upload images via Admin to S3/Cloudfront.

  9. Hi,
    How much will this setup cost per month?

    Thanks

  10. So hard to say for sure – too many variables.

    Depends on traffic, how many instances you set as your min and max, how you pay for them (reserved vs on demand) etc.

    You’re going to have to get your spreadsheet on.

  11. Thanks for the guidance, super helpful!

  12. Great Post! Thank you for sharing it

    What other alternative can you recommend instead of NFS (to be used in the WordPress scenario with multiple instances) ?

    Thanks!

  13. Thanks for the post.
    When using AutoScaling, do we need to explicitly configure any ELB (or does it get setup automatically)?

    Regards,
    Krishna

  14. Hi, you need to pass the existing ELB name into the setup command: $ELB_NAME in my examples.

  15. Thanks.
    How do we handle the syncing of databases and files between instances for a typical WordPress-like blog?

    Thanks,
    Krishna

  16. Thanks, yes – read that post.
    Couple of questions :
    a) For the DB sharing by pointing to the master node, I guess that if the master dies then none of the nodes will be able to access the DB. Isn’t that true, or am I missing something?

    b) Can you please point to bit more detailed instructions on setting up NFS?
