Random thoughts of a warped mind…

January 16, 2012

Anatomy of an Amazon Beanstalk host

Filed under: Amazon EC2,Linux,Ruby,Virtualization — Srinivas @ 22:02

I just started playing around with Amazon's Elastic Beanstalk service today. Elastic Beanstalk lets you take a standard WAR (web archive) file that you would deploy to Tomcat and use it to build a scalable cluster of 1-N Amazon instances (each of which can be anything from the smallest t1.micro up to the largest instance type) that sit behind an Elastic Load Balancer and are auto-scaled. Pretty sweet, huh? No screwing around with Tomcat and all that (unless we really want to!). See Amazon's site for more info on what Elastic Beanstalk is. Following some of the links there will also walk you through how to set up a Beanstalk environment (for starters)…

Just a bit of digging around got me curious enough to try to figure out the anatomy of the Beanstalk nodes… So here are my findings.

First things first, here is the configuration I went with to create my Beanstalk setup (a valid Amazon AWS login is needed!):
  1. Selected type “Amazon 64 bit Linux with Tomcat 6”
  2. Instance type – t1.micro
  3. Number of instances – Auto-scale with 1-4 nodes based on the load
  4. Name of my application – myapp.war (if you don't have one, feel free to grab Tomcat's standard examples.war or similar)
  5. SSH key – I had this Beanstalk cluster pre-copy one of my public keys into each node it instantiated, so I could get under the hood :-) Got to have root access! :-)
  6. Specified a custom environment name and set my URL to match (for the purposes of this discussion, let's say onepwr1.elasticbeanstalk.com is the address under which I wanted my Tomcat app to be visible on the internet).

What Elastic Beanstalk does is the following:

  1. Copies the WAR file you uploaded to an S3 bucket named elasticbeanstalk-us-east-1-YOURNUMERICACCOUNTID (where YOURNUMERICACCOUNTID is unique to your AWS account)
  2. Starts up N t1.micro instances (I specified t1.micro as the preferred instance type; YMMV based on what you used) and copies the WAR file from the S3 bucket into Tomcat's webapps on these instances
  3. Groups all these instances under a single address, onepwr1.elasticbeanstalk.com (this address points to an Elastic Load Balancer that “fronts” for all these nodes). The load balancer keeps track of the health of individual nodes, and the auto-scaling setup brings new nodes up or takes existing ones down depending on the traffic/load etc.
  4. Note that you cannot rely on any single instance staying up persistently: even though they may be backed by EBS (as in Elastic Block Store), their lifetime and termination are controlled by the autoscale/ELB config. So don't take any individual instance in the Beanstalk setup for granted!
  5. Creates a new SNS notification topic (you can optionally subscribe your email to it to receive notifications)

When I created the Beanstalk environment, I specified that it use one of my preexisting SSH key pairs. So I could go into the EC2 tab of the AWS Console, look up the public hostname of one of the newly started t1.micro instances and SSH into it (these run Amazon Linux, so SSH in as ec2-user@publichostname).

Primary components on each node instantiated

  1. Apache running on port 80
  2. Tomcat on port 8080
  3. A Ruby-based app, HostManager, on port 8999 (for those familiar with Ruby, this app is built with the Sinatra framework and is run using Ruby's “Thin” web server, which in turn is based on the venerable EventMachine framework)

Apache is configured to reverse proxy all requests for “/_hostmanager” to the Ruby app on port 8999 and everything else to Tomcat on port 8080. Nothing fancy here. The Tomcat config is pretty standard as well.
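To make the split concrete, here is a minimal Ruby model of the routing rule. It is not the Apache configuration from the node, and it is not Amazon's code. The method name `backend_port_for` is hypothetical. It only encodes the two rules described: “/_hostmanager” goes to 8999, and everything else goes to Tomcat on 8080.

```ruby
# Hypothetical model of the reverse-proxy rule on a Beanstalk node.
HOSTMANAGER_PORT = 8999
TOMCAT_PORT      = 8080

# Returns the local port a request path would be forwarded to.
def backend_port_for(path)
  # "/_hostmanager" itself, or anything beneath it, goes to HostManager.
  if path == "/_hostmanager" || path.start_with?("/_hostmanager/")
    HOSTMANAGER_PORT
  else
    TOMCAT_PORT # everything else is served by Tomcat
  end
end

puts backend_port_for("/_hostmanager/status") # 8999 (sub-path is made up)
puts backend_port_for("/index.jsp")           # 8080
```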

The entire Beanstalk setup relies on HostManager to monitor the status of the services and report it back to the health checks run by the autoscaling setup. This means all three components above must be running at all times. This is ensured by starting them via Bluepill, a process monitoring tool written in Ruby (more info at http://rubygems.org/gems/bluepill).
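The code below is a rough local approximation of what “all three components must be running” means in practice. It checks whether something accepts TCP connections on ports 80, 8080 and 8999. This is my own sketch, not how HostManager actually determines health. The names `SERVICES`, `port_open?` and `service_status` are hypothetical.

```ruby
require "socket"

# Ports taken from the component list above; the hash itself is hypothetical.
SERVICES = { "apache" => 80, "tomcat" => 8080, "hostmanager" => 8999 }.freeze

# True if a TCP connection to host:port succeeds within `timeout` seconds.
def port_open?(port, host = "127.0.0.1", timeout = 1)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false # refused, timed out, or unresolvable
end

# Maps each service name to true/false depending on whether its port answers.
def service_status(services = SERVICES)
  services.transform_values { |port| port_open?(port) }
end
```

On a node, `service_status.values.all?` should be true when Apache, Tomcat and HostManager are all up.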

Bluepill is used to start up Apache, Tomcat and HostManager via the system service “hostmanager”:

[root@domU-12-31-38-06-BE-76 ~]# chkconfig --list hostmanager
hostmanager     0:off    1:off    2:on    3:on    4:on    5:on    6:off

This fires up Bluepill using the config in

/opt/elasticbeanstalk/srv/hostmanager/config/hostmanager.pill 

The Tomcat-related activity at startup is as follows:

Wading through the HostManager code, we finally get to

/opt/elasticbeanstalk/srv/hostmanager/lib/elasticbeanstalk/hostmanager/init-tomcat.rb

which does:

# Deploy app, will only deploy if it hasn't been yet
ElasticBeanstalk::HostManager::DeploymentManager.deploy(
     ElasticBeanstalk::HostManager::Applications::TomcatApplication.new(ElasticBeanstalk::HostManager.config.application_version)
)

The meat of the process is carried out by the Ruby class ElasticBeanstalk::HostManager::Applications::TomcatApplication, which is defined in

/opt/elasticbeanstalk/srv/hostmanager/lib/elasticbeanstalk/hostmanager/applications/tomcatapplication.rb

Actions carried out by the TomcatApplication class

The constructor sets the following properties, which are also accessible through read-only accessors:


# Directories, PID files, etc
 @tomcat_pid_file            = '/var/run/tomcat6.pid'
 @tomcat_webapps_dir         = '/var/lib/tomcat6/webapps'
 @tomcat_deploy_dir          = '/tmp/elasticbeanstalk-tomcat-deployment'
 @tomcat_pre_deploy_script   = '/tmp/tomcat_pre_deploy_app.sh'
 @tomcat_deploy_script       = '/tmp/tomcat_deploy_app.sh'
 @tomcat_post_deploy_script  = '/tmp/tomcat_post_deploy_app.sh'
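For readers less familiar with Ruby: “read-only accessors” means `attr_reader`. The values are set once in the constructor and exposed without setter methods. The sketch below shows that pattern with the paths listed above. The class name `TomcatPaths` is hypothetical and is not Amazon's TomcatApplication class.

```ruby
# Hypothetical stand-in showing the constructor + attr_reader pattern.
class TomcatPaths
  # Only reader methods are generated; there are no "name=" setters.
  attr_reader :tomcat_pid_file, :tomcat_webapps_dir, :tomcat_deploy_dir,
              :tomcat_pre_deploy_script, :tomcat_deploy_script,
              :tomcat_post_deploy_script

  def initialize
    # Directories, PID files, etc. (values copied from the listing above)
    @tomcat_pid_file           = '/var/run/tomcat6.pid'
    @tomcat_webapps_dir        = '/var/lib/tomcat6/webapps'
    @tomcat_deploy_dir         = '/tmp/elasticbeanstalk-tomcat-deployment'
    @tomcat_pre_deploy_script  = '/tmp/tomcat_pre_deploy_app.sh'
    @tomcat_deploy_script      = '/tmp/tomcat_deploy_app.sh'
    @tomcat_post_deploy_script = '/tmp/tomcat_post_deploy_app.sh'
  end
end

paths = TomcatPaths.new
puts paths.tomcat_webapps_dir # prints /var/lib/tomcat6/webapps
# paths.tomcat_webapps_dir = '/x' would raise NoMethodError (no setter)
```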

It then runs through the following actions. Each “action” is defined in the Ruby class as inline shell code that is dumped to an external file (in /tmp/) and executed via a system() call (so even the big boys are not immune to insipid system() calls :-) ).
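The “write inline shell to a file, then system() it” pattern looks roughly like the sketch below. The helper name `run_inline_script` and the demo script are hypothetical, and Amazon's real scripts are not reproduced here. One detail worth noting: `Kernel#system` returns true for exit status 0, false for a non-zero status, and nil if the command could not be run at all.

```ruby
require "tmpdir"

# Hypothetical helper: dump inline shell text to a file and execute it.
def run_inline_script(script_path, body)
  File.write(script_path, body)
  File.chmod(0o755, script_path)
  # Passing "sh" and the path as separate arguments avoids a second shell parse.
  system("sh", script_path)
end

Dir.mktmpdir do |dir|
  ok = run_inline_script(File.join(dir, "pre_deploy.sh"), <<~SH)
    #!/bin/sh
    set -e
    echo "pretend deploy step"
  SH
  puts ok # prints true
end
```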

PRE_DEPLOY_SCRIPT (/tmp/tomcat_pre_deploy_app.sh)

The PRE_DEPLOY_SCRIPT fetches application.war from S3 (when you uploaded the WAR to Beanstalk, it was saved into an S3 bucket). Irrespective of what your WAR file was originally called, it is fetched onto the EC2 instance as application.war.

If you log in to your AWS console and go to the S3 section, you will see an S3 bucket with a name similar to “elasticbeanstalk-us-east-1-YOURNUMERICACCOUNTID” (where YOURNUMERICACCOUNTID is unique to your AWS account). In it you will see the WAR file you uploaded (say it was called myapp.war) saved as TIMESTAMP-OT-myapp.war. The timestamp distinguishes between multiple versions of the app that you may have uploaded, which lets you roll back to a previous version if needed.
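The following sketch shows how timestamped keys support rollback: sort one app's keys by their timestamp prefix, then take the one before the newest. It assumes the TIMESTAMP part is a leading run of digits. I have not confirmed that format, so adjust the parsing to what your bucket actually shows. The method names and the sample keys are made up.

```ruby
# ASSUMPTION: keys look like "<digits>-...-<appname>.war".
def versions_for(keys, app)
  keys.select { |k| k.end_with?("-#{app}") && k =~ /\A\d+-/ }
      .sort_by { |k| k[/\A\d+/].to_i } # oldest first, newest last
end

# The version just before the newest one, or nil if there is only one.
def rollback_candidate(keys, app)
  versions_for(keys, app)[-2]
end

keys = ["1326764000-OT-myapp.war", "1326760000-OT-myapp.war",
        "1326700000-OT-other.war"] # made-up keys
puts rollback_candidate(keys, "myapp.war") # prints 1326760000-OT-myapp.war
```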

This uses the following work folder:

[root@domU-12-31-38-06-BE-76 hostmanager]# ls -l /tmp/elasticbeanstalk-tomcat-deployment/
total 64
-rw-rw-rw- 1 elasticbeanstalk elasticbeanstalk    95 Jan 17 01:38 application_digest
-rw-rw-rw- 1 elasticbeanstalk elasticbeanstalk    95 Jan 17 01:38 expected_digest
-rw-rw-rw- 1 elasticbeanstalk elasticbeanstalk 51883 Jan 17 01:38 wget.log
[root@domU-12-31-38-06-BE-76 hostmanager]#

wget.log is just a log of the fetch from S3 to local storage.
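My guess is that the application_digest and expected_digest files support the “will only deploy if it hasn't been yet” check seen in init-tomcat.rb. I have not verified their format or the digest algorithm, so what follows only illustrates the idea, using SHA-256 from Ruby's standard `digest` library. The method names `file_digest`, `needs_deploy?` and `record_deployed` are hypothetical.

```ruby
require "digest"

# Hex SHA-256 of a file (the algorithm Beanstalk uses is an assumption).
def file_digest(path)
  Digest::SHA256.file(path).hexdigest
end

# Deploy when nothing has been recorded yet, or the recorded digest differs.
def needs_deploy?(expected_digest_file, application_digest_file)
  return true unless File.exist?(application_digest_file)
  File.read(expected_digest_file).strip != File.read(application_digest_file).strip
end

# After a successful deploy, record what was deployed so reruns are no-ops.
def record_deployed(war_path, application_digest_file)
  File.write(application_digest_file, file_digest(war_path) + "\n")
end
```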


--2012-01-17 01:38:19--  https://elasticbeanstalk-us-east-1.s3.amazonaws.com/environments%2FSANITIZED39%2Fapplication.war?Expires=1358300073&versionId=SANITIZED&AWSAccessKeyId=SANITIZED&Signature=SANITIZED%3D
Resolving elasticbeanstalk-us-east-1.s3.amazonaws.com... 207.171.163.14
Connecting to elasticbeanstalk-us-east-1.s3.amazonaws.com|207.171.163.14|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 34366529 (33M) [application/octet-stream]

...

...

2012-01-17 01:38:22 (14.6 MB/s) - “/tmp/elasticbeanstalk-tomcat-deployment/application.war” saved [34366529/34366529]

So now we have the application.war file “copied” from S3 onto the Amazon Linux instance, at /tmp/elasticbeanstalk-tomcat-deployment/application.war.
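Before trusting the local copy, you can check the wget log: the last line above reports “saved [received/expected]”. The sketch below relies only on the log format shown above. The method name `download_complete?` is hypothetical. When wget does not know the content length, it prints just “saved [N]”, and this check treats that case as incomplete.

```ruby
# Matches the "saved [received/expected]" summary at the end of a wget log.
SAVED_LINE = /saved \[(\d+)\/(\d+)\]/

def download_complete?(log_text)
  match = log_text.scan(SAVED_LINE).last # last summary line wins
  return false unless match
  received, expected = match.map(&:to_i)
  received == expected
end

# Usage on a node (path from the listing above):
#   download_complete?(File.read("/tmp/elasticbeanstalk-tomcat-deployment/wget.log"))
```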

DEPLOY_SCRIPT (/tmp/tomcat_deploy_app.sh)

This does the following:

  1. Stops the running Tomcat process
  2. Clears out the Tomcat work folders, i.e. /usr/share/tomcat6/work/Catalina/* and /var/lib/tomcat6/webapps
  3. Unzips the previously downloaded /tmp/elasticbeanstalk-tomcat-deployment/application.war into /var/lib/tomcat6/webapps/ROOT and changes its ownership to tomcat6:elasticbeanstalk
  4. Moves the application.war file from /tmp/…/ to /var/lib/tomcat6/webapps/ROOT.war
  5. Starts up Tomcat

Note: Both Tomcat shutdown and startup are done via Bluepill so that it can monitor Tomcat and restart it if necessary (similar to how you would start a clustered service via the cluster utilities rather than as a standalone program).
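For reference, here are the five deploy steps above written out as a dry-run list of shell commands. This is my reconstruction from the described steps, not the contents of Amazon's tomcat_deploy_app.sh. The paths and the bluepill binary come from this post, while the exact flags and the name `deploy_commands` are assumptions.

```ruby
# Hypothetical dry-run reconstruction of the deploy sequence described above.
BLUEPILL   = "/opt/elasticbeanstalk/bin/bluepill"
DEPLOY_DIR = "/tmp/elasticbeanstalk-tomcat-deployment"
WEBAPPS    = "/var/lib/tomcat6/webapps"

def deploy_commands(war = "#{DEPLOY_DIR}/application.war")
  [
    "#{BLUEPILL} stop tomcat6",                               # 1. stop Tomcat
    "rm -rf /usr/share/tomcat6/work/Catalina/* #{WEBAPPS}/*", # 2. clear work dirs
    "unzip -q #{war} -d #{WEBAPPS}/ROOT",                     # 3a. explode the WAR
    "chown -R tomcat6:elasticbeanstalk #{WEBAPPS}/ROOT",      # 3b. set ownership
    "mv #{war} #{WEBAPPS}/ROOT.war",                          # 4. keep the WAR
    "#{BLUEPILL} start tomcat6"                               # 5. start Tomcat
  ]
end

deploy_commands.each { |cmd| puts cmd } # print only; nothing is executed
```

Routing stop and start through bluepill, rather than calling Tomcat's own scripts, keeps Bluepill's view of the process consistent, as the note above explains.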

POST_DEPLOY_SCRIPT (/tmp/tomcat_post_deploy_app.sh)

This is a placeholder on my setup. Not sure if instances using Tomcat 7 have something more in here.

Limitations of Elastic Beanstalk

I love how you can trivially set up a load-balanced, highly available environment for a Tomcat app (it takes < 2 minutes!) and manage it all from a single console. But my complaint with Elastic Beanstalk is that it allows only ONE WAR file per Beanstalk cluster/environment. That *sucks*. I don't want to spawn a cluster for every single WAR file (and I have many!). This got me thinking about how to get the same environment to serve multiple WAR files (without me having to “manually” push anything each time).

Each instance spun up is provisioned by the auto-scaler, which means you cannot point it at a custom AMI of your own (or can you? Maybe I haven't found out how?). These instances can be dynamically dropped and recreated (Beanstalk only guarantees that it will have 1-N instances running; it doesn't say the same instances will always persist). This means any local changes you make on an instance will be gone when that instance is terminated. So how do you “centrally provision” new instances to run your additional WAR files alongside the “preconfigured” myapp.war that you uploaded as part of the Beanstalk setup? Here are a few suggestions (I am assuming you have some sort of central monitoring host on EC2 that has access to your ACCESS KEY ID and SECRET).

Multiple war files on a single Elastic Beanstalk environment

I have yet to implement this, but this is how I envision doing it and getting around the single WAR file limit.

  1. When you create a new Elastic Beanstalk environment, an SNS notification topic is automatically created. This is called ElasticBeanstalkNotifications-Environment-<envname>
  2. After your environment is UP, have a simple piece of Ruby code “subscribe” to this topic (check out Ruby's Fog gem)
  3. Dump the additional WAR files you want to use into a separate S3 folder, e.g. mywars/myapp1.war, myapp2.war etc.
  4. If you see a notification for a new instance being added to the Beanstalk environment, try to get its AWS instance ID. Each instance in the Beanstalk setup is just another AWS instance after all. Once you have the instance ID, it's trivial to get its hostname via the ec2-describe-instances command-line tool or via Fog (use Fog::Compute.new(:provider => 'AWS', ...) with your access key ID and secret to instantiate an aws object, then call aws.describe_instances('instance-id' => YOURINSTANCEID)).
  5. Copy each of the WAR files from S3 to the new node's /var/lib/tomcat6/webapps/, i.e. as /var/lib/tomcat6/webapps/myapp1.war and so on… (when you set up the Beanstalk cluster, make sure you provision it with one of your SSH key pairs, so you can log in to nodes using that private key)
  6. Bounce Tomcat on that node via
    /usr/bin/sudo /opt/elasticbeanstalk/bin/bluepill <stop|start> tomcat6
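Steps 5 and 6 can be sketched as building the remote commands for one new node. The sketch makes three assumptions that are not from the post:

* The extra WARs have already been fetched into /tmp on the node, for example with wget against pre-signed S3 URLs.
* You SSH in as ec2-user with the environment's key pair.
* The helper names `extra_war_commands` and `ssh_command` are hypothetical.

```ruby
require "shellwords"

NODE_WEBAPPS  = "/var/lib/tomcat6/webapps"
NODE_BLUEPILL = "/usr/bin/sudo /opt/elasticbeanstalk/bin/bluepill"

# Remote commands: install each WAR into webapps, then bounce Tomcat via bluepill.
def extra_war_commands(war_names)
  installs = war_names.map do |war|
    src = Shellwords.escape("/tmp/#{war}")               # ASSUMED staging location
    dst = Shellwords.escape("#{NODE_WEBAPPS}/#{war}")
    "/usr/bin/sudo cp #{src} #{dst}"
  end
  installs + ["#{NODE_BLUEPILL} stop tomcat6", "#{NODE_BLUEPILL} start tomcat6"]
end

# argv for running one remote command over SSH, e.g. system(*ssh_command(...)).
def ssh_command(host, key_path, remote_cmd)
  ["ssh", "-i", key_path, "ec2-user@#{host}", remote_cmd]
end
```

Returning argv arrays rather than one shell string means the local side never re-parses the hostname or key path. Only the remote shell interprets `remote_cmd`.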

Note:

  1. You can use Fog to set up an SSH connection to your EC2 hosts as well. It really is a one-stop module for almost all things AWS.
  2. On second thought, I doubt you can subscribe to an SNS topic the way you would to, say, an AMQP queue, but you get the idea: AWS will post a notification to SNS and you could use that to fetch further info on new nodes etc…
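However the notification reaches you, the first job is pulling instance IDs out of its text. I have not verified the Beanstalk notification format. This sketch assumes only that IDs appear as “i-” followed by hex digits: 8 digits for IDs from this era, and 17 for newer AWS IDs. The method name and sample text are made up.

```ruby
# ASSUMPTION: IDs appear verbatim as i-<8 or 17 lowercase hex digits>.
INSTANCE_ID = /\bi-(?:[0-9a-f]{17}|[0-9a-f]{8})\b/

def instance_ids_in(message)
  message.scan(INSTANCE_ID).uniq # each ID once, in order of appearance
end

sample = "Added instance [i-1a2b3c4d] to your environment." # made-up text
p instance_ids_in(sample) # prints ["i-1a2b3c4d"]
```

Each ID can then go to describe_instances (step 4 above) to find the hostname.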

I'll try to post more on how to automate deploying additional WAR files to newly instantiated Beanstalk instances when I find the time… Also more info on how Amazon's health checks poll the HostManager Sinatra app to check the status of individual nodes; so many things to do and so little time… :-)


  • Danilo

    Thank you for the valuable information.
    I’m trying to use Beanstalk AMI without Beanstalk.
    I guess I need to learn how to command _hostmanager manually. Do you think that’s feasible?

  • Srinivas

    Yes, HostManager is a Ruby Sinatra app that Beanstalk polls to get the status of the instance. On the Amazon Linux instances, it is located in /opt/elasticbeanstalk/ and uses Ruby 1.9.x (this is separate from the OS-wide install, i.e. /usr/bin/ruby is the OS-supplied Ruby and /opt/elasticbeanstalk/bin/ruby is the Ruby binary used by HostManager). You can look up http://www.sinatrarb.com to get more info on Sinatra, then look at the HostManager source and add routes etc. as needed.

