Engineering / May 30, 2019

Optimize Your Base Image Creation Process

Whether you are running your servers in the cloud, in your own data center, or in a closet in the office, hopefully you have a repeatable process to build and deploy them in a consistent and reliable way. In this article I’ll go over how we build our base images at Eyeview and provide some tips and tricks that we’ve found along the way to do things as efficiently as possible.

First, let’s look at what an “image” is. You can think of it simply as taking a snapshot of a running configured operating system with installed software, which can then be used to quickly launch more machines. The “base image”, which is also sometimes referred to as a “golden image”, is typically used as a base to deploy the rest of the application stack on top of. It would usually contain all the necessary versions of software libraries, security patches, container runtime, monitoring and metrics agents, as well as any other utilities that may be useful to have on every single system in your environment. Since we use the AWS cloud, our base image is an AMI (Amazon Machine Image) and I’ll use the two terms interchangeably for the rest of this article.

Tip #1: Plan ahead. Choose what goes into the base image and what will be installed later on demand

Where do you draw the line on how much to install in your base image? That depends on several factors, and it helps to answer questions like: How often, and how quickly, can you rebuild your base image? How often do you need to install security patches? How fast do you want your deployments to happen? Which software packages need to be exactly the same across all instances in the environment, and which ones may have to differ?

One approach would be to bake everything into the AMI, up to and including the application version, every time developers build their code. We have found, however, that for us it’s more efficient to build and manage a single base image with all the configurations and packages that need to be the same across the board. For everything else we use configuration management.

Tip #2: Build your base image in an automated way from a configuration file and/or provisioning script

Here at Eyeview, we are big fans of all things HashiCorp (see, for example, our previous post on Consul), so it isn’t a surprise that we have been using another one of HashiCorp’s tools, Packer, to automate our image builds for a few years now. HashiCorp’s website describes Packer as “an open source tool for creating identical machine images for multiple platforms from a single source configuration.” It is quite flexible, in fact: it can build images for Amazon EC2, Azure, VMware, VirtualBox, Docker, and many others.

The great thing about using Packer for our image builds is that we have all the base image’s requirements in one template file which calls a provisioner script to do the configuring and cleaning up as the AMI is created. Here it is in more detail:

The Packer Template File
The Packer template is a JSON file where you specify the builders, provisioners, variables, and other things you want Packer to use when building your images. Since we are running on Amazon EC2 and use EBS-backed root volumes for our instances, in our case we specify type: amazon-ebs. You can see basic examples of this on Packer’s site, but I want to mention the things that make our lives easier:

1. A source AMI filter in the builder automatically selects the latest official Ubuntu 18.04 AMI from Canonical. This way there’s no need for you to go to Ubuntu’s Amazon EC2 AMI Locator and figure out which AMI to start with.

2. You can also enable Enhanced Networking (ENA) support right in the template, so that instances launched from your AMI can make use of network speeds of up to 100 Gbps on instance types that support it. Just add the following line:

"ena_support": true

3. Up to this point we just have the latest Ubuntu image. Now let’s install all of our own secret sauce on it before we bake it into an AMI.

You can use any number of tools to provision the base image, but we’re still using a trusty old shell script. Among other things, ours installs all the latest security patches for the OS; sets up tools we use regularly, such as Ansible, Sensu and Diamond; configures the Java and Docker versions we need; and finally cleans up unnecessary software and artifacts left over from those installs.

4. Finally we build our image, and watch as Packer does its magic. Just issue the following command and you’ll be able to see the output of your provisioner in the terminal:

packer build hvm-ebs-18_04-latest.json
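
Steps 1 to 3 can be pulled together into one template. The following is a minimal sketch in Python that assembles such a template as a dictionary and prints it as JSON. It is an illustration rather than the exact template we run: the region, instance type, AMI name, and the `provision.sh` script path are placeholder assumptions. `source_ami_filter`, `ena_support`, and the `shell` provisioner are standard options of Packer’s `amazon-ebs` builder, and 099720109477 is Canonical’s AWS owner ID.

```python
import json

# Canonical's AWS account ID, the publisher of the official Ubuntu AMIs.
CANONICAL_OWNER_ID = "099720109477"


def build_template(region="us-east-1", instance_type="m5.large",
                   script="provision.sh"):
    """Return a Packer JSON template (as a dict) for an Ubuntu 18.04 base AMI.

    region, instance_type and script are placeholder assumptions.
    """
    builder = {
        "type": "amazon-ebs",
        "region": region,
        "instance_type": instance_type,
        "ssh_username": "ubuntu",
        # Step 1: start from the most recent official Ubuntu 18.04 image.
        "source_ami_filter": {
            "filters": {
                "name": "ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-*",
                "root-device-type": "ebs",
                "virtualization-type": "hvm",
            },
            "owners": [CANONICAL_OWNER_ID],
            "most_recent": True,
        },
        # Step 2: enable Enhanced Networking (ENA) on the resulting AMI.
        "ena_support": True,
        # {{timestamp}} is a Packer template function; it keeps AMI names unique.
        "ami_name": "base-ubuntu-18.04-{{timestamp}}",
    }
    # Step 3: run a shell script on the temporary instance before it is imaged.
    provisioner = {"type": "shell", "script": script}
    return {"builders": [builder], "provisioners": [provisioner]}


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

Saving the printed JSON as `hvm-ebs-18_04-latest.json` would give a file that the `packer build` command can consume.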

Now that our AMI is built, what’s next? Obviously we want to test it out, launch some applications and if all is well – promote it to production so all of our services can start using it. Which brings us to…

Tip #3: Build your deployment tools in a way that they automatically pick up the AMI used for launching new instances, based on pre-set tags

Choosing the AMI automatically during deployment makes it much easier for your team to roll out or roll back an AMI. You may also want to build a tool that promotes the new AMI by tagging it as needed, and demotes the old base image by untagging it. This again makes it easier to roll back in case something goes very wrong. We do this with an Ansible playbook, passing in the ID of the new AMI:

ansible-playbook promote_ami.yml -e new_ami=ami-0111b111101001c10

We use this Ansible playbook to tag and untag our production AMIs.
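
The logic behind both halves of this tip, picking the tagged AMI at deploy time and moving the tag during promotion, can be sketched in a few lines of Python. This is an illustration under assumptions, not the playbook itself: the `production=true` tag and the function names are hypothetical, the image dictionaries mirror the shape of EC2’s DescribeImages output (`ImageId`, `CreationDate`, `Tags`), and `apply_plan` expects a boto3 EC2 client to be passed in.

```python
# Hypothetical tag that marks the AMI deployments should launch.
PROD_TAG = {"Key": "production", "Value": "true"}


def has_tag(image, tag=PROD_TAG):
    """True if the image carries the given key/value tag."""
    return any(t["Key"] == tag["Key"] and t["Value"] == tag["Value"]
               for t in image.get("Tags", []))


def current_production_ami(images, tag=PROD_TAG):
    """Return the ImageId that deployments should launch, or None."""
    tagged = [img for img in images if has_tag(img, tag)]
    if not tagged:
        return None
    # CreationDate is an ISO 8601 UTC string, so it sorts chronologically as text.
    return max(tagged, key=lambda img: img["CreationDate"])["ImageId"]


def plan_promotion(images, new_ami, tag=PROD_TAG):
    """Work out which AMIs to tag and untag when promoting new_ami."""
    if new_ami not in {img["ImageId"] for img in images}:
        raise ValueError(f"unknown AMI: {new_ami}")
    to_untag = sorted(img["ImageId"] for img in images
                      if has_tag(img, tag) and img["ImageId"] != new_ami)
    return {"tag": [new_ami], "untag": to_untag}


def apply_plan(ec2, plan, tag=PROD_TAG):
    """Apply a plan using a boto3 EC2 client (ec2 is supplied by the caller)."""
    # Tag the new AMI first so a production image exists at every moment.
    ec2.create_tags(Resources=plan["tag"], Tags=[tag])
    if plan["untag"]:
        ec2.delete_tags(Resources=plan["untag"], Tags=[{"Key": tag["Key"]}])
```

Rolling back is then the same operation in reverse: promote the previous AMI ID and the tag moves back.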

Tip #4: If you can help it – use one image. For everything!

We had a few different images for a while, supporting different virtualization and root device types (paravirtual, hvm, ebs, instance-store) so we could get the most cost-effective spot instances for our fleets, but we found it was much easier to manage a single one. Additionally, standardizing on the newest instance types (rather than the cheaper but older m2s, for example) has slightly decreased our costs overall, while giving us better performance.

Finally, you may also find it helpful to build a little tool to keep an eye on which AMIs are in use at any point in time.
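
The core of such a tool can be quite small. Here is a minimal sketch in Python, assuming the input is the `Reservations` list returned by EC2’s DescribeInstances call (for example via boto3’s `describe_instances`); the function names and report format are made up for illustration.

```python
from collections import Counter


def count_amis_in_use(reservations):
    """Count pending and running instances per AMI.

    reservations is the "Reservations" list from a DescribeInstances response.
    """
    counts = Counter()
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            state = instance.get("State", {}).get("Name")
            # Ignore stopped and terminated instances.
            if state in {"pending", "running"}:
                counts[instance["ImageId"]] += 1
    return counts


def format_report(counts):
    """Render one 'ami-id  count' line per AMI, most used first."""
    return "\n".join(f"{ami}  {n}" for ami, n in counts.most_common())
```

A report with more than one AMI in it outside of a rollout is usually a hint that some service has not picked up the current production image yet.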

Using the methods described above, earlier this year we rolled out a significant OS upgrade from Ubuntu 14.04 LTS to Ubuntu 18.04 LTS (moving from upstart to systemd) to more than 35 services running on more than 1,500 instances in just a few days. We did the same with Java a few weeks later, and again with the latest version of Ansible shortly thereafter. All of these upgrades were critical and could have impacted our applications, but the migrations were seamless for both clients and engineers.

The process is efficient and we have shown it is safe to use a single global flag to control the production AMI for all apps and services. Our Executive Director of Engineering actually is “still a bit surprised that when we do that, things don’t break”. Of course we wouldn’t be able to do this as successfully if we did not already have a great CI/CD pipeline, and testing in production (as safely as possible… a great article on this topic here).

Looking to the future, we have plans to achieve completely hands-off automatic upgrades and automatic rollbacks. As we continue to roll out and grow our container fleets and move things from the AMI into Docker images, we’ll likely see this process morph and evolve but we don’t expect it to go away any time soon.

Itso Slavchev Lead DevOps Engineer
