Scripting S3 upload headers using Ant and s3cmd

18 Jun 2012

[Image: Sunbeam pedal car]

Encouraged by discussion of the SEO benefits of site performance, I recently overhauled the blisshq.com upload process to ensure browser caching headers are correctly set. This involved changing my existing upload scripts to specify the necessary headers at upload time. The result is that new and updated HTML files now always have their caching information set automatically. Very convenient!

Since the blisshq.com domain went live it has always been a set of static HTML files, as opposed to a database-driven site. I use a static site generator called Jekyll to combine the pure content that makes up blog posts and other pages into layouts, meaning there's no duplication of code and any change to a layout doesn't have to be replicated hundreds of times.

The site itself is hosted in Amazon S3. The main advantage of S3 is its simplicity; it really seems like a remote filesystem you can upload and download files to and from. When HTTP access was made available to viewers of S3 files, it essentially meant you could host a static HTML website from Amazon S3. This made S3 a simple, cost-effective and scalable solution for Web hosting.

It does mean, however, that you need to learn the S3 'way' for certain things. One such thing is HTTP headers. HTTP headers are pieces of information sent either with a request when you, as a web page visitor, visit a page (e.g. your browser version or your operating system, both of which can help in deciding what content to send to you) or with the response when you receive the web page content (e.g. the length of the content and, pertinently, how long to cache the web page for).
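
To make that concrete, here's roughly what flows in each direction when a browser fetches a page (all values here are purely illustrative):

GET /index.html HTTP/1.1          (request headers, sent by the browser)
Host: www.blisshq.com
User-Agent: Mozilla/5.0 ...

HTTP/1.1 200 OK                   (response headers, sent by the server)
Content-Type: text/html
Content-Length: 5120
Expires: Sun, 15 Jul 2012 15:18:50 GMT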

One of my ongoing projects in maintaining blisshq.com is ensuring that the site remains snappy and responsive. It's a good way of ensuring bounce rates stay low and conversion rates high. There are a number of ways of testing site speed, and a number of advisory tools which suggest ways of making sites faster. Recently, I ran Google PageSpeed against blisshq.com and one of the main suggestions was to ensure that caching headers were set appropriately.

The advice to leverage browser caching is sensible; why ask the user to download data they already have? Implementing it means configuring your Web server (in this case, my Amazon S3 account) to send the correct headers for each file. In the case of S3, you use Metadata. This can be configured in the AWS Management Console: by choosing the Metadata tab, arbitrary headers can be added. In this case, we want to set the Expires header.
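
In the Metadata tab this is just a key/value pair; the value must be a standard HTTP date (the date below is purely illustrative):

Key: Expires
Value: Sun, 15 Jul 2012 15:18:50 GMT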

Alert! Alert! Manual work detected! As a Micro-ISV owner, the last thing you want to be doing is navigating the AWS console making these changes for every friggin' file in your website.

The good news is that it's possible to set these headers at upload time. In my own setup, once the website has been generated by Jekyll it is uploaded using s3cmd. The generation and upload are automated by an Ant script which first runs the generation, then the upload of the files. So, to set these headers automatically, I had to configure my call to s3cmd to specify the Expires header, and have Ant populate the value correctly.
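
To give a rough idea of the orchestration (the target names and the bare jekyll invocation here are illustrative, not lifted verbatim from my build file), the script looks something like this:

<!-- Generate the site with Jekyll, then push the result to S3 -->
<target name="deploy" depends="generate,upload-site"/>

<target name="generate">
	<!-- Jekyll renders the site into the ${site} output directory -->
	<exec executable="jekyll" failonerror="true"/>
</target>

<target name="upload-site" depends="generate">
	<!-- Uses the upload macro described below -->
	<upload>
		<filesetToUpload>
			<fileset dir="${site}" includes="**/*"/>
		</filesetToUpload>
	</upload>
</target>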

I separated the call to s3cmd into its own Ant macro:

<!--
	Uploads a specified fileset to S3
-->
<macrodef name="upload">
	<!-- Extra command-line arguments to pass through to s3cmd -->
	<element name="args" optional="true"/>
	<!-- The files to upload -->
	<element name="filesetToUpload"/>
	<sequential>
		<!-- Runs s3cmd once per file in the fileset -->
		<apply executable="s3cmd" relative="true" dir="${site}">
			<arg value="--acl-public" />
			<arg value="put" />
			<srcfile />
			<targetfile />
			<args/>
			<filesetToUpload/>
			<!-- Maps each local file to its URL in the S3 bucket -->
			<globmapper from="*" to="s3://www.blisshq.com/*" />
		</apply>
	</sequential>
</macrodef>

This macro runs the s3cmd command with a put request, meaning: put the specified files into Amazon S3. Two parameters are specified: an optional set of arguments which are passed onto the s3cmd process, and the files to upload. In this case it's the former that's of interest. Readers interested in the rest of this script should consult the Ant Manual.

s3cmd allows the specification of metadata at upload time using the --add-header command line switch. For example:

s3cmd --acl-public put index.html s3://www.blisshq.com/index.html --add-header="Expires:Sun, 15 Jul 2012 15:18:50 GMT"

Using this knowledge we can change our Ant script to specify the Expires header. First, we generate a new date string that can be used. This must follow the standard formatting rules for HTTP dates, which are always expressed in GMT:

<tstamp>
	<format property="httpdate-onemonth" pattern="EEE, dd MMM yyyy HH:mm:ss zzz" offset="1" unit="month" timezone="GMT"/>
</tstamp>

This creates a new timestamp, in the correct HTTP date format and in GMT, set one month in the future. Once we have that, we can create an Ant property with this timestamp as its value, which we can refer to when uploading the files:

<property name="header.expires" value="Expires:${httpdate-onemonth}"/>
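
As a quick sanity check, you can echo the property before wiring it into the upload (purely a debugging aid, not part of the final build):

<echo message="Uploading with header: ${header.expires}"/>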

When calling the upload macro we then refer to this property:

<upload>
	<args>
		<arg value="--add-header=${header.expires}" />
	</args>
	<filesetToUpload>
		<fileset dir="/tmp" includes="**/*.html"/>
	</filesetToUpload>
</upload>

This uploads all files with the .html suffix in /tmp and its subdirectories, with the Expires header applied. Subsequent downloads of these files in a browser now carry the correct HTTP Expires header!
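
You can verify the header took effect with a quick HEAD request against the live site; the output below is illustrative and abbreviated to the relevant header:

$ curl -I http://www.blisshq.com/index.html
HTTP/1.1 200 OK
...
Expires: Sun, 15 Jul 2012 15:18:50 GMT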

I'm continuing to measure blisshq.com's speed and implement performance improvements on a gradual basis. Maybe a report of how these changes are affecting site performance would be a good future blog post!

Thanks to Hugo90 for the image above.