A Django Management Command for Amazon S3

The Django Commands Extension project, started by Michael Trier, provides some excellent general-purpose management commands for your Django projects. Ever since I first found the project, I've kept in the back of my mind that if I ever wrote a useful custom management command, I should contribute it. Recently I had the opportunity to write such a command and submit it to the project, and I hope it's useful to more people than just me.

The command I submitted is ./manage.py sync_media_s3. It copies all the files and the directory structure found in your Django project's MEDIA_ROOT, as defined in the settings.py file, up to the Amazon Simple Storage Service, or simply Amazon S3. Many people are now using Amazon S3 as a Content Delivery Network (CDN) for their websites, and Amazon has announced a beta service to support this type of usage in the future. The command also adopts a couple of Yahoo's best practices for speeding up your website: gzipping components like JavaScript and CSS, and adding an expires header. Note, however, that if you add a far future expires header, you also need a way to perform "cache busting" so that browsers pull down newer copies of those files. One project that can help with this is Django Compress, which not only cuts down on HTTP requests by combining multiple CSS and JavaScript files, but also allows a version string to be added to your media files.

You can take a look at the full source of the management command and I'll explain some of the options below.

# See the help output
./manage.py sync_media_s3 --help

# Upload all files in MEDIA_ROOT to S3 in the bucket named 'mybucket'
./manage.py sync_media_s3 mybucket

By default, the sync script compares last modified times and only uploads files that have been modified more recently than the copies already on Amazon S3. If you'd like to override this, you can use the --force option:

# Force all files to be uploaded regardless of last modified timestamps
./manage.py sync_media_s3 mybucket --force
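
To make the comparison concrete, here is a minimal sketch of the decision described above. It is not the command's actual source: the function name needs_upload and the idea of passing the remote modification time as a Unix timestamp are my own assumptions for illustration.

```python
import os


def needs_upload(local_path, remote_mtime, force=False):
    """Decide whether a local file should be sent to S3.

    remote_mtime is the remote copy's last modified time as a Unix
    timestamp, or None when the file is not on S3 yet. Both the name and
    the timestamp representation are assumptions made for this sketch.
    """
    if force or remote_mtime is None:
        # --force uploads everything; a missing remote copy always uploads.
        return True
    # Upload only when the local file is newer than the remote copy.
    return os.path.getmtime(local_path) > remote_mtime
```

Calling needs_upload(path, remote_mtime, force=True) always returns True, which mirrors what --force does.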

You can opt in to two extra features: gzipping JavaScript and stylesheets, and adding a far future expires header. Both of these options are disabled by default.

# gzip CSS and JS files and add the Content-Encoding header
./manage.py sync_media_s3 mybucket --gzip

# set a far future expires header. expiry is today plus two years
./manage.py sync_media_s3 mybucket --expires

# any of the options can be combined
./manage.py sync_media_s3 mybucket --force --expires --gzip
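
For the curious, the two options boil down to something like the following sketch. It uses only the Python standard library and is an illustration, not the command's real code: prepare_upload, GZIP_EXTENSIONS, and the choice to detect CSS and JavaScript by file extension are all assumptions of mine.

```python
import gzip
import os
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Assumption for this sketch: CSS and JS files are recognized by extension.
GZIP_EXTENSIONS = (".css", ".js")


def prepare_upload(filename, data, use_gzip=False, use_expires=False, now=None):
    """Return (body, headers) for one file, applying the optional features."""
    headers = {}
    extension = os.path.splitext(filename)[1].lower()
    if use_gzip and extension in GZIP_EXTENSIONS:
        # Compress the body and tell the browser how to decode it.
        data = gzip.compress(data)
        headers["Content-Encoding"] = "gzip"
    if use_expires:
        # Far future expiry: today plus two years, as an HTTP date in GMT.
        now = now or datetime.now(timezone.utc)
        expires = now + timedelta(days=365 * 2)
        headers["Expires"] = format_datetime(expires, usegmt=True)
    return data, headers
```

The returned headers would then be sent along with the file body when it is stored on S3. Files that are not CSS or JavaScript pass through uncompressed even when gzipping is enabled.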

There are some things that I'd still like this script to do:

  1. The name sync_media_s3 is a bit of a misnomer as the script doesn't synchronize both ways, only up to S3. It also doesn't yet keep track of files that are currently on S3 but not found in MEDIA_ROOT and delete those files. I'd like the deletion to optionally happen so you can essentially mirror the folder to S3 easily, even as files change and get renamed.
  2. The script has a list of files that it will skip. That list currently contains a single filename, .DS_Store. I hope to make this configurable in the future.
  3. I'd like to make it configurable how far in the future the expires header is set.
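
As a rough idea of how the mirroring in the first item could work, the sketch below walks MEDIA_ROOT, skips the ignored filenames, and computes which keys exist on S3 but no longer exist locally. None of this is in the command today; local_keys, keys_to_delete, and SKIP_FILENAMES are hypothetical names, and the remote key listing is assumed to be supplied by the caller.

```python
import os

# The one filename the post says is skipped; hypothetical constant name.
SKIP_FILENAMES = frozenset([".DS_Store"])


def local_keys(media_root, skip=SKIP_FILENAMES):
    """Collect S3-style keys (forward-slash relative paths) under media_root."""
    keys = set()
    for dirpath, _dirnames, filenames in os.walk(media_root):
        for name in filenames:
            if name in skip:
                continue
            relative = os.path.relpath(os.path.join(dirpath, name), media_root)
            # S3 keys use "/" regardless of the local path separator.
            keys.add(relative.replace(os.sep, "/"))
    return keys


def keys_to_delete(remote_keys, media_root, skip=SKIP_FILENAMES):
    """Return keys present on S3 that have no matching local file."""
    return set(remote_keys) - local_keys(media_root, skip)
```

Deleting exactly the keys returned by keys_to_delete after an upload pass would leave the bucket as a mirror of MEDIA_ROOT, including after renames.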

If you find this script useful and think of features you'd like to see, or if you find bugs with it, either leave a comment below or post a feature request or bug on the project's issue tracker.

About this entry

Date Posted:
October 29th 2008 at 11:10:51 AM

Tagged:
aws, django, management command, s3
