pascal jungblut posts

More on backups

I already wrote something about how I back up my important data. However, some things have changed, and since that article gets quite a lot of traffic, I want to give an update on the details.

A major change is that I now treat BitTorrent Sync as a tool for syncing only, not for backups. In fact I don’t use btsync any more, but I’ll get to that. While btsync works pretty well to keep data in multiple places up to date, it is simply not designed to do good backups. Of course not; I knew that from the beginning. I thought snapshotting the data in various places would automatically get me nice backups on top of all the syncing. While that is technically true, restoring data is tedious and there is no easy way to search for older versions of a file. It works, but it’s no fun at all.

So what is better? Instead of btsync I switched to a fairly new program called Syncthing to sync regularly accessed or shared files. It is open source and written in Go, which makes it relatively easy to deploy, and you can host the “announce” server yourself. You’re also required to whitelist any nodes that may access your data, which gives me a warm and fuzzy feeling. Although it is not yet 1.0, it already works really, really well. A downside is that there are currently no GUI clients, so you have to check http://localhost:8080 to see whether your data is in sync.

Arq: the cloud!

Well yeah, I kind of gave up. As my backups would ultimately end up in the cloud (via Amazon Glacier) anyway, I realized that it might be better to just back up to S3 and Glacier with all the metadata in the first place. Arq is an excellent and unobtrusive tool to do regular incremental backups from a Mac to Amazon Glacier. The encryption algorithm is open source, so if the software breaks one day and doesn’t get updates you can still access your data. Glacier is so cheap ($0.01 per GB per month) that you don’t really need to worry about the storage cost. The downside is that Glacier needs roughly four hours to deliver a requested file. That’s why I still use Time Machine to do hourly backups: it’s fast. Arq obviously eats a lot of bandwidth, so it’s less usable with a bad internet connection. Another possible problem is the relatively high CPU usage while a backup is running. It’s one of the few programs that make the fan of my laptop spin up.

Tarsnap

For servers there exists an even simpler tool. Tarsnap is developed by Colin Percival, who was the Security Officer of the FreeBSD project. Don’t be intimidated by the old-school look of the website. It is made by a nerd for nerds. The website contains more technical information than all of the competitors’ websites combined. I like it.

As the name suggests, Tarsnap is just like tar, but instead of writing to files it writes backups to Amazon S3. Not only that, it does some really cool cryptographic tricks to allow deduplication and heavy compression while still storing only encrypted data. For example: I back up the whole /usr directory of my personal server with lots of jails. Tarsnap compresses and deduplicates the data so that only 14 GB of the original 35 GB must be transferred and stored. The storage is notably more expensive (250 picodollars per byte-month, or $0.25 per GB per month) as it is stored in S3 and not in Glacier. But that also means that there’s no delay when retrieving backups.
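To put these prices side by side, here is a small back-of-the-envelope sketch. It assumes 1 GB = 10^9 bytes and covers storage only, not bandwidth; the function name monthly_cost is made up for illustration.

```shell
#!/bin/sh
# Monthly storage cost in dollars.
#   $1 = stored size in GB (assumed to be 10^9 bytes)
#   $2 = price in picodollars per byte-month
# GB * 10^9 bytes * price * 10^-12 dollars simplifies to GB * price / 1000.
monthly_cost() {
    awk -v gb="$1" -v pico="$2" 'BEGIN { printf "%.2f\n", gb * pico / 1000 }'
}

monthly_cost 14 250   # Tarsnap, 14 GB after deduplication: prints 3.50
monthly_cost 35 250   # the same price on the raw 35 GB:     prints 8.75
monthly_cost 35 10    # 35 GB at Glacier's $0.01 per GB:     prints 0.35
```

So the /usr backup comes to roughly $3.50 per month on Tarsnap, about ten times what the raw data would cost at the Glacier price quoted above.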

I use the cron script from Tim Bishop to make daily, weekly and monthly backups. It works so well that I’m tempted to replace Arq with Tarsnap on my workstations. I really like the simplicity and the unixesque feel. At the same time the GUI of Arq is a big plus: you want to see what’s going on on your workstation. And storing that much data on S3 instead of Glacier might get expensive pretty quickly.
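The following is not Tim Bishop’s script, just a minimal sketch of the general idea behind such a cron job. The rotation rule (monthly on the 1st, weekly on Sundays, daily otherwise), the function names, the archive naming scheme and the DRY_RUN switch are all assumptions for illustration; only tarsnap -c -f is the real interface.

```shell
#!/bin/sh
# Hypothetical rotation sketch, meant to be called once a day from cron.

# Pick a label for today's archive.
#   $1 = day of month (01-31), $2 = day of week (1-7, Monday = 1)
backup_label() {
    if [ "$1" = "01" ]; then
        echo monthly
    elif [ "$2" = "7" ]; then
        echo weekly
    else
        echo daily
    fi
}

# Build an archive name such as "daily-2014-05-20-usr".
#   $1 = label, $2 = ISO date, $3 = absolute path being backed up
archive_name() {
    # Drop the leading slash and turn the remaining slashes into underscores.
    echo "$1-$2-$(printf '%s' "${3#/}" | tr '/' '_')"
}

# Create one archive of the given directory.
# With DRY_RUN=1 the tarsnap command is printed instead of executed.
run_backup() {
    dir=$1
    label=$(backup_label "$(date +%d)" "$(date +%u)")
    name=$(archive_name "$label" "$(date +%Y-%m-%d)" "$dir")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo tarsnap -c -f "$name" "$dir"
    else
        tarsnap -c -f "$name" "$dir"
    fi
}

# Example: DRY_RUN=1 run_backup /usr
```

A real script also has to delete old archives at some point (tarsnap -d -f with the archive name), which this sketch leaves out.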

However, how cool is it to do a fully incremental, deduplicated, safe, secure, compressed offsite backup with:

# tarsnap -cf mybackup /usr

It just works. You can’t beat that.
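Getting the data back is about as short. The sketch below wraps the three tarsnap modes involved (--list-archives, -t and -x). The function names and the TARSNAP variable are assumptions for illustration; by default the script only prints the commands, and it assumes a key file is already configured.

```shell
#!/bin/sh
# By default only print the commands; set TARSNAP=tarsnap to run them for real.
TARSNAP=${TARSNAP:-echo tarsnap}

# Show the names of all archives stored for this machine's key.
list_archives() {
    $TARSNAP --list-archives
}

# Show the files contained in one archive.
#   $1 = archive name
list_files() {
    $TARSNAP -t -f "$1"
}

# Extract an archive below an existing target directory.
#   $1 = archive name, $2 = target directory
restore_archive() {
    $TARSNAP -x -f "$1" -C "$2"
}

list_archives
list_files mybackup
restore_archive mybackup /tmp/restore
```

Extracting into a scratch directory such as /tmp/restore (an arbitrary example path) keeps a restore from overwriting the live /usr.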