pascal jungblut posts

How I back up

The day you lose important data to a head crash of your hard disk, you start getting a little paranoid about backups. It’s not enough to have backups at all; they also have to be the right ones. After some experimenting I’ve found a setup that works for my needs. So here’s how I back up my workstations and servers.

Macs

I use Macs and therefore OS X. The obvious choice here is OS X’s built-in mechanism: Time Machine. Although it has its quirks and the data format is proprietary, I like it for two reasons:

  • it’s built-in and simple to use. Need an older version of a file? Just open Time Machine, scroll through the versions, restore it and you’re good.
  • since OS X 10.8 it can do seamless backups to multiple destinations

One destination is a FireWire drive attached to the workstation; the second is a NAS running FreeBSD with netatalk. The notebook only backs up to the NAS (OS X makes local “backups” when I’m on the go and syncs them when I’m back home). That covers many of the “duh, I deleted a random file” cases. Handy and fast.

However, if both Macs went up in flames and I had to use a Linux system, I’d have trouble accessing the data, since the format is proprietary. So I need a way to access really important data quickly. Bonus points if the data is accessible from anywhere in the world. Of course, simply rsyncing the files once in a while works well, even incrementally, and I did that for some time. But it gets messy when more than two computers are involved. What if there were a single piece of software that solved the syncing problem and the backup problem at once?

I wrote about BitTorrent Sync with EncFS already. In short, it uses the BitTorrent protocol to securely sync files between N computers. Your personal Dropbox. You can even generate “read-only tokens” that allow read-only access for a client. Important documents reside inside an EncFS container; the resulting (encrypted) files are shared via BitTorrent Sync and mounted automatically with a little help from the OS X Keychain on the workstation/laptop. The neat part is that I can spin up a BitTorrent Sync client anywhere in the world and, given the correct 20-byte token, it will magically sync the data. Additional clients run on the NAS and on a root server, so three copies of a file exist seconds after it is created. Once my parents get faster internet connectivity I’ll hide a Raspberry Pi at their home, too, making it four copies. The root server also takes hourly/daily/weekly/monthly snapshots of these files. See below.

As a software developer I create source code. Everything with more than one file goes into a git repository that I push to the root server.
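A minimal sketch of that workflow, with a local path standing in for the root server (all paths and names here are illustrative, not my actual setup):

```shell
#!/bin/sh
set -e
# Sketch only: /tmp/backup-demo stands in for the root server, which would
# normally be reached over SSH (e.g. ssh://server/path/to/project.git).
mkdir -p /tmp/backup-demo
git init --bare /tmp/backup-demo/project.git

# On the workstation: clone, commit, push.
git clone /tmp/backup-demo/project.git /tmp/backup-demo/work
cd /tmp/backup-demo/work
git config user.email me@example.com   # placeholder identity
git config user.name "Me"
echo 'hello' > README
git add README
git commit -m "initial commit"
git push origin HEAD
```

With the real server the only difference is the remote URL; since git stores the full history, every push is effectively an incremental backup.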

Where possible I exclude everything from the backup that can be downloaded from the internet, like software, music, movies, etc.

Server

Since current servers are really powerful and oversized for most of my needs, everything runs as jails on a FreeBSD server. The server uses ZFS, which has some really nice features. One of them is easy snapshotting. The server takes snapshots of the filesystem, keeping:

  • hourly snapshots for the last 6 hours
  • daily snapshots for the last 6 days
  • weekly snapshots for the last 4 weeks
  • monthly snapshots for the last 4 months

It does that not only for the root filesystem but individually for each jail, making rollbacks really easy. One of the jails runs btsync and stores a copy of the data from the Macs mentioned earlier. The ZFS pool is mirrored across two hard disks, as is usual for servers.
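The rotation itself is just bookkeeping over snapshot names. A minimal sketch of the pruning step, assuming chronologically sortable names like tank/jails/web@hourly-2014-05-01T10 (the naming scheme is my assumption; this is not the actual script running on the server):

```shell
#!/bin/sh
# Given a retention count and a chronologically sorted list of snapshot
# names on stdin, print the names that fall outside the retention window.
prune_list() {
    keep=$1
    awk -v keep="$keep" '{lines[NR] = $0}
        END {for (i = 1; i <= NR - keep; i++) print lines[i]}'
}

# On the real server, each printed name would be handed to zfs destroy,
# once per retention class, e.g. for the hourly snapshots:
#   zfs list -H -t snapshot -o name | grep '@hourly-' | prune_list 6 \
#       | xargs -n1 zfs destroy
```

The same function covers all four classes; only the grep pattern and the retention count (6, 6, 4, 4) change.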

ZFS provides a nice command, zfs send, which simply writes a snapshot to stdout. I use zfs send | gzip | openssl enc -e ... > /my/snapshot.gz.enc to create encrypted files of the monthly snapshots and the incremental daily diffs. Those files are shared via btsync (yes, I love that tool). Currently only the NAS at home syncs these snapshots. They are additionally uploaded to Amazon Glacier monthly as the last resort. I hope I’ll never have to use it.
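The command above elides the openssl arguments, so here is a round-trip sketch with one explicit choice filled in: aes-256-cbc and the passphrase handling are my assumptions, not necessarily what runs on the server, and plain stdin stands in for the zfs send stream.

```shell
#!/bin/sh
set -e
PASS='correct horse battery staple'   # illustrative passphrase only
export PASS

# Backup direction: (zfs send |) gzip | encrypt > file
printf 'snapshot stream' \
    | gzip \
    | openssl enc -e -aes-256-cbc -pbkdf2 -pass env:PASS \
    > /tmp/snapshot.gz.enc

# Restore direction: decrypt | gunzip (| zfs receive)
openssl enc -d -aes-256-cbc -pbkdf2 -pass env:PASS < /tmp/snapshot.gz.enc \
    | gunzip   # prints "snapshot stream"
```

An incremental daily diff is the same pipeline with zfs send -i <previous-snapshot> <snapshot> at the front, and restoring feeds the decrypted, decompressed stream into zfs receive.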