How I back up
The day you lose important data to a head crash of your hard disk, you start getting a little paranoid about backups. It's not enough to have backups at all; they also have to be the right ones. After some experimenting I've found a setup that works for my needs. So here's how I back up my workstations and servers.
I use Macs and therefore OS X. The obvious choice here is OS X's built-in mechanism: Time Machine. Although it has its quirks and the data format is proprietary, I do like it for two reasons:
- it’s built-in and simple to use. Need an older version of a file? Just open Time Machine, scroll through the versions, restore it and you’re good.
- since OS X 10.8 it can do seamless backups to multiple destinations
One destination is a FireWire drive attached to the workstation; the second is a NAS running FreeBSD with netatalk. The notebook only backs up to the NAS (OS X makes local "backups" when I'm on the go and syncs them once I'm back home). That covers most of the "duh, I deleted a random file" cases. Handy and fast.
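Setting up the second destination is a one-liner with `tmutil`, which gained the `-a` (append) flag in 10.8. The volume names below are examples, not my actual disks:

```shell
# Add each backup destination; Time Machine then rotates between them.
# Volume names are placeholders for the FireWire drive and the NAS share.
sudo tmutil setdestination -a /Volumes/FirewireBackup
sudo tmutil setdestination -a /Volumes/TimeMachineNAS

# Verify what's configured:
tmutil destinationinfo
```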
However, if both Macs went up in flames and I had to use a Linux system, I'd have trouble accessing the data, since the format is proprietary. So I need a way to access the really important data quickly. Bonus points if the data is accessible from anywhere in the world. Of course, simply rsyncing it once in a while works well, even incrementally, and I did that for some time. But it gets messy once more than two computers are involved. What if one piece of software could solve the syncing problem and the backup problem at once?
I wrote about BitTorrent Sync with EncFS already. In short, it uses the BitTorrent protocol to securely sync files between N computers: your personal Dropbox. You can even generate "read-only tokens" that give a client read-only access. Important documents reside inside an EncFS container; the resulting (encrypted) files are shared via BitTorrent Sync and mounted automatically with a little help from the OS X Keychain on the workstation/laptop. The neat part is that I can spin up a BitTorrent Sync client anywhere in the world and, given the correct 20-byte token, it will magically sync the data. Additional clients run on the NAS and on a root server, so three copies of a file exist seconds after it is created. Once my parents get faster internet connectivity I'll hide a Raspberry Pi at their home too, making it four copies. The root server also takes hourly/daily/weekly/monthly snapshots of these files. See below.
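The Keychain part boils down to one pipeline, something like the following sketch (the Keychain item name and both paths are examples, not my real setup): `security` prints the stored passphrase and `encfs -S` reads it from stdin instead of prompting.

```shell
# Fetch the EncFS passphrase from the OS X Keychain (item name is an
# example) and mount the synced, encrypted container without a prompt.
security find-generic-password -s encfs-docs -w \
    | encfs -S "$HOME/BTSync/encrypted" "$HOME/Documents/secure"
```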
As a software developer I create source code. Everything with more than one file goes into a git repository that I push to the root server.
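The server side is nothing more than a bare repository per project; hostname and paths below are placeholders:

```shell
# On the root server, once per project:
#   ssh rootserver 'git init --bare /srv/git/myproject.git'

# Locally: add the server as a remote and push all branches and tags.
git remote add backup ssh://rootserver/srv/git/myproject.git
git push backup --all
git push backup --tags
```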
Where possible I exclude everything from the backup that can simply be re-downloaded from the internet: software, music, movies, and so on.
Since current servers are really powerful and oversized for most of my tasks, all of my tasks run as jails on a single FreeBSD server. The server uses ZFS, which has some really nice features; one of them is cheap snapshotting. The server takes snapshots of the filesystem, keeping:
- hourly snapshots for the last 6 hours
- daily snapshots for the last 6 days
- weekly snapshots for the last 4 weeks
- monthly snapshots for the last 4 months
And it doesn't do that only for the root filesystem but for each jail individually, making a rollback really easy. One of the jails runs btsync and stores a copy of the data from the Macs I mentioned earlier. The ZFS pool is mirrored across two hard disks, as is usual for servers.
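The rotation itself can be sketched in a few lines of shell run from cron; dataset name and retention count below are examples (dedicated tools like zfstools do the same job more robustly):

```shell
#!/bin/sh
# Sketch: take an hourly ZFS snapshot and prune old ones.
# Dataset name and retention count are examples, not my exact values.

# Keep the newest $1 snapshots from the newest-first list on stdin;
# print the rest, i.e. the ones to destroy.
prune_list() {
    awk -v keep="$1" 'NR > keep'
}

rotate_hourly() {
    dataset=$1
    keep=$2
    zfs snapshot "$dataset@hourly-$(date +%Y%m%d%H)"
    # -H: no header, -S creation: newest first.
    # (BSD xargs runs nothing on empty input.)
    zfs list -H -t snapshot -o name -S creation \
        | grep "^$dataset@hourly-" \
        | prune_list "$keep" \
        | xargs -n1 zfs destroy
}

# From cron, once per hour:
# rotate_hourly tank/jails/btsync 6
```

The daily/weekly/monthly tiers are the same pattern with a different name prefix and retention count.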
ZFS provides a nice command, `zfs send`, that simply writes a snapshot to stdout. I use `zfs send | gzip | openssl enc -e ... > /my/snapshot.gz.enc` to create encrypted files of the monthly snapshots and the incremental daily diffs. Those files are shared via btsync (yes, I love that tool). Currently only the NAS at home syncs these snapshots. They are additionally sent to Amazon Glacier monthly as the last resort. I hope I'll never have to use it.
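Spelled out, the full and incremental streams look something like this; snapshot names, cipher, and key file are examples, not my exact setup:

```shell
# Monthly full stream, compressed and encrypted.
zfs send tank/jails@2014-05 \
    | gzip \
    | openssl enc -aes-256-cbc -pass file:/root/backup.key \
    > /backups/jails-2014-05.gz.enc

# Daily incremental (-i) against the monthly snapshot: only the diff.
zfs send -i tank/jails@2014-05 tank/jails@2014-05-14 \
    | gzip \
    | openssl enc -aes-256-cbc -pass file:/root/backup.key \
    > /backups/jails-2014-05-14.gz.enc

# Restore: decrypt, decompress, receive into a fresh dataset.
openssl enc -d -aes-256-cbc -pass file:/root/backup.key \
    < /backups/jails-2014-05.gz.enc \
    | gunzip \
    | zfs receive tank/restore
```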