If you ever wanted scheduled remote backups but found Amanda or Bacula too complicated for your needs and plain cron jobs with rsync not enough, you should have a look at rsnapshot.
I started out using rsync over ssh to back up my servers and workstations to a central backup server:
rsync -vaz --del user@remote:/etc /backups/remote/etc
The problem with this approach is that rsync makes it easy to keep a synchronized copy of your live system, but it gets complicated as soon as you want access to several past versions. You have to write shell scripts and cron jobs to keep several versions of the live systems, for example backing up every system once a week and keeping up to 4 weekly backups. Then maybe you also want to keep 12 monthly backups, and so on. Your scripts get bigger and bigger and your disk space starts to shrink.
rsnapshot tries to solve these problems. It is a remote filesystem snapshot utility based on rsync. It can do all the things I described while keeping the needed disk space to a minimum by using hard links. If only 3 files changed in /etc between two weekly backups, only these files are copied; the rest are still accessible in the new snapshot as hard links to the copy in the backup store from when they last changed.
That means that I can have daily/weekly/monthly/yearly snapshots of whole filesystems but the used disk space will be minimal.
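To see why hard links keep the disk usage low, here is a minimal sketch that builds two tiny "snapshots" by hand. It is not what rsnapshot runs internally; the function names, directory layout, and file names are made up for illustration.

```shell
#!/bin/sh
# hardlink_demo ROOT: create two hand-made snapshots below ROOT.
# (Hypothetical helper, only meant to illustrate the hard-link idea.)
hardlink_demo() {
    root="$1"
    mkdir -p "$root/daily.1/etc" "$root/daily.0/etc"
    echo "127.0.0.1 localhost" > "$root/daily.1/etc/hosts"
    echo "old setting" > "$root/daily.1/etc/app.conf"
    # Unchanged file: the newer snapshot gets a second name for the same inode.
    ln "$root/daily.1/etc/hosts" "$root/daily.0/etc/hosts"
    # Changed file: the newer snapshot gets a fresh copy with its own inode.
    echo "new setting" > "$root/daily.0/etc/app.conf"
}

# inode_of FILE: print the inode number of FILE.
inode_of() {
    ls -di "$1" | awk '{ print $1 }'
}

# same_inode FILE1 FILE2: succeed if both names point at the same inode.
same_inode() {
    [ "$(inode_of "$1")" = "$(inode_of "$2")" ]
}
```

After running hardlink_demo on a scratch directory, same_inode succeeds for the two hosts files and fails for the two app.conf files. The unchanged file occupies disk space only once, while the old version of the changed file stays intact in the older snapshot.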
Configuration is very easy. All configuration is done on the backup server. This server will use rsync over ssh to connect to the servers that should be backed up and copy the given filesystem(-changes) to the backup store.
After installing rsnapshot (and rsync), edit rsnapshot.conf. First set where you want to store the backups and enable cross-platform backup of special files (e.g. FIFOs):
snapshot_root /backup/snapshots/
link_dest 1
We are now done with the general configuration. Next we have to specify the intervals for backing up. The default of hourly/daily/weekly/monthly is reasonable and useful and will not hurt your disk space because of the hard links. For my needs, hourly is not required, so I comment it out:
#interval hourly 6
interval daily 7
interval weekly 4
interval monthly 3
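If you define your own intervals, be sure to list the smallest first, because each larger interval is fed from the next smaller one. With the settings above, each run of rsnapshot weekly turns the oldest daily snapshot into the newest weekly one, and each run of rsnapshot monthly turns the oldest weekly snapshot into the newest monthly one. The numbers say how many snapshots of each interval are kept: 6 hourly snapshots (a full day if one is taken every four hours), 7 daily snapshots (a week) and so on. How often a snapshot is actually taken is determined by the cron jobs, not by these numbers.
Now just tell rsnapshot what to back up and where to put it: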
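The way a larger interval is fed from the smaller one can be sketched with plain directory renames. This is a simplified model of the bookkeeping only, assuming the daily/weekly counts from above; the function names rotate and promote are made up, and the real tool additionally handles hard-link copies and the rsync run.

```shell
#!/bin/sh
# rotate ROOT LEVEL COUNT: drop the oldest snapshot of LEVEL and shift
# the remaining ones up by one (LEVEL.0 -> LEVEL.1, and so on).
rotate() {
    root="$1"; level="$2"; count="$3"
    i=$((count - 1))
    rm -rf "$root/$level.$i"
    while [ "$i" -gt 0 ]; do
        prev=$((i - 1))
        if [ -d "$root/$level.$prev" ]; then
            mv "$root/$level.$prev" "$root/$level.$i"
        fi
        i=$prev
    done
}

# promote ROOT LOWER LOWER_COUNT HIGHER HIGHER_COUNT: rotate the higher
# level, then move the oldest lower-level snapshot into HIGHER.0.
promote() {
    root="$1"; lower="$2"; lower_count="$3"; higher="$4"; higher_count="$5"
    rotate "$root" "$higher" "$higher_count"
    oldest="$root/$lower.$((lower_count - 1))"
    if [ -d "$oldest" ]; then
        mv "$oldest" "$root/$higher.0"
    fi
}
```

Calling promote /some/scratch/dir daily 7 weekly 4 removes weekly.3, shifts weekly.0 through weekly.2 up by one, and renames daily.6 to weekly.0. This is also why the smallest interval has to come first: a larger interval never copies anything from the live system, it only picks up what the smaller one leaves behind.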
# localhost
backup /root/ localhost/root/
backup /etc/ localhost/etc/
backup /usr/local/etc/ localhost/usr_local_etc/
This will back up the directories /root, /etc, and /usr/local/etc to a directory named localhost in snapshot_root on the backup machine. For remote backups, just prefix the path with the ssh user and host:
# www.example.com
backup root@www.example.com:/root/ www.example.com/root/
backup root@www.example.com:/etc/ www.example.com/etc/
backup root@www.example.com:/usr/local/etc/ www.example.com/usr_local_etc/
backup root@www.example.com:/usr/local/www/ www.example.com/usr_local_www/
backup root@www.example.com:/usr/local/home/ www.example.com/home/
Using the hostname of the remote host as the directory name on the local machine is a commonly used convention.
For this to work, the user root must be able to log in to www.example.com with an RSA or DSA authentication key. Use ssh-keygen to create these or read here for more information on how to do this. Furthermore, root must be able to read all the directories on the remote server. This is why we start with root.
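One thing that is easy to get wrong when typing these lines: rsnapshot.conf wants the fields on each line separated by tabs, not spaces. The authoritative check is rsnapshot's own configtest command, but a quick pre-check can be sketched as below. The function name check_tabs is made up for this example, and the check is deliberately crude: it only flags active lines that contain no tab at all.

```shell
#!/bin/sh
# check_tabs FILE: print every line that is neither blank nor a comment
# and contains no tab character. Succeeds only if no such line exists.
check_tabs() {
    awk '
        /^[ \t]*(#|$)/ { next }            # skip comments and blank lines
        index($0, "\t") == 0 {             # active line without any tab
            printf "line %d has no tab: %s\n", NR, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Run it as check_tabs /path/to/rsnapshot.conf before calling rsnapshot configtest; a non-zero exit status means at least one line was probably typed with spaces.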
But using root for this work is not secure, because the ssh key cannot be protected by a passphrase as it is used automatically in the background. Therefore, create a backup user on the remote host for this task and allow it to run sudo rsync without a password in /etc/sudoers. The really paranoid can also restrict the key to this one command in .ssh/authorized_keys. See the rsnapshot mailing list for more information on how to do this.
rsnapshot can also run scripts and back up their output. This is useful for backing up databases: for example, the script runs pg_dump and rsnapshot backs up the dump. Sadly, the script runs on the backup server itself, so on remote systems you need a cron job that dumps the database, and rsnapshot can then back up the directory containing the dump.
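As a rough sketch of the sudo setup: the account name backup and the path /usr/bin/rsync are assumptions (check the real path with command -v rsync on the remote host), and the destination /tmp/etc-test/ is just a scratch directory for trying it out by hand.

```shell
# On the remote host, in /etc/sudoers (edit it with visudo).
# "backup" is a hypothetical account name, the rsync path is an assumption:
#   backup ALL=(root) NOPASSWD: /usr/bin/rsync

# On the backup server: test by hand that the unprivileged account can
# read root-only files through sudo. --rsync-path tells the local rsync
# which command to start on the remote side.
rsync -az --rsync-path="sudo rsync" backup@www.example.com:/etc/ /tmp/etc-test/
```

If this works from the command line, the same --rsync-path option has to be passed to the rsync calls made by rsnapshot. The rsync_long_args setting in rsnapshot.conf is the usual place for extra rsync options, though the exact quoting is worth checking against the rsnapshot documentation.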
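A dump script for this purpose can be very small. rsnapshot's backup_script directive runs the script and stores whatever it creates in its current working directory, so the script only needs to write a file there. The following is a minimal sketch; the function name dump_to_cwd and the output name dump.sql are made up, and the pg_dumpall call in the comment assumes a local PostgreSQL that the calling user may dump.

```shell
#!/bin/sh
# dump_to_cwd CMD [ARGS...]: run a dump command and store its output as
# dump.sql in the current working directory. The body runs in a subshell
# so the tightened umask does not leak into the caller.
dump_to_cwd() (
    umask 077                  # dumps may contain sensitive data
    "$@" > dump.sql
)

# In a real script the last line would be something like:
#   dump_to_cwd pg_dumpall
```

Saved as, say, /usr/local/bin/backup_pgsql.sh (a hypothetical path), it would be hooked into rsnapshot.conf with a line of the form backup_script, the script path, and a destination such as localhost/postgres/, separated by tabs.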
We are now nearly finished. The only thing left is to add the cron jobs that call rsnapshot. The daily cron job has to run once a day and just call
rsnapshot daily
The weekly cron job has to run once a week and call
rsnapshot weekly
You get the picture: just call rsnapshot and tell it which interval to back up. FreeBSD already provides the directories /etc/periodic/{daily, weekly, monthly}. Just put your script there and it will just work.
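On systems without periodic, a plain crontab for root on the backup server does the same job. The entries below are a sketch: the times are arbitrary and the path /usr/local/bin/rsnapshot is an assumption about where the binary is installed. The larger intervals are scheduled a little before the smaller ones, so that on days where they coincide the rotation of the older snapshots is finished before the new daily snapshot is taken.

```
# min hour day-of-month month day-of-week command
30 3 1 * * /usr/local/bin/rsnapshot monthly
0 4 * * 0 /usr/local/bin/rsnapshot weekly
30 4 * * * /usr/local/bin/rsnapshot daily
```

Only the daily job talks to the servers; the weekly and monthly jobs just rename existing snapshot directories and finish quickly.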
After running rsnapshot for some weeks, your backup directory should look like this:
$ cd /backup/snapshots/
$ ls -l
... daily.0
... daily.1
... daily.2
... daily.3
... daily.4
... daily.5
... daily.6
... weekly.0
... weekly.1
... weekly.2
... weekly.3
... monthly.0
$ ls -l daily.0
... localhost
... www.example.com
$ ls -l daily.0/localhost
... etc
... root
... usr_local_etc
This way I can access snapshots of whole filesystems of all my servers and workstations going back several weeks and months without wasting disk space. And the configuration took me only 15 minutes!
See the official HOWTO for a more detailed guide and information on giving normal users access to the snapshots through Samba or NFS.
UPDATE:
See my follow-up article for more information on how to use another user than root for the backups.
