Music backup tools and techniques
August 07, 2011 in digital music by Dan Gravell
Back in 2002 I received the distressing news that my grandparents had been burgled. Aside from the emotional cost and the loss of various sentimental items, their entire collection of classical CDs, built up over twenty or so years, was stolen. Music collectors know that losing a collection built up over such a long time can be a tragic experience; much thought goes into choosing recordings, and many memories build up from listening to them over the years.
It's not often cited, but this is another area where computer-stored digital music has an edge over older forms of music stored on physical media. The only ways to back up physical media are to make a copy of the music, which doubles the space required, or to take out home contents insurance, which does not guarantee that you will be able to buy the same recording again. For computer-stored audio, however, given its minimal and flexible physical storage requirements, backing up music is far easier and cheaper.
Digital music still needs backing up, though. Physical media face physical threats such as burglary or damage. Digital media face the same physical threats, plus threats of their own such as viruses, data corruption and accidental deletion (the flexibility of digital files also makes human error easier).
Whatever you do, take backups. What should be considered when backing up music?
On-site or off-site?
On-site means the backups that are created of your music are stored in the same building as the music itself. Off-site means the backups are elsewhere, in a different building, city, country or even continent.
On-site backup is the traditional option. Typically, a backup process copies your music onto separate media, such as DVD, tape or a removable external hard disk, which is then stored elsewhere in the house. This is the minimum acceptable backup strategy. Storing your backups on the same computer, or worse, on the same hard drive, still leaves them exposed to too many threats, such as a virus attack or a hard disk crash.
Off-site backup normally means copying your backups to computer storage elsewhere in the world. Plenty of providers now offer this service: generic storage providers such as Amazon S3 or rsync.net, and dedicated backup services like Norton 360 or Mozy. The former tend to be cheaper. Off-site backup reduces the number of threats to your data, but it incurs an ongoing storage cost. With the generic services you also have to work out how to transfer the data from your computer to theirs (dedicated backup services usually come with an application that does this automatically).
Lossless music collectors may find that on-site backup is their only option. Lossless files are large even when compressed using encodings such as Apple Lossless and FLAC, so uploading music as part of an off-site backup procedure is costly in terms of time and bandwidth.
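Here is a back-of-the-envelope Python sketch that puts a rough number on "costly in terms of time". The 500 GB library size and 1 Mbit/s upload speed are hypothetical figures. The calculation also assumes the connection is fully used the whole time with no protocol overhead, so real uploads will take longer.

```python
def upload_days(library_gb: float, upload_mbit_per_s: float) -> float:
    """Estimate the days needed to upload `library_gb` gigabytes (10**9 bytes)
    over an uplink of `upload_mbit_per_s` megabits per second.

    Assumes the link is saturated the whole time with no protocol overhead,
    so treat the result as a lower bound.
    """
    bits = library_gb * 1e9 * 8                 # bytes -> bits
    seconds = bits / (upload_mbit_per_s * 1e6)  # Mbit/s -> bit/s
    return seconds / 86_400                     # seconds -> days


if __name__ == "__main__":
    # Hypothetical figures: a 500 GB lossless library over a 1 Mbit/s uplink.
    print(f"{upload_days(500, 1.0):.1f} days")
```

With those figures, the initial upload alone comes to roughly 46 days. Even at 10 Mbit/s it would still take over four and a half days.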
Automatic or manual?
This one's pretty clear: automatic backups are better than manual ones. Automatic backups happen regularly without you having to do anything, so there's no chance of forgetting to take backups. The only real difficulty comes in working out how to implement automatic backups.
Automatic backups take more ingenuity, and can be a little intrusive, when the computer running them is not on 24x7. For example, if you turn your computer on for the first time in a week and all of your backups start running, they may get in the way of whatever you turned it on to do. This is a fundamental truth: for your backups to run, your computer must be on.
So, of course, must the destination of the backup. If you are taking on-site backups, your DVDs, tapes or USB drive must be connected and ready to receive the backup. A fully automated solution therefore needs a destination that is either always running or started by the backup procedure at the allotted time.
Specialised backup software handles scheduling for you. If you write your own backup procedure, you can still use Scheduled Tasks on Windows to run it at predefined intervals, or cron or anacron on Linux (anacron is designed for machines that are not always on: it runs jobs that were missed while the computer was off). On a Mac, use Time Machine.
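If your computer isn't on all the time and you write your own procedure, you can borrow the idea behind anacron. Schedule a job to run often, say at login or hourly, and let it decide whether a backup is actually due. Below is a minimal Python sketch of that check. The state file location, its JSON format and the seven-day period are hypothetical choices for illustration.

```python
import json
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional


def is_backup_due(last_run: Optional[datetime], now: datetime, period: timedelta) -> bool:
    """Anacron-style check: due if the backup has never run, or if at least
    `period` has passed since it last ran (however long the machine was off)."""
    return last_run is None or now - last_run >= period


def load_last_run(state_file: Path) -> Optional[datetime]:
    """Read the time of the last successful backup from a hypothetical JSON state file."""
    if not state_file.exists():
        return None
    return datetime.fromisoformat(json.loads(state_file.read_text())["last_run"])


def record_run(state_file: Path, when: datetime) -> None:
    """Record a successful backup so the next check measures from it."""
    state_file.write_text(json.dumps({"last_run": when.isoformat()}))


if __name__ == "__main__":
    state = Path.home() / ".music-backup-state.json"  # hypothetical location
    now = datetime.now()
    if is_backup_due(load_last_run(state), now, timedelta(days=7)):
        # ... run your backup procedure here ...
        record_run(state, now)
```

The script only records a run after the backup step finishes. A backup that fails part-way is therefore retried on the next check instead of being skipped for another week.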
How often?
The answer comes down to how often you change your music collection, whether by adding to it or modifying it in some way, such as adding album art. For most people this doesn't happen very often, so you can probably get away with a weekly or monthly schedule.
A problem with frequent backups is that they can carry high storage costs, and high bandwidth costs if they are off-site. Incremental backups can reduce these costs. Consider a backup taken on Monday and another on Tuesday. If Monday's backup fully describes the data, then Tuesday's only needs to describe the differences between Monday's and Tuesday's collection. In the best case, if nothing in your collection changed between Monday and Tuesday, Tuesday's backup costs almost no storage at all.
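To make the Monday/Tuesday example concrete, here is a minimal Python sketch of an incremental backup. It assumes that a file has changed if its size or modification time differs, which is a heuristic rather than a content comparison. The manifest format and the function names are hypothetical.

```python
import shutil
from pathlib import Path

# Hypothetical manifest format: relative path -> (size in bytes, mtime in ns)
Manifest = dict[str, tuple[int, int]]


def scan(library: Path) -> Manifest:
    """Record the size and modification time of every file under `library`."""
    manifest: Manifest = {}
    for path in library.rglob("*"):
        if path.is_file():
            st = path.stat()
            manifest[path.relative_to(library).as_posix()] = (st.st_size, st.st_mtime_ns)
    return manifest


def incremental_backup(library: Path, dest: Path, previous: Manifest) -> Manifest:
    """Copy into `dest` only the files that are new or whose size/mtime differ
    from `previous` (the manifest from the last backup). Pass an empty
    manifest to take a full backup. Returns the manifest to use next time."""
    current = scan(library)
    for rel, signature in current.items():
        if previous.get(rel) != signature:
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(library / rel, target)  # copy2 also preserves timestamps
    return current
```

Passing `{}` on Monday copies everything. Passing Monday's manifest on Tuesday copies only what changed, and if nothing changed, nothing is written at all. The sketch ignores deleted files. A fuller version would also record paths that appear in `previous` but not in the current scan.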
Incremental backups are used by the grandparent, parent, child approach, which is one example of a backup model.
Music's next top (backup) model
The three main backup models I've considered are mirror, batch, and grandparent, parent, child.
A mirror is perhaps the easiest model to start with. You simply copy your music files, as they are, to the backup location. Restoring them is easy too: copy them back. It's also easy to set up a scheduled task to do this.
Mirrors have low storage costs because they are just one copy of your music library. In the case of off-site backups, bandwidth can be expensive if you copy the entire library on each backup. Tools such as rsync will analyse the destination files and only copy the data that has actually changed, even down to individual byte differences within files.
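Here is a minimal Python sketch of a mirror. It assumes a locally mounted destination, such as a USB disk. It skips files whose size and modification time already match, a "quick check" similar in spirit to rsync's default behaviour. Unlike rsync, it copies a changed file whole rather than transferring only the changed bytes. The `mirror` function and its `delete` flag are illustrative, not any tool's real API.

```python
import shutil
from pathlib import Path


def mirror(src: Path, dst: Path, delete: bool = False) -> None:
    """One-way mirror of `src` into `dst`.

    Unchanged files (same size and mtime) are skipped; new or changed files
    are copied whole. If `delete` is True, files in `dst` that no longer
    exist in `src` are removed.
    """
    dst.mkdir(parents=True, exist_ok=True)
    wanted = set()
    for s in src.rglob("*"):
        rel = s.relative_to(src)
        d = dst / rel
        if s.is_dir():
            d.mkdir(parents=True, exist_ok=True)
            continue
        wanted.add(rel)
        ss = s.stat()
        if d.exists():
            ds = d.stat()
            if ds.st_size == ss.st_size and ds.st_mtime_ns == ss.st_mtime_ns:
                continue  # quick check passed: assume the file is unchanged
        shutil.copy2(s, d)  # copy2 preserves mtime so the next quick check matches
    if delete:
        # Collect first, then delete, so we don't modify the tree mid-walk.
        for d in list(dst.rglob("*")):
            if d.is_file() and d.relative_to(dst) not in wanted:
                d.unlink()
```

With `delete=True` the mirror stays an exact copy, but an accidental deletion in your library is then reproduced in the backup. If the destination filesystem stores coarser timestamps than your library's disk, the quick check may never match, and files will simply be recopied each time.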
The main disadvantage of mirrors is that they do not protect against certain types of threat. For instance, if a virus infects your music files or your files become corrupted, the same problems will be copied to your one-and-only backup. There's no way to go 'back in time'. Worse, if your mirroring process also removes files from the mirror that no longer exist in your library, then accidentally deleting music will delete it from the mirror too. Mirrors are better than nothing, but they are no insurance policy.
Batch backups group your library into an archive and copy that to the backup location. This means you may have many archives from previous backups.
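A batch backup can be sketched in a few lines of Python using the standard library's `shutil.make_archive`. The `music-YYYYMMDD-HHMMSS.tar.gz` naming scheme and the `batch_backup` function are hypothetical choices. Putting a timestamp in the name is what lets older archives sit alongside newer ones.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def batch_backup(library: Path, backup_dir: Path, when: Optional[datetime] = None) -> Path:
    """Archive the whole library into a new timestamped, gzip-compressed tarball.

    `backup_dir` should live outside `library`, otherwise later archives
    would end up containing earlier ones.
    """
    when = when or datetime.now()
    backup_dir.mkdir(parents=True, exist_ok=True)
    base = backup_dir / f"music-{when:%Y%m%d-%H%M%S}"
    # make_archive appends the ".tar.gz" extension and returns the archive's path.
    return Path(shutil.make_archive(str(base), "gztar", root_dir=library))
```

Audio that is already compressed, such as FLAC or MP3, will not shrink much under gzip. If the compression step is slow, passing `"tar"` instead of `"gztar"` skips it.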
The key advantage is that, with a longer history, you have a better chance of restoring your library when a damaging but not immediately noticeable problem occurs, such as corruption or a virus infection.
You need some way of pruning old archives, though. If your backups are on-site, this can be as easy as tossing the old backup CDs in the bin. For off-site backups, you need to build in some way of automating the pruning of too-old archives. This needs to be done with care, because if you are too ruthless you lose, to an extent, the advantage of backup history inherent with batch backups.
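For off-site archives, the simplest automated pruning rule is to keep the newest N and delete the rest. Here is a minimal Python sketch. It assumes archives are named with a sortable timestamp (the hypothetical scheme `music-YYYYMMDD-HHMMSS.tar.gz`), so string order matches date order. It also assumes the archives are reachable as local files; with a remote store you would list and delete through that service instead.

```python
from pathlib import Path


def prune_archives(backup_dir: Path, keep: int) -> list[Path]:
    """Delete all but the `keep` newest archives and return the deleted paths.

    Relies on timestamped names (music-YYYYMMDD-HHMMSS.tar.gz) sorting
    chronologically as plain strings.
    """
    if keep < 1:
        raise ValueError("keep at least one archive")
    archives = sorted(backup_dir.glob("music-*.tar.gz"))
    doomed = archives[:-keep]  # everything except the newest `keep`
    for path in doomed:
        path.unlink()
    return doomed
```

The `keep` value is where the trade-off lies: too small and you lose the history that makes batch backups worth having.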
Batch backups are also costly in terms of bandwidth if you back up off-site, because a new archive is created and copied each time. Remember to compress the archive, if possible.
Grandparent, parent, child backups go some way towards combining the low bandwidth cost of incremental mirrors with the safety of batch backups. With this approach, you always have one grandparent, one parent and a number of child backups available. The gap between child and parent might be measured in days, say, while the gap between parent and grandparent is measured in weeks.
First, decide your backup period. Let's say we back up once per week. Then decide where the 'generations' fall. Let's say the parent backup occurs once per month and the grandparent once per year. On the same day each year, the grandparent is taken as a full backup and copied as an archive to the backup location. The same is done for the parent on the same day each month. Finally, child backups are taken as incremental backups against the parent. This means you only ever keep two full backups, significantly saving bandwidth and storage costs.
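The weekly/monthly/yearly rotation can be sketched as a Python function. It takes the date of this week's backup and the date of the previous one, and says which generation the new backup belongs to. This is one possible interpretation, not a standard algorithm. It compares against the previous backup date rather than a fixed calendar day, so the rotation still works when the exact day is missed. The function name is hypothetical.

```python
from datetime import date
from typing import Optional


def generation(today: date, previous: Optional[date]) -> str:
    """Classify a weekly backup under a yearly/monthly/weekly rotation.

    "grandparent": first backup of a new year  -> full archive
    "parent":      first backup of a new month -> full archive
    "child":       any other backup            -> incremental against the current
                   parent (in January, the grandparent stands in as the parent)
    """
    if previous is None or today.year != previous.year:
        return "grandparent"
    if today.month != previous.month:
        return "parent"
    return "child"
```

Over a year of weekly backups starting on Sunday 2 January 2011, this rule produces one grandparent, 11 parents and 40 children. Each new grandparent or parent replaces the previous one, so only two full backups are kept at any time.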
The points at which the generations roll over can, of course, be whenever suits you. If a year between grandparents is too long, make it every three or six months.
Don't just sit there... do something!
Hopefully I've convinced you that it's worth taking backups. After all that investment in your music collection, it would be awful to see it lost.
Why not start with the simplest possible solution, such as manually archiving your collection to an external USB disk once per month? Set a recurring reminder so you don't forget. If you get tired of doing it by hand, look into automated solutions.
If you've had success backing up your music, post your tools and techniques in the comments below!
Thanks to godog, swanksalot, Joe Lanman and El Gran Dee for the images above.