How to store and preserve lots of data?

The simple answer is: multiple copies. Whatever else you do, don't trust any single medium, location or service.

Personally, I currently use external (USB-connected) hard disks for backup purposes. A 2-3 TB drive can be sourced quite cheaply and will give you plenty of storage capacity, not just for now but for any reasonable future needs. At roughly the physical size of a large paperback book, it will easily hold the content of 200-300 movie DVDs without further compression. Buy two, or three if you are paranoid, preferably with at least one from a completely different manufacturer (you might want to mix Seagate and Western Digital, for example, since their drives are unlikely to share the exact same design or manufacturing defects), and keep at least one of them in a physically separate location. A bank safety deposit box is a relatively cheap alternative that will give you physical security as well, but even just keeping one copy at work or at a friend's home will almost always work just as well. If you can arrange to refresh an off-site copy without bringing it to your own location, that is even better. If any of the content is privacy-sensitive, keep that in mind when planning how to handle off-site copies.

Also keep in mind that the amount of data you are talking about (300 GB counting as more or less "irrecoverable", another 500 GB "nice to keep" but which in a pinch you could probably get from other sources such as second-hand movie DVDs) is not really all that much. I currently have a grand total of about 100 GB of digital photos alone, and it's not hard for me to add some 10-15 GB to that in a single day - I have done that on a few occasions, going to events where I have had reason to take lots of photos. Many of those photos are of questionable quality in various ways, and many are mundane (nice to have, but in a pinch there's nothing truly special about them), but some of them are irreplaceable from a content point of view and of good quality as well. For backup purposes, though, I treat them all the same way: multiple copies. I've had a few hard drives fail on me, and while a few times I've lost data I would really have liked to keep, overall this strategy has meant that I can restore the most recent backup to a new drive and be on my merry way. If the live copy fails, restore the backup to a replacement primary drive; if the backup drive fails, get a replacement backup drive and make a new backup.

If you do go the multiple storage media route, too, remember to keep checking each copy for signs of degradation. It's fairly quick and easy to run a SHA1 hash over all the files on each drive and compare the results, and to store the list of hashes itself in multiple locations. That way, even if you get read errors at some point, you can determine which copy is "good".
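
As a rough illustration of the hash check described above, here is a minimal Python sketch. The function names (hash_file, build_manifest, compare_manifests) are made up for this example and are not from any particular backup tool. It assumes each copy is mounted as an ordinary directory tree.

```python
import hashlib
import os


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of one file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-1 digest."""
    manifest = {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            manifest[os.path.relpath(full, root)] = hash_file(full)
    return manifest


def compare_manifests(first, second):
    """Return the sorted relative paths that are missing on one side or differ."""
    return sorted(
        path
        for path in set(first) | set(second)
        if first.get(path) != second.get(path)
    )
```

Something like compare_manifests(build_manifest("/mnt/backup_a"), build_manifest("/mnt/backup_b")) would then list the files worth a closer look (both mount points are placeholders). Comparing each drive against a manifest saved when the backup was made tells you which copy still matches the original.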


You have to consider how valuable the data you're backing up is to you. I would split it into at least 2 categories:

  1. Irreplaceable
  2. Would like to keep but won't be heartbroken if the data is lost.

Strategy

For category 1: I would suggest one of the popular online storage systems (Amazon S3, Dropbox, etc). Here you're paying for someone to help you manage the backup and to ensure longer-term and easier access. This assumes category 1 is a smaller percentage of the 800 GB total. Otherwise, follow the suggestions for category 2, making sure that the data is stored with proper redundancy and handled with care.

For category 2: it's your decision how much to invest in storage fees and in the time to upload data online. For that reason I would even suggest you use a large HDD to back up all the data, and store that drive disconnected from a PC. Just be aware that its useful lifetime is tied to having a PC that supports its current interface, e.g. SATA. You can then port the data to a new drive / new tech in the future. A 1 or 2 TB HDD is reasonably well priced and will cover your data requirements now and into the short term.

Redundancy

Multiple drives with the same data would be your redundancy, which again can even be stored 'off site' if you're truly concerned about data safety.

Security

As an added feature, if you're trying to protect the data from unauthorized access, encrypt it locally before uploading it online and/or storing it on an HDD. Something like TrueCrypt would be ideal.


First, think about the ways you might lose your data, and decide which you want to protect against. Some examples:

  • accidentally delete something
  • hard disk dies
  • a software or hardware bug
  • malware
  • theft
  • fire
  • natural disaster (flood, quake, volcano, lightning, etc.)
  • government seizure

In my life, I've lost plenty of data, but only to the first two causes on that list.

Hard disks are compact, hold a lot of data, keep getting bigger, are readily available, don't require special equipment to use, and are cheap and getting cheaper (flooding in Thailand notwithstanding).

I keep all my data on one drive; a second USB drive holds regular automated backups; a third identical USB drive sits offsite (at work or in a safe deposit box is good). Monthly I carry the current backup drive to the offsite location, and bring the other drive back.

All storage media decay; the only way to be sure that your data is good is to use it. As part of that monthly routine, I pick an arbitrary file and restore it from my backup.
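
A spot check like that could be scripted along these lines. This is only a sketch under some assumptions: the backup is a plain copy of the directory tree (not an archive format), the helper names list_files and spot_check are invented for the example, and the byte-for-byte comparison against the live copy is an extra step added here, not part of the routine described above. A file that has legitimately changed since the last backup will show up as a mismatch.

```python
import filecmp
import os
import random
import shutil
import tempfile


def list_files(root):
    """Return every file under root as a sorted list of relative paths."""
    paths = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            paths.append(os.path.relpath(os.path.join(folder, name), root))
    return sorted(paths)


def spot_check(backup_root, live_root, rng=random):
    """Restore one arbitrary backup file to a scratch directory and compare
    it byte for byte with the live copy. Returns (relative_path, matches)."""
    candidates = list_files(backup_root)
    if not candidates:
        raise ValueError("backup contains no files")
    chosen = rng.choice(candidates)
    with tempfile.TemporaryDirectory() as scratch:
        restored = os.path.join(scratch, os.path.basename(chosen))
        # Copying forces a full read of the file from the backup drive.
        shutil.copyfile(os.path.join(backup_root, chosen), restored)
        live = os.path.join(live_root, chosen)
        matches = os.path.exists(live) and filecmp.cmp(restored, live, shallow=False)
    return chosen, matches
```

If the copy step itself raises an I/O error, that is already the warning sign the monthly check is meant to catch.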