How to back up a large MongoDB database

With 10TB to be backed up, this gets a bit complicated.

Replicas are no replacement for proper backups

While delayed replica set members can provide a relatively easy way to recover from accidental operations, they are no replacement for proper backups, very much like RAID isn't a replacement for file system based backups.

Recommendations

That heavily depends on what your setup looks like.

SAN snapshots

With 10TB, I assume you have some sort of SAN attached. The easiest way to back up MongoDB in those environments is to make sure you have journaling activated both on the filesystem and in MongoDB, and simply take a snapshot of the SAN volume of one of the secondaries, preferably a hidden one, so that your operations don't get interrupted. This usually takes mere seconds, but please _make sure_ that your replication oplog window is sufficient. Otherwise, you might need to resync the secondary.
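Before taking the snapshot, it is worth checking that the oplog window comfortably covers the snapshot plus any copy-off time. A quick sketch from the shell (host and port are placeholders for your hidden secondary):

# prints the configured oplog size and the time span it currently covers
mongo --host hidden-secondary.example.net --port 27017 --eval "rs.printReplicationInfo()"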

Do not use mongodump

I have to disagree with RolandoMySQLDBA about the usage of mongodump. First of all, it imposes locks on the server. Though they are lifted relatively fast, the sheer number of locks might add up and interfere with your operations, unless it is run on a hidden node or when no read preference sends queries to the secondaries. Plus, it is not exactly fast: I'd expect it to run for hours at least, most likely longer than your backup window. Side note: always run mongodump with the --oplog option. Also keep in mind that mongodump does not back up indices, only the operations to create them. Those indices have to be recreated during a restore, which may massively increase the time you need for it. From my experience, if you have to restore a database, you want it done as fast as possible. That is another reason why mongodump isn't suited for backing up 10TB.

Notes on LVM snapshots

You can do an LVM snapshot on a running mongod instance, provided that you have journaling enabled in mongod (and from my experience, it does not hurt to have it enabled on the FS level, too). However, LVM snapshots come with some implications. First, you obviously need enough disk space to absorb the changes written during the backup operation. Let me clarify that.
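If in doubt, journaling can be enabled explicitly when starting mongod (it is on by default on 64-bit builds). A sketch, with placeholder paths:

# start mongod with journaling explicitly enabled (dbpath/logpath are placeholders)
mongod --dbpath /data/db --journal --fork --logpath /var/log/mongod.log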

Let's assume you have an hourly change rate of 500GB, and that you want your backup bzipped before it is uploaded to some storage. Even when using parallel bzip2, the compression of 10TB would take some hours to finish, simply because your mass storage throughput would most likely become the limiting factor. Let's assume it takes 2 hours to compress the data down to 2TB. By then we would need some 2TB + 2*500GB of free disk space in total, 1TB of it for the LVM snapshot. This creates the need to over-provision your filesystem by at least 30%. If you want a proper safety margin, this could easily increase to 60-70% (20% for a utilization factor of 0.8 on the original file system, the same for the snapshot size, plus the space needed for the bzipped backup itself). In most production environments, that would be unacceptable, since the over-provisioning would be static (you would not want a backup script to fiddle with your LVM dynamically, would you?).
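To make the above concrete, here is a minimal sketch of such an LVM snapshot backup. The volume group and logical volume names, the mount point, the backup target and the 1TB snapshot size are all assumptions; size the snapshot for your own change rate:

# create a 1TB copy-on-write snapshot of the LV holding the dbpath
lvcreate --snapshot --size 1T --name mongo-snap /dev/vg0/mongodata
mkdir -p /mnt/mongo-snap
mount -o ro /dev/vg0/mongo-snap /mnt/mongo-snap
# stream the snapshot through parallel bzip2 to the backup target
tar -cf - -C /mnt/mongo-snap . | pbzip2 -c > /backup/mongodb-$(date +%F).tar.bz2
umount /mnt/mongo-snap
lvremove -f /dev/vg0/mongo-snap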

MMS backup

While MMS backup has some awesome features (continuous backup, easy point-in-time recovery), it comes with a serious drawback: its price tag for large deployments can easily run into the thousands. With an assumed hourly change rate of 500GB on those 10TB, it would be a medium six-figure sum for cloud backups. Monthly.

My suggestion here would be to take an enterprise subscription for your servers, which makes you eligible to run an on-premises MMS instance, including backup.

Summary

Here are the options I would take in descending order of preference.

  1. SAN snapshots: easy to implement, relatively cheap
  2. Enterprise subscription: Best features. Install it, configure it, forget it, it's there when you need it
  3. LVM snapshots: easy to implement, but the cost of the necessary over-provisioning may add up over time.

There are a few options

PHYSICAL BACKUP

If you don't mind downtime, the simplest thing to do is:

service mongod stop

Do an LVM snapshot or a brute force cp of the Mongo data folder to another disk

service mongod start
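Put together, a minimal cold-backup sketch (the data path and backup target are assumptions, adjust them to your installation):

service mongod stop
# brute-force copy of the data folder to another disk
cp -a /var/lib/mongodb /backup/mongodb-$(date +%F)
service mongod start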

Of course, you don't want downtime if the 10TB of data is on a standalone machine.

DELAYED REPLICA SET

If you have a replica set with three nodes, use one of the nodes for backups

{
        "_id" : "myreplica",
        "version" : 1,
        "members" : [
                {
                        "_id" : 1,
                        "host" : "10.20.30.40:27017",
                        "priority" : 2
                },
                {
                        "_id" : 2,
                        "host" : "10.20.30.41:27017"
                },
                {
                        "_id" : 3,
                        "host" : "10.20.30.42:27017",
                        "priority" : 0,
                        "slaveDelay" : 3600
                }
        ]
}

Use the node with "_id" : 3 for all your physical backups. Therefore, no downtime. To get a midnight snapshot, you could launch the backup at 1:00 AM, since the hidden node is 1 hour behind.
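If the replica set already exists, the delayed member can be set up from the mongo shell on the primary, roughly along these lines (the array index matches the config above; newer MongoDB versions call the field secondaryDelaySecs instead of slaveDelay):

// make member "_id" : 3 a hidden, priority-0 secondary lagging one hour behind
cfg = rs.conf()
cfg.members[2].priority = 0
cfg.members[2].hidden = true
cfg.members[2].slaveDelay = 3600
rs.reconfig(cfg)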

Of course, the drawback is having two more servers with 10TB each and putting the sysadmin's sanity at risk.

MONGODUMP

You could use mongodump against the standalone machine, but you must expect performance degradation, since mongodump is a client program using a connection like any other.

If you want point-in-time backup, you should use

mongodump --oplog 

The logical BSON backup will be smaller (especially gzipped or bzipped) than the physical backup.

Using mongodump --oplog is best done against the hidden node. That way, there is no performance hit on the primary.
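For example, a sketch of dumping straight from the hidden member (host and output path are assumptions; --gzip needs a reasonably recent mongodump):

# dump from the delayed/hidden member, keeping the oplog for point-in-time consistency
mongodump --host 10.20.30.42 --port 27017 --oplog --gzip --out /backup/dump-$(date +%F)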

DISCLAIMER

I am relatively new to MongoDB (accidental/incidental MongoDBA). I hope my answer helps.