Mongodump severely affects app performance

All data dumped via mongodump has to be read into memory by the MongoDB server. It's also worth noting that mongodump backs up data and index definitions, but not the index data itself; restores can therefore take significantly longer than other approaches, since mongorestore must rebuild any secondary indexes after the data is loaded.
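For context, a plain dump-and-restore round trip looks like the following (database name and paths are hypothetical); the secondary-index rebuild happens at the end of the restore:

    # Dump a single database to a directory
    mongodump --db mydb --out /backups/mydb-dump

    # Restore it: data is loaded first, then secondary indexes are rebuilt
    mongorestore --db mydb /backups/mydb-dump/mydb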

As noted in the MongoDB documentation, mongodump is useful for backing up and restoring small deployments but is not ideal for capturing full backups of larger systems:

When connected to a MongoDB instance, mongodump can adversely affect mongod performance. If your data is larger than system memory, the queries will push the working set out of memory, causing page faults.

A standalone server limits your backup options if you also want to keep your deployment available while taking a backup.

Here are a few suggested approaches in order of most to least recommended:

Approach #1: Use a cloud backup service

For the easiest short-term solution, I would consider using a commercial cloud backup service like MongoDB Cloud Manager. MongoDB Cloud Manager provides continuous backup with scheduled snapshots and a retention policy (see Backup Preparations for more info). A cloud service also saves you from deploying any extra servers or infrastructure, so even if you plan to do so in future, this is a helpful short-term solution.

The general approach would be:

  • Convert your standalone server into a single-node replica set, i.e. restart with the replSet parameter and run rs.initiate() (see the sketch after this list).
  • Sign up for MongoDB Cloud Manager.
  • Download & install the Cloud Manager Backup Agent.
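A minimal sketch of the conversion step, assuming a hypothetical data path and replica set name (adjust both to your deployment):

    # Restart mongod with a replica set name (path and name are examples only)
    mongod --dbpath /data/db --replSet rs0

    # Then, from the mongo shell connected to that instance:
    rs.initiate()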

As an added benefit, Cloud Manager also includes a monitoring agent which can capture metrics history from your deployment and allow you to configure alerts.

Approach #2: Convert your deployment into a replica set and backup from a hidden secondary

This approach requires provisioning some extra infrastructure, but offloads the impact of backups from your primary server. Typically replica sets are provisioned with at least three members for high availability and automatic failover, but if your only goal is backup you can use a less ideal two-server configuration.

The general approach would be:

  • Provision a second server which will be used for backup.
  • Convert your standalone server into a replica set.
  • Add your backup server as a hidden secondary with a priority of 0 (so it can never become primary) and 0 votes (see the example after this list).
  • Use one of the supported backup methods to take backups on your hidden secondary. In general order of recommendation: filesystem snapshots (if supported by your configuration) or file copy (with mongod stopped) are preferable to mongodump.
  • Ideally, add another data-bearing secondary if you'd like the high availability and failover benefits of a replica set configuration.
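A minimal sketch of adding the hidden backup member from the mongo shell, assuming a hypothetical hostname (the exact options may vary by MongoDB version; older versions may require passing a full member document via rs.reconfig() instead):

    rs.add({
        host: "backup.example.net:27017",  // hypothetical backup server
        hidden: true,                      // invisible to client applications
        priority: 0,                       // can never be elected primary
        votes: 0                           // does not vote in elections
    })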

Approach #3: Use filesystem snapshots (if available & appropriate)

A less impactful backup strategy than your current mongodump would be to use filesystem snapshots, assuming you have a filesystem that supports them (and all of your data and journal files are on a single volume, so you can get a consistent snapshot of a running mongod). The upside of filesystem snapshots is that all data does not have to be read into memory by mongod; however, snapshots can still have an impact (particularly when creating the initial snapshot on a busy system). Successive snapshots are more efficient and less impactful, but they are still not a complete backup solution, since the snapshots are local to your server (and you only have a standalone at the moment).
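As an illustrative sketch using LVM snapshots, assuming your data and journal files live on a hypothetical logical volume /dev/vg0/mongodb (the fsyncLock/fsyncUnlock calls quiesce writes for an extra safety margin; with journaling enabled on a single volume they may not be strictly required):

    # Optionally flush and lock writes while the snapshot is taken
    mongo --eval "db.fsyncLock()"

    # Create a copy-on-write snapshot of the data volume (names are examples)
    lvcreate --size 1G --snapshot --name mongodb-snap /dev/vg0/mongodb

    # Unlock writes again; the snapshot can now be archived off-server
    mongo --eval "db.fsyncUnlock()"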

Caveats

  • Approaches #1 and #2 both involve enabling replication to facilitate backups. Replication will add some additional local I/O on your primary server, as all write operations are recorded in a special capped collection called the oplog (operations log).

  • You've mentioned a likely need for sharding in future, but before doing so I would isolate your MongoDB workload from the other processes sharing the same server. If you can change your backup strategy to something more efficient than mongodump, remove resource contention, and capture some baseline metric history for review, you may find that sharding is not required yet.


I'm late to the party, but I encountered the same problem only recently on VMs with a relatively small amount of RAM (4 GB RAM, 50 GB HD, 5 GB data). Our workaround is to use mongodump's --forceTableScan option and, if secondaries should be used, to also add --readPreference secondary. That sped up our dump by a factor of 10 to 30.
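For reference, a sketch of such an invocation with hypothetical hostnames and output path:

    # --forceTableScan reads documents in natural order rather than scanning
    # the _id index; --readPreference secondary keeps the load off the primary
    mongodump --host rs0/db1.example.net:27017,db2.example.net:27017 \
        --readPreference secondary --forceTableScan \
        --out /backups/dump-$(date +%F)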
