In MongoDB, what is the difference between sharding and replication?

Suppose you have a large music collection on your hard disk, stored in separate folders by year of release. You are worried that the collection will be lost if the drive fails, so you buy a second disk and occasionally copy the entire collection to it, keeping the same folder structure.

Sharding >> Keeping your music files in different folders

Replication >> Syncing your collection to other drives


In the context of scaling MongoDB:

  • Replication creates additional copies of the data and allows automatic failover to another node. It can also help scale reads horizontally if you can tolerate reading from secondaries, which may return data that is not the latest.

  • Sharding allows horizontal scaling of writes by partitioning data across multiple servers using a shard key. Choosing a good shard key is important: a poor choice can create "hot spots" where most writes land on a single shard.
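
To make the "hot spot" point concrete, here is a toy simulation in plain Python. It is not MongoDB code: the shard count, the fixed range boundaries, and the function names are all assumptions made up for illustration, and real MongoDB splits and migrates ranges dynamically. It only sketches why an ever-increasing shard key (a timestamp, say) tends to send new writes to one shard under range partitioning, while a hashed key spreads them out.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size


def range_shard(key: int, keys_per_shard: int = 1000) -> int:
    """Toy range partitioning: contiguous key ranges, last shard owns the tail."""
    return min(key // keys_per_shard, NUM_SHARDS - 1)


def hashed_shard(key: int) -> int:
    """Toy hashed partitioning: hash the key, then pick a shard from the hash."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Newly inserted documents get ever-increasing keys, like timestamps.
new_keys = range(10_000, 10_400)

range_counts = Counter(range_shard(k) for k in new_keys)
hashed_counts = Counter(hashed_shard(k) for k in new_keys)

print("range :", dict(range_counts))   # every write lands on the last shard
print("hashed:", dict(hashed_counts))  # writes spread across the shards
```

With the range scheme all 400 new writes go to the shard that owns the highest keys; with the hashed scheme they are spread roughly evenly. The trade-off is that hashing scatters neighbouring keys, so range queries on the shard key have to touch more shards.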

A sharded environment adds complexity because MongoDB now has to distribute data and route requests between shards. Additional processes (config servers and mongos query routers) are added to manage those aspects.

Replication and sharding are typically combined to create a sharded cluster in which each shard is backed by a replica set.

From a client application's point of view, you also have some control over how reads and writes interact with replication and sharding, in particular:

  • Read preferences
  • Write concerns
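
As a rough sketch of where those two settings live, both can be passed as options in the MongoDB connection string. The snippet below only builds the URI strings with the Python standard library; the host names, database name, replica set name, and the `build_uri` helper are hypothetical, while `readPreference`, `w`, and `wtimeoutMS` are standard connection string options.

```python
from urllib.parse import urlencode


def build_uri(hosts, database, **options):
    """Hypothetical helper: assemble a mongodb:// URI from hosts and options."""
    return f"mongodb://{','.join(hosts)}/{database}?{urlencode(options)}"


# Placeholder hosts; substitute your own replica set members or mongos routers.
hosts = ["db1.example.net:27017", "db2.example.net:27017"]

# Read preference: allow reads from secondaries, accepting possibly stale data.
reporting_uri = build_uri(
    hosts, "music", replicaSet="rs0", readPreference="secondaryPreferred"
)

# Write concern: wait until a majority of replica set members acknowledge
# the write, giving up after 5 seconds.
durable_uri = build_uri(
    hosts, "music", replicaSet="rs0", w="majority", wtimeoutMS=5000
)

print(reporting_uri)
print(durable_uri)
```

A reporting job might use the first URI to take read load off the primary, while code that cannot afford to lose an acknowledged write would use the second. Most drivers also let you override both settings per collection or per operation.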

Tags:

Mongodb