Redis replication and redis sharding (cluster) difference

Sharding is almost replication's antithesis, though they are orthogonal concepts and work well together.

Sharding, also known as partitioning, is splitting the data up by key; While replication, also known as mirroring, is to copy all data.

Sharding is useful to increase performance, reducing the hit and memory load on any one resource. Replication is useful for getting a high availability of reads. If you read from multiple replicas, you will also reduce the hit rate on all resources, but the memory requirement for all resources remains the same. It should be noted that, while you can write to a slave, replication is master->slave only. So you cannot scale writes this way.

Suppose you have the following tuples: [1:Apple], [2:Banana], [3:Cherry], [4:Durian] and we have two machines A and B. With Sharding, we might store keys 2,4 on machine A; and keys 1,3 on machine B. With Replication, we store keys 1,2,3,4 on machine A and 1,2,3,4 on machine B.

Sharding is typically implemented by performing a consistent hash upon the key. The above example was implemented with the following hash function h(x){return x%2==0?A:B}.

To combine the concepts, We might replicate each shard. In the above cases, all of the data (2,4) of machine A could be replicated on machine C and all of the data (1,3) of machine B could be replicated on machine D.

Any key-value store (of which Redis is only one example) supports sharding, though certain cross-key functions will no longer work. Redis supports replication out of the box.


In simple words, the fundamental difference between the two concepts is that Sharding is used to scale Writes while Replication is used to scale Reads. As Alex already mentioned, Replication is also one of the solutions to achieve HA.

Yes, they are both typically used together if you consider how shards can be replicated across nodes in a cluster.

With regard to your third question, instead of the RAM-flush option, it is a better idea to use the Redis Append Only File (AOF). At only a minor cost (in terms of write speed), you get a lot more reliability of your writes. It is quite like the mysql binary log. The 1 fsync/second is the recommended option to use.


Using Replication and Sharding Together

If you want both high availability and improved performance, both replication and sharding can be used together to provide this. With sharding, you will have two or more instances with particular data based on keys. You can then replicate each of these instances to produce a database that is both replicated and sharded, which will provide for reliability and speed at the same time!

enter image description here

Tags:

Redis