Cassandra: Snitch vs. Gossip

Gossip is a protocol, and a snitch is a component that uses the information gossip provides. A snitch does a bit more than gossip: it applies some heuristics, such as working out which data center and rack a node belongs to, while gossip is a convenient way to obtain and spread that information. Almost all gossip does is pass state around between nodes, following rules that ensure every node is eventually reached, and collect technical data such as each node's IP address and health. A snitch uses this information to do something more. For example, one snitch (RackInferringSnitch) identifies data centers and racks by analyzing node IP addresses. Other components then use this topology information, for instance to decide where replicas are placed. So the developers gave this functionality its own name; it is really a matter of layering the functionality.
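As a rough illustration of inferring topology from IP addresses, here is a minimal sketch that mimics the rule RackInferringSnitch is documented to use: the second octet of an IPv4 address is taken as the data center and the third octet as the rack. The function names and the peer list are hypothetical; a real snitch implements Cassandra's Java snitch interface rather than standalone functions.

```python
import ipaddress
from collections import defaultdict


def infer_dc_and_rack(ip: str) -> tuple[str, str]:
    """Hypothetical helper: derive (datacenter, rack) from an IPv4 address
    using the RackInferringSnitch convention (2nd octet = DC, 3rd = rack)."""
    octets = ipaddress.IPv4Address(ip).packed  # the 4 raw bytes of the address
    return str(octets[1]), str(octets[2])


def group_by_datacenter(peers: list[str]) -> dict[str, list[str]]:
    """Group peer IPs (for example, ones learned via gossip) by inferred data center."""
    groups = defaultdict(list)
    for ip in peers:
        dc, _rack = infer_dc_and_rack(ip)
        groups[dc].append(ip)
    return dict(groups)


peers = ["10.1.1.5", "10.1.2.7", "10.2.1.9"]
print(group_by_datacenter(peers))  # {'1': ['10.1.1.5', '10.1.2.7'], '2': ['10.2.1.9']}
```

Here 10.1.1.5 and 10.1.2.7 end up in data center "1" (racks "1" and "2"), while 10.2.1.9 ends up in data center "2". The convention only gives sensible results when addresses are actually assigned according to data center and rack.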

Some relevant information can also be found here: https://books.google.ru/books?id=h36CCwAAQBAJ&pg=PT21&lpg=PT21&dq=snitch+gossip&source=bl&ots=fjxy_z78Gj&sig=KpqdkKaREIo2YAWyJj3yMZCyNn4&hl=ru&sa=X&ved=0ahUKEwiUktS8q8zWAhWIQZoKHTViD0U4ChDoAQhUMAc#v=onepage&q=snitch%20gossip&f=false

A more detailed definition of snitches (written for Scylla rather than Cassandra) is here: https://github.com/scylladb/scylla/wiki/Snitches


Gossip is used to determine the state of the machines: whether they are in the cluster, and whether they are up, down, joining, or leaving.
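A toy, single-process model of what gossip does: each node keeps a versioned view of every endpoint's state, repeatedly exchanges views with a random peer, and keeps whichever copy of each entry is newer. The class names, status strings, and round structure are illustrative assumptions, not Cassandra's implementation, which among other things exchanges digests in a SYN/ACK/ACK2 handshake and decides up/down with a phi accrual failure detector.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointState:
    generation: int  # bumped when a node restarts
    version: int     # bumped on every local state change
    status: str      # illustrative values: "JOINING", "NORMAL", "LEAVING"

    def is_newer_than(self, other: "EndpointState") -> bool:
        return (self.generation, self.version) > (other.generation, other.version)


class Node:
    def __init__(self, address: str, status: str) -> None:
        self.address = address
        # At startup a node only knows about itself.
        self.view = {address: EndpointState(1, 1, status)}

    def set_status(self, status: str) -> None:
        # Only the owning node changes its own state, bumping the version.
        current = self.view[self.address]
        self.view[self.address] = EndpointState(current.generation, current.version + 1, status)

    def merge(self, remote_view: dict) -> None:
        # Keep whichever copy of each endpoint's state is newer.
        for address, state in remote_view.items():
            local = self.view.get(address)
            if local is None or state.is_newer_than(local):
                self.view[address] = state


def gossip_round(nodes: list, rng: random.Random) -> None:
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        node.merge(peer.view)  # pull newer state from the peer
        peer.merge(node.view)  # push newer state to the peer


def run_until_converged(nodes: list, rng: random.Random, max_rounds: int = 100) -> int:
    """Run gossip rounds until every node has the same view; return the round count."""
    rounds = 0
    while not all(n.view == nodes[0].view for n in nodes):
        if rounds == max_rounds:
            raise RuntimeError("views did not converge")
        gossip_round(nodes, rng)
        rounds += 1
    return rounds


nodes = [Node(f"10.0.0.{i}", "NORMAL") for i in range(1, 6)]
rng = random.Random(1)
run_until_converged(nodes, rng)
nodes[2].set_status("LEAVING")  # 10.0.0.3 announces that it is leaving
run_until_converged(nodes, rng)
print(nodes[0].view["10.0.0.3"].status)  # LEAVING
```

No node is told about the change directly; 10.0.0.3 only bumps its own version, and the newer entry spreads peer to peer until every view agrees.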

Snitches help map data ownership to actual machines and route queries (for example, given 10 nodes in the cluster, which of the 10 own the data for a given key).
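The question "which of the 10 own the data for a given key" can be sketched as a token-ring lookup: hash the key to a token, find the first node clockwise whose token is at or past it, and take the following nodes as the remaining replicas. This is the topology-unaware placement of SimpleStrategy; a snitch only changes the picture when the replication strategy is topology aware. The MD5-based hash, the class name, and the evenly spaced tokens are assumptions for illustration.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    # Stand-in hash: first 8 bytes of MD5. Cassandra's default partitioner
    # (Murmur3Partitioner) uses a different hash and token range.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    def __init__(self, node_tokens: dict[str, int]) -> None:
        # Sort nodes by token so the ring can be walked clockwise.
        self.entries = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.entries]

    def replicas_for_token(self, token: int, rf: int) -> list[str]:
        # The first node whose token is >= the key's token owns it (wrapping
        # around past the highest token); the next rf-1 nodes hold replicas.
        start = bisect.bisect_left(self.tokens, token) % len(self.entries)
        count = min(rf, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]

    def replicas(self, key: str, rf: int) -> list[str]:
        return self.replicas_for_token(token_for(key), rf)


# A hypothetical 10-node cluster with evenly spaced tokens.
ring = TokenRing({f"node{i}": i * (2**64 // 10) for i in range(10)})
print(ring.replicas("user:42", 3))  # three consecutive nodes on the ring
```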

Different snitches lead to data being placed in different ways. The simple snitch (SimpleSnitch) just places all instances in datacenter1/rack1 and relies on plain distributed-hash-table placement by the partitioner, with no topology awareness. The property file snitch (PropertyFileSnitch) lets you create a file (cassandra-topology.properties) that lists all of the instances and maps each one to a data center and rack, so that replicas can be placed on different racks where possible (and in different data centers, as defined by the replication strategy).
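The property file snitch's role can be sketched in two parts: parsing an `IP=DC:RACK` topology file (the format of cassandra-topology.properties, including its `default=` fallback) and using the resulting racks to spread replicas. The placement function is a simplified, single-data-center take on rack-aware placement in the spirit of NetworkTopologyStrategy, not its actual algorithm, and it assumes the caller already has the nodes in clockwise ring order starting from the key's primary owner. All function names are hypothetical.

```python
from typing import Optional


def parse_topology(text: str) -> tuple[dict[str, tuple[str, str]], Optional[tuple[str, str]]]:
    """Parse IP=DC:RACK lines; a 'default=DC:RACK' line covers unlisted IPs."""
    mapping: dict[str, tuple[str, str]] = {}
    default: Optional[tuple[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        dc, _, rack = value.strip().partition(":")
        if key.strip() == "default":
            default = (dc, rack)
        else:
            mapping[key.strip()] = (dc, rack)
    return mapping, default


def rack_of(ip: str, mapping: dict, default: Optional[tuple[str, str]]) -> str:
    if ip in mapping:
        return mapping[ip][1]
    if default is not None:
        return default[1]
    raise KeyError(f"no topology entry for {ip}")


def rack_aware_replicas(ring_order: list[str], rf: int, mapping: dict,
                        default: Optional[tuple[str, str]]) -> list[str]:
    """Walk the ring, preferring nodes on racks not used yet (single DC only)."""
    chosen: list[str] = []
    skipped: list[str] = []
    racks_used: set[str] = set()
    for ip in ring_order:
        if len(chosen) == rf:
            break
        rack = rack_of(ip, mapping, default)
        if rack not in racks_used:
            chosen.append(ip)
            racks_used.add(rack)
        else:
            skipped.append(ip)
    # Fewer distinct racks than rf: fall back to skipped nodes in ring order.
    for ip in skipped:
        if len(chosen) == rf:
            break
        chosen.append(ip)
    return chosen


TOPOLOGY = """
# IP=DC:RACK
10.0.0.1=DC1:RAC1
10.0.0.2=DC1:RAC1
10.0.0.3=DC1:RAC2
10.0.0.4=DC1:RAC3
default=DC1:RAC1
"""
mapping, default = parse_topology(TOPOLOGY)
ring_order = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
print(rack_aware_replicas(ring_order, 3, mapping, default))
# ['10.0.0.1', '10.0.0.3', '10.0.0.4']
```

A naive walk would pick the first three nodes and put two replicas on RAC1; the rack-aware walk skips 10.0.0.2 because RAC1 is already used.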

GossipingPropertyFileSnitch and the EC2 snitches are similar to the property file snitch in that they are rack- and topology-aware, but each node reads only its own topology information (either from a local file, cassandra-rackdc.properties, or from the EC2 metadata API) and then gossips it to the others. Each node is therefore responsible for broadcasting its own topology information through gossip.
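A minimal sketch of the GossipingPropertyFileSnitch idea: each node parses only its own `dc=` / `rack=` properties (the keys used in cassandra-rackdc.properties) and advertises them in its gossip state, and every other node learns the topology from gossip rather than from a shared file. The class, the `exchange` method, and the "DC"/"RACK" state keys are illustrative stand-ins loosely modelled on Cassandra's gossip application states; there is no versioning because the topology in this sketch never changes.

```python
def read_local_rackdc(text: str) -> dict[str, str]:
    """Parse a node's own dc=/rack= properties into the state it will gossip."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return {"DC": props["dc"], "RACK": props["rack"]}


class GossipingNode:
    def __init__(self, address: str, local_rackdc: str) -> None:
        self.address = address
        # Each node starts out knowing only its own topology.
        self.states = {address: read_local_rackdc(local_rackdc)}

    def exchange(self, peer: "GossipingNode") -> None:
        # Naive two-way merge; a real implementation would compare versions.
        for address, state in list(peer.states.items()):
            self.states.setdefault(address, dict(state))
        for address, state in list(self.states.items()):
            peer.states.setdefault(address, dict(state))

    def dc_and_rack(self, address: str) -> tuple[str, str]:
        state = self.states[address]
        return state["DC"], state["RACK"]


a = GossipingNode("10.0.1.1", "dc=DC1\nrack=RAC1")
b = GossipingNode("10.0.1.2", "dc=DC1\nrack=RAC2")
c = GossipingNode("10.0.2.1", "dc=DC2\nrack=RAC1")
a.exchange(b)
b.exchange(c)  # c learns about a indirectly, through b
print(c.dc_and_rack("10.0.1.1"))  # ('DC1', 'RAC1')
```

Node c never reads a's file; it learns a's data center and rack only because b passed them on.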
