Do you have to run nodetool repair on every node?

I had a hard time finding a reference for this in the documentation, but the short answer is yes: you need to run nodetool repair on each node in your cluster. The closest I can find is the documentation on repairing nodes, which suggests that you should not run repair on more than one node in your cluster at a time.

You can also run repair with the -pr flag, which limits the repair operation to the primary token range(s) that the current node is responsible for. This cuts down on duplicated work when repair is then run on the remaining nodes.
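A rolling primary-range repair along these lines could be scripted roughly as follows. This is a minimal sketch rather than an official procedure: the host addresses are hypothetical placeholders, the helper names are made up for illustration, and it assumes nodetool is on the PATH and can reach each node with -h. By default it only prints the commands; with dry_run=False it runs them one at a time.

```python
import subprocess


def repair_commands(hosts, keyspace=None):
    """Build one `nodetool repair -pr` command per host (nothing is executed here)."""
    commands = []
    for host in hosts:
        cmd = ["nodetool", "-h", host, "repair", "-pr"]
        if keyspace:
            cmd.append(keyspace)  # optionally limit the repair to one keyspace
        commands.append(cmd)
    return commands


def rolling_repair(hosts, keyspace=None, dry_run=True):
    """Go through the hosts sequentially so only one node repairs at a time."""
    for cmd in repair_commands(hosts, keyspace):
        print(" ".join(cmd))
        if not dry_run:
            # Blocks until nodetool exits; raises if it returns a non-zero status.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical addresses; replace with the nodes of your own cluster.
    rolling_repair(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
```

Running the commands in a plain loop, rather than in parallel, is what keeps the repair to one node at a time.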


How repair behaves depends on your configuration, the version of Cassandra you use, and how you run the repair command.

If you just run nodetool repair on a single node in a cluster, it repairs all of the data (token ranges) that the node is responsible for, syncing that data with the other nodes that hold replicas of it.

So for example, if you were to run the nodetool repair command on a single node in a given cluster:

  • If you are running a three-node cluster with a replication factor of 3, every node owns all of the data, so the repair covers all nodes.
  • If you are running a six-node cluster with a replication factor of 2, the data will only be repaired on two of the six nodes. Repair will then need to be initiated on two more of the remaining four nodes.
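
The two cases above can be checked with a toy model of the ring. This is a simplified sketch under stated assumptions, not how Cassandra computes ownership internally: one token per node (no vnodes), a single data center, and placement where range i is stored on node i and the next RF-1 nodes around the ring. All function names are made up for illustration.

```python
def replicas_of(range_id, n_nodes, rf):
    """Nodes holding `range_id`: its primary owner plus the next rf-1 nodes."""
    return {(range_id + k) % n_nodes for k in range(rf)}


def ranges_repaired(node, n_nodes, rf, primary_only=False):
    """Ranges covered when repair is started on `node`."""
    if primary_only:
        # Models `nodetool repair -pr`: only the node's own primary range.
        return {node}
    # Plain repair covers every range the node holds a replica of.
    return {r for r in range(n_nodes) if node in replicas_of(r, n_nodes, rf)}


def nodes_needed(n_nodes, rf, primary_only=False):
    """Greedily choose nodes to start repair on until every range is covered."""
    covered, chosen = set(), []
    while len(covered) < n_nodes:
        # Pick the node whose repair would cover the most still-unrepaired ranges.
        best = max(
            range(n_nodes),
            key=lambda n: len(ranges_repaired(n, n_nodes, rf, primary_only) - covered),
        )
        chosen.append(best)
        covered |= ranges_repaired(best, n_nodes, rf, primary_only)
    return chosen


print(nodes_needed(3, 3))                     # [0]
print(nodes_needed(6, 2))                     # [0, 2, 4]
print(nodes_needed(6, 2, primary_only=True))  # [0, 1, 2, 3, 4, 5]
```

In this model a single run is enough for three nodes at a replication factor of 3, three runs are needed for six nodes at a replication factor of 2, and a primary-range-only repair needs a run on every node. Real clusters with vnodes or several data centers distribute ranges differently, so treat the numbers as illustrative.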

That said, you can choose which hosts and data centers to repair using the -hosts and -dc flags. Additionally, if you use the -pr flag (which repairs only the primary token range the node is responsible for), you will have to run nodetool repair -pr on every node in the cluster.

Another flag to keep in mind is -inc, which was introduced in Cassandra 2.1. This option repairs only new data (data that hasn't previously been repaired). Be careful when relying on it, especially if you frequently delete data.

Also keep in mind that the default repair behavior varies between Cassandra versions. As of Cassandra 2.1, running plain nodetool repair performs a full sequential repair by default. Check what your version does.

For more reading on the topic:

https://www.datastax.com/dev/blog/repair-in-cassandra


No, you don't have to run it on each individual node. nodetool repair runs on a set of nodes, which is clearly stated in the documentation.

You can limit the nodes, or the part of the data, that repair runs on. For example, the -pr option restricts the repair to the partitioner range, that is, the range for which the node is the primary owner, but it then has to be run on every node in the cluster. If you choose -local instead, only the nodes in the same data center as the node you run it on are repaired.