Best way to shrink a Cassandra cluster

If you simply shut the nodes down and rebalance the cluster, you risk losing data that exists only on the removed nodes and hasn't been replicated yet.

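A safe shrink can easily be done with nodetool. On the node being removed, run:

nodetool decommission

The node streams the data it owns to the nodes that take over its token ranges and then leaves the ring. Don't run nodetool drain before this: drain makes the node stop listening to clients and to the other nodes (it is meant to be run just before stopping Cassandra), so the node could no longer stream its data away. Once decommission has finished, shut the node down.

If a node is already dead and cannot be decommissioned, run this on some other node instead:

nodetool removetoken <token>

... to remove the dead node from the cluster completely. The detailed documentation can be found here: http://wiki.apache.org/cassandra/NodeTool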

From my experience, I'd recommend removing nodes one by one rather than in batches. It takes more time, but it is much safer in case of network outages or hardware failures.
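The one-by-one approach can be scripted. Below is a minimal POSIX shell sketch, assuming nodetool is on the PATH and can reach each node's JMX port with -h. The function name decommission_one_by_one, the check host, and the addresses are hypothetical, and it assumes the node address is the first column of the nodetool ring output. It decommissions each node in turn, stops at the first failure, and confirms from a surviving node that the decommissioned node has left the ring before touching the next one.

```shell
#!/bin/sh
# Sketch: decommission Cassandra nodes strictly one at a time.
# Assumptions: nodetool is on PATH and reaches each node's JMX port via
# "nodetool -h HOST"; CHECK_HOST is a node that stays in the cluster.
# The function name and all addresses are hypothetical.
# Usage: decommission_one_by_one CHECK_HOST HOST...
decommission_one_by_one() {
  check_host=$1
  shift
  for host in "$@"; do
    echo "Decommissioning $host ..."
    # The node streams its data to the nodes taking over its token ranges,
    # then leaves the ring; the command is expected to block until done.
    if ! nodetool -h "$host" decommission; then
      echo "decommission of $host failed, stopping" >&2
      return 1
    fi
    # Ask a surviving node for the ring; a failed call is treated as an
    # error rather than as "node gone".
    ring=$(nodetool -h "$check_host" ring) || {
      echo "could not read ring from $check_host, stopping" >&2
      return 1
    }
    # Assumes the node address is the first column of the ring output.
    if printf '%s\n' "$ring" | awk -v h="$host" '$1 == h { found = 1 } END { exit !found }'; then
      echo "$host is still listed in the ring, stopping" >&2
      return 1
    fi
    echo "$host removed from the ring"
  done
}
```

For example, decommission_one_by_one 10.0.0.1 10.0.0.5 10.0.0.6 would remove 10.0.0.5 and then 10.0.0.6, checking the ring on 10.0.0.1 after each one. Stop Cassandra on each node once it has left the ring.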


When you remove nodes, you may have to rebalance the cluster by moving some nodes to new tokens. In a planned downscale, you need to:

1 - minimize the number of moves.

2 - if you have to move a node, minimize the amount of transferred data.
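For the rebalancing step, here is a sketch of the token arithmetic, assuming RandomPartitioner (tokens from 0 to 2^127) and that bc is installed; the function name balanced_tokens and its anchor argument are hypothetical. It spaces N tokens evenly around the ring, starting at the current token of one node you choose, so that node keeps its token and you save one move (point 1 above).

```shell
#!/bin/sh
# Sketch: evenly spaced target tokens for N remaining nodes.
# Assumptions: RandomPartitioner (token space 0 .. 2^127) and bc installed.
# ANCHOR (hypothetical argument) is the current token of one node that
# should stay where it is; it defaults to 0.
# Usage: balanced_tokens N [ANCHOR]
balanced_tokens() {
  n=$1
  anchor=${2:-0}
  i=0
  while [ "$i" -lt "$n" ]; do
    # bc does arbitrary-precision integer math (scale=0 by default), so
    # 2^127 does not overflow the way shell arithmetic would; the modulo
    # wraps tokens that pass the end of the ring back to the start.
    echo "($anchor + $i * 2^127 / $n) % 2^127" | bc
    i=$((i + 1))
  done
}
```

For example, balanced_tokens 3 followed by the token of a node you want to keep prints three targets in ring order starting at that node's token; each of the other nodes could then be moved, one at a time, with nodetool -h HOST move NEW_TOKEN.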

There's an article about cluster balancing that may be helpful: Balancing Your Cassandra Cluster

Also, the beginning of this video covers the add-node and remove-node operations and the best strategies to minimize their impact on the cluster.

Hopefully, these 2 references will give you enough information to plan your downscale.
