How to decrease the number of partitions of a Kafka topic?

Apache Kafka doesn't support decreasing the number of partitions. Think of the topic as a whole, with partitions as a way to scale out and improve performance. Data sent to a topic is spread across all of its partitions, so removing one of them means data loss.


You can't just delete a partition, because that would lead to data loss. In addition, the keys of the remaining data would no longer be distributed correctly: new messages would not be routed to the same partitions as existing messages with the same key.
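To make the key-distribution point concrete, here is a minimal sketch of key-based partitioning. It assumes the usual "hash of the key modulo partition count" scheme; `zlib.crc32` is only a deterministic stand-in (Kafka's default partitioner uses murmur2), and the `user-N` keys are made up for illustration.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Pick a partition as hash(key) mod partition count.

    CRC32 is a stand-in hash for this sketch; Kafka's default
    partitioner uses murmur2, but the modulo step is the same idea.
    """
    return zlib.crc32(key) % num_partitions

# Hypothetical keys, used only to show the effect.
keys = [f"user-{i}".encode() for i in range(100)]

before = {k: partition_for(k, 4) for k in keys}  # 4-partition topic
after = {k: partition_for(k, 3) for k in keys}   # same keys, 3 partitions

# Keys whose partition changes would be split across old and new
# partitions if the count were changed in place.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys map to a different partition")
```

Any key that previously mapped to partition 3 has to move, and many keys on the surviving partitions move as well, because the modulus itself changed.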

For the above reasons Kafka does not support decreasing partition counts on an existing topic.

What you can do is create a new topic with 3 partitions and then write a small program (or use an existing replication tool) to copy the data from the old 4-partition topic to the new 3-partition topic. That way everything runs through the same partitioner, and all your keyed messages end up in the right partition. Once you are satisfied that all the data has been copied, delete the original 4-partition topic.

If you must keep the original topic name, create a new topic with the original name, copy the data back from the repartitioned topic, and then delete the temporary repartitioned topic.
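The "small program" for the copy step could look roughly like the sketch below. It is not tied to a specific client: `copy_topic` is a hypothetical helper that only assumes a consumer you can iterate over for records with `.key` and `.value`, and a producer with `send(topic, key=..., value=...)` and `flush()`. The kafka-python `KafkaConsumer` and `KafkaProducer` classes have that shape, but the topic names and broker address in the comments are placeholders.

```python
def copy_topic(consumer, producer, target_topic):
    """Re-produce every record from `consumer` into `target_topic`.

    Only the key and value are forwarded. The source partition number is
    deliberately not carried over, so the producer's partitioner assigns
    each key a partition based on the new topic's partition count.
    """
    copied = 0
    for record in consumer:
        producer.send(target_topic, key=record.key, value=record.value)
        copied += 1
    # Block until all buffered records have actually been sent.
    producer.flush()
    return copied

# Possible wiring with kafka-python (assumed client; names are placeholders):
#
#   from kafka import KafkaConsumer, KafkaProducer
#   consumer = KafkaConsumer(
#       "old-topic",
#       bootstrap_servers="localhost:9092",
#       auto_offset_reset="earliest",   # start from the oldest record
#       consumer_timeout_ms=10000,      # stop iterating once the topic is drained
#   )
#   producer = KafkaProducer(bootstrap_servers="localhost:9092")
#   copy_topic(consumer, producer, "new-topic")
```

Stop the original producers before the final pass, otherwise records written during the copy can be missed when the old topic is deleted.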


I don't buy the answers above. "Removing a partition causes data loss" is a vague answer. Decreasing the number of partitions is nothing new in distributed systems, and in fact many systems support it. If you can afford the overhead of rebalancing the entire storage system while keeping the data consistent, decreasing the partition count is not impossible.

In my opinion, the real reason Kafka doesn't support decreasing the number of partitions comes from an important property of Kafka: it guarantees the order of messages within each partition, but the order of messages across partitions is not guaranteed (although they may happen to be in order). This ordering property is crucial in many use cases. In the case of removing a partition, redistributing its messages to the other partitions while preserving order is impossible, because there is no guaranteed ordering between partitions. No matter how you distribute the data from the removed partition, you break the ordering guarantee of every partition you distribute it into. If Kafka didn't care about the order of messages within each partition, decreasing the number of partitions could easily be supported.
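A toy illustration of the ordering argument, as a simplified sketch rather than a model of Kafka internals: each partition is a plain list, and the `produced_at` numbers are an illustrative stamp added only so the original production order is visible.

```python
# Each record is (produced_at, payload). Both logs are in production order.
partition_0 = [(1, "a0"), (3, "a1"), (5, "a2")]
partition_1 = [(2, "b0"), (4, "b1")]  # the partition to be removed

# A partition log is append-only, so the removed partition's records can
# only be added after the records the surviving partition already holds.
merged = partition_0 + partition_1

def in_production_order(log):
    """True if the log's records appear in the order they were produced."""
    stamps = [produced_at for produced_at, _ in log]
    return stamps == sorted(stamps)

print(in_production_order(partition_0))  # True
print(in_production_order(partition_1))  # True
print(in_production_order(merged))       # False
```

Each partition is ordered on its own, but the merged log is not: a consumer reading it by offset sees `b0` after `a2`, even though `b0` was produced earlier.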
