If i set 'compression.type' at topic level and producer level, which takes precedence

When the broker receives a compressed batch of messages from a producer:

  • it always decompresses the data in order to validate it
  • it considers the compression codec of the destination topic
    • if the compression codec of the destination topic is producer, or if the codecs of the batch and destination topic are the same, the broker takes the compressed batch from the client and writes it directly to the topic’s log file without recompressing the data.
    • Otherwise, the broker needs to re-compress the data to match the codec of the destination topic.

Decompression and re-compression can also happen if producers are running a version prior to 0.10 because offsets need to be overwritten, or if any other message format conversion is required.


I tried out some experiments to answer this:

**Note:** server.properties has the config compression.type=producer 

./kafka-topics.sh --create --zookeeper localhost:2181 --partitions 1 --replication-factor 1--config compression.type=producer --topic t

./kafka-console-producer.sh --broker-list node:6667  --topic t
./kafka-console-producer.sh --broker-list node:6667  --topic t --compression-codec gzip
./kafka-console-producer.sh --broker-list node:6667  --topic t

sh kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --files /kafka-logs/t-0/00000000000000000000.log

Dumping /kafka-logs/t-0/00000000000000000000.log
Starting offset: 0
offset: 0 position: 0 compresscodec: NONE 
offset: 1 position: 69 compresscodec: GZIP 
offset: 2 position: 158 compresscodec: NONE 

./kafka-topics.sh --create --zookeeper localhost:2181 --partitions 1 --replication-factor 1--config compression.type=gzip --topic t1

./kafka-console-producer.sh --broker-list node:6667  --topic t1
./kafka-console-producer.sh --broker-list node:6667  --topic t1 --compression-codec gzip
./kafka-console-producer.sh --broker-list node:6667  --topic t1 --compression-codec snappy

 sh kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --files /kafka-logs/t1-0/00000000000000000000.log
Dumping /kafka-logs/t1-0/00000000000000000000.log
Starting offset: 0
offset: 0 position: 0 compresscodec: GZIP 
offset: 1 position: 89 compresscodec: GZIP 
offset: 2 position: 178 compresscodec: GZIP 

Clearly the topic takes the override.

w.r.t compression and decompression text from Kafka - the definitive guide

The Kafka broker must decompress all message batches, however, in order to validate the checksum of the individual messages and assign offsets. It then needs to recompress the message batch in order to store it on disk.

As of version 0.10, there is a new message format that allows for relative offsets in a message batch. This means that newer producers will set relative offsets prior to sending the message batch, which allows the broker to skip recompression of the message batch.

So, when the compression type is different, the topic compression is honoured. if it is same, it will retain the original compression codec set by the producer.

Reference - https://kafka.apache.org/documentation/

Tags:

Apache Kafka