Kafka10.1 heartbeat.interval.ms, session.timeout.ms and max.poll.interval.ms

session.timeout.ms is closely related to heartbeat.interval.ms.

heartbeat.interval.ms controls how frequently the KafkaConsumer poll() method will send a heartbeat to the group coordinator, whereas session.timeout.ms controls how long a consumer can go without sending a heartbeat.

Therefore, those two properties are typically modified together. heatbeat.interval.ms must be lower than session.timeout.ms, and is usually set to one-third of the timeout value. So if session.timeout.ms is 3 seconds, heartbeat.interval.ms should be 1 second.

max.poll.interval.ms - The maximum delay between invocations of poll() when using consumer group management. This places an upper bound on the amount of time that the consumer can be idle before fetching more records. If poll() is not called before expiration of this timeout, then the consumer is considered failed and the group will rebalance in order to reassign the partitions to another member


just make them more clear, a heartbeat thread(along with user thread which invokes Poll function in the same process) will send heartbeat to coordinator every "heartbeat.interval.ms" time, and the coordinator will mark the consumer in the user thread as dead if it exceeds "session.timeout.ms" or "max.poll.interval.ms".


Assuming we are talking about Kafka 0.10.1.0 or upwards where each consumer instance employs two threads to function. One is user thread from which poll is called; the other is heartbeat thread that specially takes care of heartbeat things.

session.timeout.ms is for heartbeat thread. If coordinator fails to get any heartbeat from a consumer before this time interval elapsed, it marks consumer as failed and triggers a new round of rebalance.

max.poll.interval.ms is for user thread. If message processing logic is too heavy to cost larger than this time interval, coordinator explicitly have the consumer leave the group and also triggers a new round of rebalance.

heartbeat.interval.ms is used to have other healthy consumers aware of the rebalance much faster. If coordinator triggers a rebalance, other consumers will only know of this by receiving the heartbeat response with REBALANCE_IN_PROGRESS exception encapsulated. Quicker the heartbeat request is sent, faster the consumer knows it needs to rejoin the group.

Suggested values:
session.timeout.ms : a relatively low value, 10 seconds for instance.
max.poll.interval.ms: based on your processing requirements
heartbeat.interval.ms: a relatively low value, better 1/3 of the session.timeout.ms