How to implement a delayed queue with Apache Kafka?

There is no notion of jobs in Kafka. It is just a dumb, high-performance message queueing service. Depending on your requirements, you could store the jobs in a data store that supports indexing by job execution time, such as an RDBMS. A separate process then periodically extracts the jobs whose execution times fall within a small range [last_check_time, current_time + lookahead_interval] and publishes them to a Kafka topic for eventual processing.
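A minimal sketch of that polling approach, using SQLite from the Python standard library as a stand-in for the RDBMS. The table layout, the function names and the publish callback are all assumptions made for illustration; in a real setup publish would wrap a Kafka producer's send call. Instead of tracking last_check_time, this variant marks each job as dispatched, so a job inserted late into an already scanned range is still picked up.

```python
import sqlite3


def create_job_store(conn):
    # Hypothetical schema: one row per job, indexed by execution time.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " run_at REAL NOT NULL,"
        " payload TEXT NOT NULL,"
        " dispatched INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_jobs_due ON jobs (dispatched, run_at)"
    )


def schedule_job(conn, run_at, payload):
    conn.execute(
        "INSERT INTO jobs (run_at, payload) VALUES (?, ?)", (run_at, payload)
    )


def dispatch_due_jobs(conn, now, lookahead, publish):
    """Publish every undispatched job due by now + lookahead.

    `publish` is a caller-supplied callable (an assumption of this sketch).
    Returns the number of jobs published in this pass.
    """
    rows = conn.execute(
        "SELECT id, payload FROM jobs"
        " WHERE dispatched = 0 AND run_at <= ? ORDER BY run_at",
        (now + lookahead,),
    ).fetchall()
    for job_id, payload in rows:
        publish(payload)
        conn.execute("UPDATE jobs SET dispatched = 1 WHERE id = ?", (job_id,))
    conn.commit()
    return len(rows)
```

You would call dispatch_due_jobs from a timer or a loop, passing time.time() as now. If the process crashes between publish and the UPDATE, a job can be published twice, so the downstream consumer should tolerate duplicates.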


A bit of a delayed answer here. Since Kafka 0.10, every message carries a timestamp, which makes it possible to consume a stream with a delay. I'm using this right now to build a continuously aggregated dataset without resorting to external dependencies.
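The core of a delayed consumer is a small calculation on the message timestamp (exposed in the Java client as ConsumerRecord.timestamp()). The helper below is only a sketch with a hypothetical name; it shows the arithmetic, not a full consumer.

```python
def remaining_delay_ms(record_timestamp_ms, now_ms, delay_ms):
    """Milliseconds still to wait before a record may be processed.

    A record becomes eligible once it is at least `delay_ms` old;
    the result is 0 when it is already eligible.
    """
    return max(0, record_timestamp_ms + delay_ms - now_ms)
```

When the result is greater than zero, the consumer has to hold the record back (and any later ones on that partition) until the delay has passed, while still polling often enough to stay in its consumer group.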

My records can receive updates/deletes for up to 60 minutes after the first event, so I can't declare one as "final" until I have seen all of its updates.

So, to handle this case, I'm consuming the topic with all CREATEs/UPDATEs/DELETEs twice: once in realtime (or as fast as possible), and once delayed by 90 minutes to ensure I don't miss anything. The realtime consumer stores locally all the updates/deletes needed for each create. Then, when the delayed consumer receives a particular "CREATE", it looks up my local storage for any updates/deletes, updates the record so it reflects its final status, and produces it to a final Kafka topic.

To ensure I don't run out of disk space, I'm also continuously truncating the local storage so it holds at most two hours of updates/deletes.
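Roughly, the local storage behaves like the sketch below. This is not my actual code: the class name, the in-memory dict and the "last update wins, any delete wins" merge rule are assumptions chosen to keep the example short.

```python
from collections import defaultdict


class UpdateBuffer:
    """Holds recent UPDATE/DELETE events per key for the delayed consumer."""

    def __init__(self, retention_ms):
        self.retention_ms = retention_ms
        self._events = defaultdict(list)  # key -> [(timestamp_ms, op, value)]

    def record(self, key, timestamp_ms, op, value=None):
        # Called by the realtime consumer for every UPDATE or DELETE.
        self._events[key].append((timestamp_ms, op, value))

    def finalize(self, key, created_value):
        """Called by the delayed consumer when it sees the CREATE for `key`.

        Returns the final value, or None if the record was deleted.
        """
        final = created_value
        events = sorted(self._events.get(key, []), key=lambda e: e[0])
        for _, op, value in events:
            if op == "DELETE":
                return None
            if op == "UPDATE":
                final = value
        return final

    def truncate(self, now_ms):
        # Drop events older than the retention window to bound disk/memory use.
        cutoff = now_ms - self.retention_ms
        for key in list(self._events):
            kept = [e for e in self._events[key] if e[0] >= cutoff]
            if kept:
                self._events[key] = kept
            else:
                del self._events[key]
```

With a retention of two hours and a consumer delay of 90 minutes, every update that arrives within the 60-minute window is still in the buffer when the delayed consumer reaches the matching CREATE.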


Unfortunately, Kafka does not have the ability to delay the visibility of messages like some message queues do. Once a message is published, it is immediately available to all consumers. The only minor exception is when the message is published within a transaction and the consumer has enabled read-committed isolation (isolation.level=read_committed); in that case the message becomes visible only once the transaction commits. Even then, the delay will be minimal.

Kafka leaves all processing semantics to the consumers’ discretion. If you need to delay processing, you may want to use a persistent data store (e.g. an RDBMS or Redis) or another queue on the consumer end. You most certainly don’t want to block record consumption in the consumer with a Thread.sleep(), because the consumer will then stop polling for records, and once it goes too long without polling (longer than max.poll.interval.ms) Kafka will deem it failed and reassign its partitions.
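As an illustration of "another queue on the consumer end", here is a minimal in-memory delay buffer. The class and method names are made up for this sketch, and because it lives in memory it loses its contents on a crash; a persistent store would be needed for durability. The idea is that the poll loop keeps running, pushes each record in with its due time, and on every iteration processes whatever has become due, instead of sleeping.

```python
import heapq
import itertools


class DelayBuffer:
    """Holds records until their due time without blocking the poll loop."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so records with equal due times never get compared.
        self._seq = itertools.count()

    def add(self, due_at_ms, record):
        heapq.heappush(self._heap, (due_at_ms, next(self._seq), record))

    def pop_due(self, now_ms):
        """Return all records whose due time has passed, earliest first."""
        due = []
        while self._heap and self._heap[0][0] <= now_ms:
            due.append(heapq.heappop(self._heap)[2])
        return due
```

Keep in mind that offsets should only be committed for records that have actually been processed, otherwise buffered records are lost on restart.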