Flink Dynamic Table vs Kafka Stream Ktable?

Flink's dynamic table and Kafka's KTable are not the same.

In Flink, a dynamic table is a very generic and broad concept, namely a table that evolves over time. This includes arbitrary changes (INSERT, DELETE, UPDATE). A dynamic table does not need a primary key or unique attribute, but it might have one.

  • A KStream is a special type of dynamic table, namely a dynamic table that is only receiving INSERT changes, i.e., an ever-growing, append-only table.

  • A KTable is another type of dynamic table, i.e., a dynamic table that has a unique key and changes with INSERT, DELETE, and UPDATE changes on the key.

Flink supports the following types of joins on dynamic tables. Note that the references to Kafka's joins might not be 100% accurate (happy to fix errors!).

  • Time-windowed joins should correspond to KSQL's KStream-KStream joins
  • Temporal table joins are similar to KSQL's KStream-KTable joins. The temporal relation between both tables needs to be explicitly specified in the query to be able to run the same query with identical semantics on batch/offline data.
  • Regular joins are more generic than KSQL's KTable-KTable joins because they don't require the input tables to have unique keys. Moreover, Flink does not distinguish between primary- or foreign-key joins, but requires that joins are equi-joins, i.e., have at least one equality predicate. At this point, the streaming SQL planner does not support broadcast-forward joins (which I believe should roughly correspond to KTable-GlobalKTable joins).

I am not 100% sure because I don't know all the details of Flink's "dynamic table" concept, but it seems to me it's the same as a KTable in Kafka Streams.

However, there is a difference between a KTable and a GlobalKTable in Kafka Streams, and both are not the exact same thing. (1) A KTable is distributed/sharded while a GlobalKTable is replicated/broadcasted. (2) A KTable is event time synchronized while a GlobalKTable is not. For the same reason, a GlobalKTable is fully loaded/bootstrapped on startup while a KTable is updated based on the changelog records event timestamps when appropriate (in relationship to the event timestamps of other input streams). Furthermore, during processing updates to a KTable are event time synchronized while updates to a GlobalKTable are not (ie, they are applied immediately and thus can be considered non-deterministic).

Last note: Kafka Streams adds foreign-key KTable-KTable joins in upcoming 2.4 release. There is also a ticket to add KTable-GlobalKTabel joins but this feature was not requested very often yet, and thus not added yet: https://issues.apache.org/jira/browse/KAFKA-4628