Hello, According to the docs, Kafka Streams tasks pick from the partition with the smallest timestamp to process the next record. ( https://kafka.apache.org/documentation/streams/core-concepts#streams_out_of_ordering ) One can also configure max.task.idle.ms so that Kafka Streams tasks wait for all partitions to contain some buffered data before picking next record.
I wonder, is it possible to make it so that consumption from selected subset of topics (and their partitions) is kept "behind" by defined amount of time (several seconds or several minutes)? Example: lets have topic A and B, both with one partition. I want to consume from A if timestamp(A) < timestamp(B) + 10s and from B if timestamp(B) < timestamp(A) - 10s. Scenario: I am asking because I have the following scenario. There is 1 + X input topics. Let's say the first input topic is Alpha and the other topics Beta. Alpha records are aggregated into KTable. Beta records are joined with aggregated Alpha data (stream-table join by key) and sent to another output topic. For business problem, it is important to process Alpha records before Beta records (based on timestamps), at least in the great majority of cases. In other words, a Beta event should be enriched with aggregated data from all Alpha events that happened before the Beta event. Events can be delayed on its way to Kafka and might by delayed also during processing in Kafka Streams - in actual setup I need to map key and re-partition topics. So far, I am using just Kafka Streams DSL. I might be able to use .transform() and TransformerSupplier from Processor API and construct new timestamps for Beta events - shifting the timestamp to future by a given amount of time. I guess, I could do it after mapping the key and before join, so the re-partitioned records have the modified timestamp. Then, if Kafka Streams' "pick from the partition with the smallest timestamp" is respected also for internal topics, it should work fine. Would you consider it a good approach? The "delay" would incur latency to Beta events processing. It wouldn't be trouble if delay is small. Otherwise, I guess ,I would need to use some kind of windowing. Although, windowing isn't supported by stream-table join, so that might be a challenge. Do you know if stream-table join windowing might be supported in a future version of Kafka Streams? Best Regards, Jiri Samek