Hi Mason, Very interesting, is it possible to apply both types of alignment? I.e., considering watermark skew across splits from within one source & also from another source?
Regards, Alexis. On Tue, 28 Feb 2023, 05:26 Mason Chen, <mas.chen6...@gmail.com> wrote: > Hi all, > > It's true that the problem can be handled by caching records in state. > However, there is an alternative using `watermark alignment` with Flink > 1.15+ [1] which does the desired synchronization that you described while > reducing the size of state from the former approach. > > To use this with two topics of different speeds, you would need to define > two Kafka sources, each corresponding to a topic. This limitation is > documented in [1]. This limitation is resolved in Flink 1.17 by split level > (partition level in the case of Kafka) watermark alignment, so one Kafka > source reading various topics can align on the partitions of the different > topics. > > [1] > https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_ > > Best, > Mason > > On Mon, Feb 27, 2023 at 8:11 AM Alexis Sarda-Espinosa < > sarda.espin...@gmail.com> wrote: > >> Hello, >> >> I had this question myself and I've seen it a few times, the answer is >> always the same, there's currently no official way to handle it without >> state. >> >> Regards, >> Alexis. >> >> On Mon, 27 Feb 2023, 14:09 Remigiusz Janeczek, <capi...@gmail.com> wrote: >> >>> Hi, >>> >>> How to handle a case where one of the Kafka topics used for interval >>> join is slower than the other? (Or a case where one topic lags behind) >>> Is there a way to stop consuming from the fast topic and wait for the >>> slow one to catch up? I want to avoid running out of memory (or keeping a >>> very large state) and I don't want to discard any data from the fast topic >>> until a watermark from the slow topic allows that. >>> >>> Best Regards >>> >>