Understanding kafka watermark strategy assigner

Nikola Hrusov Sun, 08 Nov 2020 11:14:18 -0800

Hi,

I am reading about watermark creation for Kafka streams in the article
here:
https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumers-and-timestamp-extractionwatermark-emission


In there, an example is given where the watermark assigner is attached
directly to the consumer, like so (solution 1):

> FlinkKafkaConsumer<String> myConsumer =
>     new FlinkKafkaConsumer<>("topic", new SimpleStringSchema(), properties);
> myConsumer.assignTimestampsAndWatermarks(
>     WatermarkStrategy.forBoundedOutOfOrderness(Duration.ofSeconds(20)));
> env.addSource(myConsumer)....


The consumer is then added as a source and the application continues from
there.
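
For reference, this is my understanding of what the bounded-out-of-orderness
strategy computes in both solutions: it tracks the highest timestamp seen so
far and emits a watermark that trails it by the configured bound, minus one
millisecond. Below is a minimal plain-Java sketch of that idea with no Flink
dependency. The class BoundedWatermarkSketch and its methods are hypothetical
names I made up for illustration, not Flink API, and the sketch ignores
periodic emission and idleness.

```java
import java.time.Duration;

// Hypothetical illustration class, not part of Flink.
public class BoundedWatermarkSketch {
    private final long boundMillis;
    private long maxTimestamp;

    public BoundedWatermarkSketch(Duration bound) {
        this.boundMillis = bound.toMillis();
        // Start high enough that the first watermark computation cannot underflow.
        this.maxTimestamp = Long.MIN_VALUE + boundMillis + 1;
    }

    // Record an event timestamp; only the maximum seen so far matters.
    public void onEvent(long timestampMillis) {
        maxTimestamp = Math.max(maxTimestamp, timestampMillis);
    }

    // The watermark trails the highest timestamp by the bound, minus 1 ms.
    public long currentWatermark() {
        return maxTimestamp - boundMillis - 1;
    }

    public static void main(String[] args) {
        BoundedWatermarkSketch wm = new BoundedWatermarkSketch(Duration.ofSeconds(20));
        wm.onEvent(100_000L);
        wm.onEvent(90_000L); // out of order, but within the 20 s bound
        System.out.println(wm.currentWatermark()); // prints 79999
    }
}
```

The out-of-order event at 90000 does not move the watermark backwards, because
only the maximum timestamp is tracked.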

My question is: would it make any (or much) difference to do this after the
source instead? Something like this (solution 2):


> FlinkKafkaConsumer<String> myConsumer =
>     new FlinkKafkaConsumer<>("topic", new SimpleStringSchema(), properties);
> env
>     .addSource(myConsumer)
>     .assignTimestampsAndWatermarks(
>         WatermarkStrategy.forBoundedOutOfOrderness(Duration.ofSeconds(20)));

I can imagine that it would create an extra operator, but is there any
other [unnecessary] overhead that solution 2 has over solution 1?
I tried running a simple job, but I couldn't see much difference. I would
like to know whether there is something I am unaware of that I could do better.

Regards,
Nikola Hrusov
