Thanks for the answers, I still don't understand why I can see the offsets
being quickly committed to Kafka for the "small topic"? Are they committed
to Kafka before their watermark has passed on Flink's side? That would be
quite confusing.. Indeed when Flink handles the state/offsets internally,
the consumer offsets are committed to Kafka just for reference.

Otherwise, what you're saying sounds very good to me. The documentation
just doesn't explicitly say anything about how it works across topics.

On Kien's answer: "When you join multiple stream with different
watermarks", note that I'm not joining any topics myself, I get them as a
single stream from the Flink kafka consumer based on the list of topics
that I asked for.

Thanks,
Juho

On Wed, Nov 22, 2017 at 2:57 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org>
wrote:

> Hi!
>
> The FlinkKafkaConsumer can handle watermark advancement with
> per-Kafka-partition awareness (across partitions of different topics).
> You can see an example of how to do that here [1].
>
> Basically what this does is that it generates watermarks within the Kafka
> consumer individually for each Kafka partition, and the per-partition
> watermarks are aggregated and emitted from the consumer in the same way
> that
> watermarks are aggregated on a stream shuffle; only when the low watermark
> advances across all partitions, should a watermark be emitted from the
> consumer.
>
> Therefore, this helps avoid the problem that you described, in which a
> "big_topic" has subscribed partitions that lags behind others. In this case
> and when the above feature is used, the event time would advance along with
> the lagging "big_topic" partitions and would not result in messages being
> recognized as late and discarded.
>
> Cheers,
> Gordon
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/event_
> timestamps_watermarks.html#timestamps-per-kafka-partition
>
>
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.
> n4.nabble.com/
>

Reply via email to