Hi,

where did you observe the duplicates: within Flink or in Kafka? Please be aware that the Flink Kafka Producer does not provide exactly-once consistency. This is not easily possible because Kafka does not yet support transactional writes.
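For illustration, here is a minimal sketch of the usual setup (the connector class FlinkKafkaProducer09, the topic name and the socket input are assumptions, not taken from your job): checkpointing gives the program exactly-once state semantics, but any records already pushed to Kafka between the last completed checkpoint and a failure are sent again after recovery, which is where duplicates in the topic come from.

import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class AtLeastOnceToKafka {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Checkpointing every 5 seconds: operator state is restored exactly-once,
    // but the Kafka sink below is only at-least-once. Records emitted between
    // the last completed checkpoint and a failure are re-sent on recovery.
    env.enableCheckpointing(5000);

    // Hypothetical input, just to have something to write out.
    DataStream<String> events = env.socketTextStream("localhost", 9999);

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "localhost:9092");

    // At-least-once sink: duplicates in the topic are possible after a restart.
    events.addSink(new FlinkKafkaProducer09<>("output-topic", new SimpleStringSchema(), props));

    env.execute("at-least-once Kafka sink sketch");
  }
}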
Flink's exactly-once guarantees are only valid within the Flink DataStream program and for some sinks such as the RollingFileSink.

Cheers,
Fabian

2016-02-09 10:21 GMT+01:00 Aljoscha Krettek <aljos...@apache.org>:
> Hi,
> in general it should not be a problem if one parallel instance of a source
> is responsible for several Kafka partitions. It can become a problem if the
> timestamps in the different partitions differ by a lot and the watermark
> assignment logic is not able to handle this.
>
> How are you assigning the timestamps/watermarks in your job?
>
> Cheers,
> Aljoscha
>
> On 08 Feb 2016, at 21:51, shikhar <shik...@schmizz.net> wrote:
> >
> > Stephan explained in that thread that we're picking the min watermark when
> > doing operations that join streams from multiple sources. If we have m:n
> > partition-source assignment where m > n, the source is going to end up with
> > the max watermark. Having m <= n ensures that the lowest watermark is used.
> >
> > Re: automatic enforcement, perhaps allowing for more than 1 Kafka partition
> > on a source should require opt-in, e.g. allowOversubscription()
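For reference, a sketch of the kind of per-subtask timestamp/watermark assignment Aljoscha is asking about (the consumer class, topic, record layout and the 10 s out-of-orderness bound are assumptions): each parallel source subtask runs its own assigner, so if one subtask reads several Kafka partitions whose event times differ a lot, the watermark it emits follows the fastest partition and records from the slower partitions can arrive late downstream; keeping partitions-per-source at m <= n avoids that.

import java.util.Properties;

import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class PartitionAwareWatermarks {

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "localhost:9092");
    props.setProperty("group.id", "example-group");

    DataStream<String> events =
        env.addSource(new FlinkKafkaConsumer09<>("events", new SimpleStringSchema(), props));

    // One assigner instance runs per source subtask. If that subtask reads
    // several Kafka partitions whose event times differ widely, the watermark
    // it emits tracks the fastest partition it has seen, which can make
    // records from the slower partitions late downstream.
    DataStream<String> withTimestamps =
        events.assignTimestampsAndWatermarks(new AssignerWithPeriodicWatermarks<String>() {

          private long maxTimestamp = Long.MIN_VALUE;

          @Override
          public long extractTimestamp(String element, long previousElementTimestamp) {
            // Hypothetical record layout: "<epochMillis>,<payload>"
            long ts = Long.parseLong(element.split(",")[0]);
            maxTimestamp = Math.max(maxTimestamp, ts);
            return ts;
          }

          @Override
          public Watermark getCurrentWatermark() {
            // Allow 10 seconds of out-of-orderness before advancing event time.
            return maxTimestamp == Long.MIN_VALUE
                ? new Watermark(Long.MIN_VALUE)
                : new Watermark(maxTimestamp - 10_000);
          }
        });

    withTimestamps.print();
    env.execute("per-subtask watermark assignment sketch");
  }
}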