Hi,

Where did you observe the duplicates, within Flink or in Kafka?
Please be aware that the Flink Kafka Producer does not provide exactly-once
delivery. This is not easily possible because Kafka does not yet support
transactional writes.

Flink's exactly-once guarantees are only valid within the Flink DataStream
program and for some sinks such as the RollingFileSink.
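
To make the boundary of the guarantee concrete, here is a minimal sketch
(assuming the Flink 1.x Kafka 0.8 connector; broker addresses, topics, and
properties are placeholders): checkpointing gives exactly-once semantics for
operator state inside the job, while the Kafka sink stays at-least-once, so
duplicates in the output topic are expected after a recovery.

    // imports assumed: org.apache.flink.streaming.api.environment.*,
    // org.apache.flink.streaming.connectors.kafka.*,
    // org.apache.flink.streaming.util.serialization.*, java.util.Properties
    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "localhost:9092");
    props.setProperty("zookeeper.connect", "localhost:2181"); // 0.8 consumer
    props.setProperty("group.id", "my-group");

    StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(5000); // exactly-once for operator state in the job

    env.addSource(new FlinkKafkaConsumer08<>(
            "input-topic", new SimpleStringSchema(), props))
       // ... transformations run with exactly-once state semantics ...
       .addSink(new FlinkKafkaProducer08<>(
            "localhost:9092", "output-topic", new SimpleStringSchema()));
    // The producer is at-least-once: after a failure, records since the
    // last checkpoint are re-emitted, i.e. duplicates appear in Kafka.

    env.execute();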

Cheers, Fabian

2016-02-09 10:21 GMT+01:00 Aljoscha Krettek <aljos...@apache.org>:

> Hi,
> in general it should not be a problem if one parallel instance of a
> source is responsible for several Kafka partitions. It can become a
> problem if the timestamps in the different partitions differ by a lot
> and the watermark assignment logic is not able to handle this.
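>
> For example, a periodic assigner along these lines (a sketch against the
> Flink 1.x AssignerWithPeriodicWatermarks API; the MyEvent type and the
> 3.5s bound are made up) tracks the maximum timestamp across everything
> the subtask reads, so one fast partition pushes the watermark past the
> slower partitions and their records become late:
>
>     import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
>     import org.apache.flink.streaming.api.watermark.Watermark;
>
>     public class BoundedAssigner
>             implements AssignerWithPeriodicWatermarks<MyEvent> {
>         private final long maxOutOfOrderness = 3500; // made-up bound
>         // start low, but avoid underflow when subtracting the bound
>         private long currentMaxTimestamp = Long.MIN_VALUE + maxOutOfOrderness;
>
>         @Override
>         public long extractTimestamp(MyEvent element, long previousElementTimestamp) {
>             // max over *all* partitions this source subtask consumes
>             currentMaxTimestamp = Math.max(currentMaxTimestamp, element.timestamp);
>             return element.timestamp;
>         }
>
>         @Override
>         public Watermark getCurrentWatermark() {
>             return new Watermark(currentMaxTimestamp - maxOutOfOrderness);
>         }
>     }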
>
> How are you assigning the timestamps/watermarks in your job?
>
> Cheers,
> Aljoscha
> > On 08 Feb 2016, at 21:51, shikhar <shik...@schmizz.net> wrote:
> >
> > Stephan explained in that thread that we're picking the min watermark
> > when doing operations that join streams from multiple sources. If we
> > have m:n partition-source assignment where m>n, the source is going to
> > end up with the max watermark. Having m<=n ensures that the lowest
> > watermark is used.
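> >
> > Concretely (made-up numbers): say partitions P0 and P1 currently carry
> > event times 10:00 and 10:05. A single source instance reading both can
> > advance its watermark toward 10:05, the max, so P0's records show up as
> > late. With one partition per source instance, the downstream operator
> > takes min(10:00, 10:05) = 10:00 and nothing is marked late.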
> >
> > Re: automatic enforcement, perhaps allowing for more than 1 Kafka
> > partition on a source should require opt-in, e.g.
> > allowOversubscription()
