Re: getting duplicate messages from duplicate jobs

Dawid Wysakowicz Sat, 26 Jan 2019 08:28:07 -0800

Hi Avi,

AFAIK Flink's Kafka consumer uses the low-level Kafka APIs and does not
participate in Kafka's partition assignment protocol. Instead, it
discovers all available partitions for the given topic and manages
offsets itself, which allows it to provide exactly-once guarantees with
regard to Flink's internal state.

Flink's Kafka consumer uses the group.id to derive starting offsets for
partitions. It can also commit offsets back to Kafka for monitoring
purposes[1]. But as I said, it does not participate in partition
assignment within a group, so the same partition may be read by
multiple consumers with the same group.id, which is what you are seeing
with your two jobs.
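
To make the difference concrete, here is a toy model in plain Python. It
is not Flink or Kafka code: the topic contents, the job names and all
function names are made up for illustration, and the round-robin
assignment is only a simplified stand-in for what a group coordinator
does. It just contrasts "one owner per partition within a group" with
"every consumer reads all partitions on its own".

```python
from collections import Counter

# Toy model only: partition id -> records stored in that partition.
TOPIC = {0: ["a", "b"], 1: ["c"], 2: ["d", "e"]}
JOBS = ["job-1", "job-2"]  # two submissions of the same job, same group.id


def group_managed(topic, consumers):
    """Group coordination (simplified round-robin): one owner per partition."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(topic)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


def self_managed(topic, consumers):
    """Every consumer discovers and reads all partitions on its own."""
    return {c: sorted(topic) for c in consumers}


def read_counts(topic, assignment):
    """Count how many times each record is read across all consumers."""
    counts = Counter()
    for partitions in assignment.values():
        for partition in partitions:
            counts.update(topic[partition])
    return dict(counts)


print(read_counts(TOPIC, group_managed(TOPIC, JOBS)))
print(read_counts(TOPIC, self_managed(TOPIC, JOBS)))
```

In the first case every record is counted once, in the second case
every record is counted twice, once per job. The second case
corresponds to two independent Flink jobs reading the same topic,
regardless of whether they share a group.id.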

I'm adding Gordon as a cc to correct me if I am wrong.

Best,

Dawid

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/connectors/kafka.html#kafka-consumers-offset-committing-behaviour-configuration

On 23/01/2019 18:02, Avi Levi wrote:
> Hi,
> This is quite confusing.
> I submitted the same stateless job twice (actually I uploaded it once).
> However, when I place a message on Kafka, it seems that both jobs
> consume it and publish the same result (we publish the result to
> another Kafka topic, so I actually see the message duplicated on
> Kafka). How can that be? Both jobs are using the same group id (the
> group id is fixed and not generated).
>
> Kind regards
> Avi
