Hi Nick,
Short (and somewhat superficial) answer:
* This assumes your producer supports exactly-once mode (e.g. Kafka).
* Duplicates should only ever appear when your job restarts after a hiccup.
* However, if your job is properly configured (checkpointing/Kafka
transactions), everything should be fine, provided the consumer of your
Kafka topic is in read-committed mode.
* In that case you should see new events only once per checkpoint cycle,
because the Kafka transactions are committed when a checkpoint completes.
* If the consumer of your produced topic is in read-uncommitted mode, it
will indeed see duplicates and needs to implement deduplication/idempotence
manually.
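To make the consumer side of the points above a bit more concrete, here is a minimal sketch using only the JDK. The class and method names, the broker address and the group id are made up for illustration, and the deduplication part assumes every event carries a unique ID, which may not hold for your data. The property keys themselves (`bootstrap.servers`, `group.id`, `isolation.level`) are the standard Kafka consumer settings.

```java
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

// Illustrative sketch only; class and method names are hypothetical.
final class ReadCommittedSketch {

    // Builds settings for the application that reads the topic Flink writes to.
    static Properties readCommittedConsumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Only return records from committed transactions.
        // Kafka's default is read_uncommitted, which also exposes aborted writes.
        props.setProperty("isolation.level", "read_committed");
        return props;
    }

    // Fallback for read-uncommitted consumers: skip events whose ID was already seen.
    // Assumption: each event has a unique ID. The set grows without bound here;
    // a real application would need to expire old IDs.
    static final class Deduplicator {
        private final Set<String> seenIds = new HashSet<>();

        // Returns true the first time an ID is offered, false for repeats.
        boolean firstTime(String eventId) {
            return seenIds.add(eventId);
        }
    }

    public static void main(String[] args) {
        Properties props = readCommittedConsumerProps("localhost:9092", "demo-group");
        System.out.println(props.getProperty("isolation.level"));

        Deduplicator dedup = new Deduplicator();
        System.out.println(dedup.firstTime("event-1")); // first delivery is processed
        System.out.println(dedup.firstTime("event-1")); // replayed copy is skipped
    }
}
```

The resulting `Properties` object would be passed to the Kafka consumer of the downstream application. With read-committed mode in place the deduplication helper should not be needed; it is only there for the case where the consumer cannot be switched away from read-uncommitted.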
Hope this helps clarify the matter
Sincere greetings
Thias
From: nick toker <[email protected]>
Sent: Thursday, 7 September 2023 13:36
To: user <[email protected]>
Subject: kafka duplicate messages
Hi
I am configured with exactly-once.
I see that the Flink producer sends duplicate messages (sometimes a few copies)
that are later consumed only once by another application.
How can I avoid the duplicates?
regards
nick