Hi Avihai,

> The problem is that every message queuing sink only provides at-least-once
> guarantee

From what I see, one message queue that can guarantee exactly-once is Kafka 0.11, using the Kafka transactional messaging feature
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging>.
See here:
https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/connectors/kafka.html#kafka-011

Another approach could be to de-duplicate the events on the consuming job side. See here:
https://github.com/jgrier/FilteringExample

Rough sketches of both ideas are below.
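If you go the Kafka route, the producer side of job 1 would look roughly like this. This is only a minimal sketch, assuming Flink 1.5 with the flink-connector-kafka-0.11 dependency; the topic name, addresses, and intervals are placeholders:

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;

public class ExactlyOnceSinkSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Transactions are committed on checkpoint completion, so the
        // exactly-once sink only works with checkpointing enabled.
        env.enableCheckpointing(60_000);

        // Placeholder for job 1's real source and enrichment logic.
        DataStream<String> enriched = env.socketTextStream("localhost", 9999);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092");
        // Must not exceed transaction.max.timeout.ms on the brokers.
        props.setProperty("transaction.timeout.ms", "900000");

        enriched.addSink(new FlinkKafkaProducer011<>(
                "enriched-events",  // the temporary queue between the jobs
                new KeyedSerializationSchemaWrapper<>(new SimpleStringSchema()),
                props,
                FlinkKafkaProducer011.Semantic.EXACTLY_ONCE));

        env.execute("job-1-enrichment");
    }
}

Note that jobs 2 and 3 then need isolation.level=read_committed in their consumer properties, otherwise they will still see records from transactions that were later aborted.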
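For the de-duplication route, the idea in the FilteringExample above boils down to a keyed filter that remembers which ids it has already emitted. A minimal sketch, assuming every event carries a unique id you can key on (the Event type and getId() in the usage comment are hypothetical):

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Emits the first event it sees for each key and drops all later ones.
// Must run on a keyed stream, since it uses keyed state.
public class DedupFilter<T> extends RichFlatMapFunction<T, T> {

    private transient ValueState<Boolean> seen;

    @Override
    public void open(Configuration parameters) {
        seen = getRuntimeContext().getState(
                new ValueStateDescriptor<>("seen", Boolean.class));
    }

    @Override
    public void flatMap(T event, Collector<T> out) throws Exception {
        if (seen.value() == null) {
            // First event for this key: remember it and pass it on.
            seen.update(true);
            out.collect(event);
        }
        // Any later event with the same key is a duplicate and is dropped.
    }
}

// usage: events.keyBy(e -> e.getId()).flatMap(new DedupFilter<>());

One caveat: this state grows with every distinct id, so in practice you would also want to expire old entries (e.g. with timers) once duplicates can no longer arrive.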
Hope this helps,
Rafi

On Mon, Jun 18, 2018 at 6:46 PM Avihai Berkovitz <avihai.berkov...@microsoft.com> wrote:

> Hello,
>
> We are planning a system that will be comprised of 3 different jobs:
>
>    1. Getting a stream of events, adding some metadata to the events, and
>    outputting them to a temporary message queue.
>    2. Performing some calculations on the events we got from job 1, as
>    required for product A.
>    3. Performing a different set of calculations on the events from job 1,
>    for product B.
>
> All 3 jobs will be developed by different teams, so we don’t want to
> create one massive job that does everything.
>
> The problem is that every message queuing sink only provides an
> at-least-once guarantee. If job 1 crashes and recovers, we will get the
> same events in the queue again, and jobs 2 and 3 will process events
> twice. This is obviously a problem, and I guess we are not the first to
> stumble upon it.
>
> Did anyone else have this issue? It seems to me like a fundamental
> problem of passing data between jobs, so hopefully there are known
> solutions and best practices. It would be great if you can share any
> solution.
>
> Thanks,
> Avihai