Hello,

We are planning a system that will consist of 3 different jobs:
1. Getting a stream of events, adding some metadata to the events, and outputting them to a temporary message queue.
2. Performing some calculations on the events we got from job 1, as required for product A.
3. Performing a different set of calculations on the events from job 1, for product B.

All 3 jobs will be developed by different teams, so we don't want to create one massive job that does everything.

The problem is that every message queue sink provides only an at-least-once guarantee. If job 1 crashes and recovers, it will write the same events to the queue again, and jobs 2 and 3 will process those events twice.

This is obviously a problem, and I guess we are not the first to stumble upon it. Has anyone else had this issue? It seems to me like a fundamental problem of passing data between jobs, so hopefully there are known solutions and best practices. It would be great if you could share any solution.

Thanks,
Avihai