I want to implement a job which feeds sink messages back into a source,
effectively introducing a cycle. For simplicity, consider the following
Kafka-based enrichment scenario:

- A source for the "facts"
- A source for the "dimensions"
- A sink for the "dimensions" (this creates a loop)
- A co-process operator that performs the enrichment join
- Finally, another sink for the final "enriched facts" (see the sketch below)
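
In the DataStream API I picture the topology roughly like this (just a
sketch: Fact, Dimension, EnrichedFact, the key extractors and the
source/sink variables are placeholders, and EnrichmentJoin is the join
operator sketched further below):

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(60_000); // prerequisite for exactly-once sinks

    // factsSource / dimsSource: KafkaSources for the two topics (definitions omitted)
    DataStream<Fact> facts = env.fromSource(
            factsSource, WatermarkStrategy.noWatermarks(), "facts");
    DataStream<Dimension> dims = env.fromSource(
            dimsSource, WatermarkStrategy.noWatermarks(), "dimensions");

    SingleOutputStreamOperator<EnrichedFact> enriched = facts
            .keyBy(Fact::getDimensionKey)
            .connect(dims.keyBy(Dimension::getKey))
            .process(new EnrichmentJoin());

    // dimension updates are written back to the "dimensions" topic, closing
    // the loop externally via Kafka; the job graph itself stays a DAG
    enriched.getSideOutput(EnrichmentJoin.DIM_UPDATES).sinkTo(dimensionsSink);
    enriched.sinkTo(enrichedFactsSink);

    env.execute("enrichment-with-feedback");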

The thing is that, based on the "facts", the data in the dimensions table
might change, and in that case I want to both update the internal state of
the join operator (where the table is materialized) and emit the update to
the sink, because another service needs that information, which in turn will
potentially result in new facts coming in.
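
For the join operator itself I have something like the following in mind: a
KeyedCoProcessFunction that materializes the dimension row in keyed state
and uses a side output for the updates that need to be fed back (again a
sketch; Fact, Dimension and EnrichedFact are placeholder POJOs and
deriveUpdate stands for the actual business rule):

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;

    public class EnrichmentJoin
            extends KeyedCoProcessFunction<String, Fact, Dimension, EnrichedFact> {

        // side output carrying dimension updates back to the "dimensions" sink
        public static final OutputTag<Dimension> DIM_UPDATES =
                new OutputTag<Dimension>("dim-updates") {};

        private transient ValueState<Dimension> dimState; // materialized row

        @Override
        public void open(Configuration parameters) {
            dimState = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("dimension", Dimension.class));
        }

        @Override
        public void processElement1(Fact fact, Context ctx,
                                    Collector<EnrichedFact> out) throws Exception {
            // enrich with the current dimension (may be null if none seen yet)
            Dimension dim = dimState.value();
            out.collect(new EnrichedFact(fact, dim));

            // If the fact implies a change to the dimension, update the local
            // state AND emit the new row so the sink feeds it back into Kafka.
            Dimension updated = deriveUpdate(fact, dim);
            if (updated != null) {
                dimState.update(updated);
                ctx.output(DIM_UPDATES, updated);
            }
        }

        @Override
        public void processElement2(Dimension dim, Context ctx,
                                    Collector<EnrichedFact> out) throws Exception {
            // Updates arriving from the topic (including our own fed-back ones)
            // simply refresh the materialized state; idempotent on replay.
            dimState.update(dim);
        }

        private Dimension deriveUpdate(Fact fact, Dimension current) {
            return null; // placeholder for the business rule
        }
    }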

So, to be clear, the feedback loop is not an internal one within the Flink
job pipeline but an external one through Kafka, and hence the graph topology
continues to be a DAG.

I was just wondering how common this use case is and what precautions, if
any, need to be taken. In other libraries/frameworks this kind of feedback
can be problematic, e.g.:

- https://github.com/lovoo/goka/issues/95

I guess that, to keep the local state of the joiner in sync with the
input/output Kafka topic, I would need to enable exactly-once guarantees at
the sink level, e.g., by following this recipe:

- https://docs.immerok.cloud/docs/how-to-guides/development/exactly-once-with-apache-kafka-and-apache-flink/
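
Concretely, I would expect something along these lines with the KafkaSink /
KafkaSource builders from flink-connector-kafka (a sketch; broker address,
topic names, group id and transactional id prefix are made up):

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.base.DeliveryGuarantee;
    import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
    import org.apache.flink.connector.kafka.sink.KafkaSink;
    import org.apache.flink.connector.kafka.source.KafkaSource;

    KafkaSink<String> dimensionsSink = KafkaSink.<String>builder()
            .setBootstrapServers("broker:9092")
            .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                    .setTopic("dimensions")
                    .setValueSerializationSchema(new SimpleStringSchema())
                    .build())
            .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
            .setTransactionalIdPrefix("dim-feedback") // unique per job
            // must be <= the broker's transaction.max.timeout.ms
            .setProperty("transaction.timeout.ms", "900000")
            .build();

    // The dimensions source of this same job then has to read only committed
    // records, otherwise the loop would observe uncommitted/aborted updates:
    KafkaSource<String> dimsSource = KafkaSource.<String>builder()
            .setBootstrapServers("broker:9092")
            .setTopics("dimensions")
            .setGroupId("enrichment-job")
            .setValueOnlyDeserializer(new SimpleStringSchema())
            .setProperty("isolation.level", "read_committed")
            .build();

One consequence I can already see: with EXACTLY_ONCE, the fed-back records
only become visible to read_committed consumers once the checkpoint that
commits the Kafka transaction completes, so the checkpoint interval puts a
lower bound on the feedback latency.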

Thanks in advance!

Salva
