cyclic (non-DAG) computations

jan . doms Wed, 02 Jan 2019 06:28:35 -0800

Hi all,

In the documentation it says that a pipeline forms a DAG.
For my project however I need to introduce a loop (feed back some results
to an earlier phase of the pipeline).
For now I have solved this using a separate kafka topic, from which I read
somewhere in the beginning of the pipeline, and only write to .


Using an extra kafka topic of course breaks the flow of eventtime and
watermarks. Also I'm achieving at-least-once behavior iso exactly-once.
For my application these behaviors are acceptable (although it could
perhaps come in handy if I would keep control over the flow of watermarks,
but it's not absolutely needed).

Is there a better (more efficient) way to achieve this?

I was thinking of creating a 'pipe'.

interface Pipe<T> {
  PTransform<PBegin, PCollection<T>> getSource();
  PTransform<PCollection<T>, PDone> getSink();
}

Pipe<T> createPipe(String name);

the source and sink should be internally connected and participate
correctly in the snapshotting flow. (not sure yet about the required
details for this so that it is correct, any help/pointers would be
appreciated!)

However, before I embark on this effort I would like to hear your thoughts
about it.

Thanks, kind regards,
Jan

cyclic (non-DAG) computations

Reply via email to