Hi Jan,

This is a valuable but tricky problem. As you noticed, it is an issue with
watermarks. If you have a cycle, it may be convergent or not convergent
(aka an infinite loop). Since a watermark measures completeness, it should
advance only up to some event time where the cycle has converged. If your
back edge topic has a watermark that advances - perhaps because you only
write future-timestamped data to it - then your pipeline will make
progress. Otherwise, there needs to be a separate "nested" watermark (or
other notion of convergence) for the cycle, which does not exist in Beam.
The most relevant work I know of is Naiad, which supports a nested logical
clock for this very purpose.
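
To make the back edge point concrete, here is a toy model in plain Java. It
is only a sketch, not Beam API: the class, the method, and its parameters are
all hypothetical. It assumes the stage watermark is the minimum of the main
source watermark and the back edge watermark, and that the back edge
watermark can be held at (stage watermark + feedbackDelay), because every
record fed back is stamped at least feedbackDelay after the input that
produced it.

```java
/** Toy watermark model for a pipeline with one back edge. Not Beam API. */
public class CycleWatermarkSketch {

    /**
     * Runs the model for a number of rounds and returns the final stage watermark.
     *
     * @param rounds        number of simulation rounds
     * @param mainStep      how far the main source watermark advances per round
     * @param feedbackDelay how far into the future fed-back records are stamped
     *                      (0 means "same event time as the input")
     */
    static long simulate(int rounds, long mainStep, long feedbackDelay) {
        long mainWatermark = 0;   // watermark of the ordinary input
        long backWatermark = 0;   // watermark of the back edge topic
        long stageWatermark = 0;  // watermark of the stage reading both

        for (int i = 0; i < rounds; i++) {
            mainWatermark += mainStep;
            // Assumption: a stage cannot be further along than its slowest input.
            stageWatermark = Math.min(mainWatermark, backWatermark);
            // Assumption: anything written to the back edge from now on is
            // stamped at least feedbackDelay after the stage watermark.
            backWatermark = stageWatermark + feedbackDelay;
        }
        return stageWatermark;
    }

    public static void main(String[] args) {
        System.out.println(simulate(5, 10, 0));    // same-time feedback
        System.out.println(simulate(5, 10, 5));    // small future offset
        System.out.println(simulate(5, 10, 100));  // large future offset
    }
}
```

Under these assumptions the three calls print 0, 20 and 50: with no offset
the stage watermark never moves, with a small offset it advances but lags the
main input, and with a large offset the back edge stops being the bottleneck.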

Kenn

On Wed, Jan 2, 2019 at 6:28 AM <[email protected]> wrote:

> Hi all,
>
> In the documentation it says that a pipeline forms a DAG.
> For my project however I need to introduce a loop (feed back some results
> to an earlier phase of the pipeline).
> For now I have solved this using a separate Kafka topic, from which I read
> somewhere in the beginning of the pipeline, and only write to .
>
> Using an extra Kafka topic of course breaks the flow of event time and
> watermarks. Also, I'm getting at-least-once behavior instead of exactly-once.
> For my application these behaviors are acceptable (although it could
> perhaps come in handy if I would keep control over the flow of watermarks,
> but it's not absolutely needed).
>
> Is there a better (more efficient) way to achieve this?
>
> I was thinking of creating a 'pipe'.
>
> interface Pipe<T> {
>   PTransform<PBegin, PCollection<T>> getSource();
>   PTransform<PCollection<T>, PDone> getSink();
> }
>
> <T> Pipe<T> createPipe(String name);
>
> The source and sink should be internally connected and participate
> correctly in the snapshotting flow. (not sure yet about the required
> details for this so that it is correct, any help/pointers would be
> appreciated!)
>
> However, before I embark on this effort I would like to hear your thoughts
> about it.
>
> Thanks, kind regards,
> Jan
>
>
