This has been a long-requested feature in Apache Beam:
https://issues.apache.org/jira/browse/BEAM-106

The short story is that this requires a lot of support from execution
engines, since watermarks become a function of both time and loop iteration
(possibly several loop iterations, if the loops are nested).

Some workarounds available right now:
* for a batch pipeline, unroll your loops a fixed number of times and place
filters within each loop iteration so that downstream processing is skipped
when there is nothing left to do. Better yet, have your pipeline
materialize its intermediate state after some number of loop iterations,
and then have the driver program relaunch the pipeline on that intermediate
state if more iterations are needed.
* for a streaming pipeline, create a feedback loop via a source/sink pair
such as Kafka or Pub/Sub: the intermediate computation is written to the
sink and then read back by the source.
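To make the batch approach concrete, here is a minimal sketch in plain Python rather than Beam code. It assumes a toy connected-components job (min-label propagation), chosen only because the question is about graph processing. `run_unrolled_pipeline` is a hypothetical stand-in for one pipeline launch; in a real Beam job it would be a pipeline reading and writing the state files (e.g. with `beam.io.ReadFromText` / `beam.io.WriteToText`). `driver` plays the role of the driver program that relaunches until nothing changes.

```python
import json
import os
import tempfile

# Hypothetical toy graph: an undirected edge list standing in for real input.
EDGES = [(1, 2), (2, 3), (4, 5), (6, 6)]


def run_unrolled_pipeline(state_path, out_path, iterations):
    """Stand-in for ONE pipeline launch (hypothetical, not a Beam API).

    Reads vertex labels from state_path, runs up to `iterations` unrolled
    rounds of min-label propagation, materializes the result to out_path,
    and returns True if the last round executed still changed some label.
    """
    with open(state_path) as f:
        labels = {int(k): v for k, v in json.load(f).items()}
    changed = False
    for _ in range(iterations):
        # One unrolled loop iteration: each vertex takes the smallest label
        # among itself and its neighbours (using the previous round's labels).
        new_labels = dict(labels)
        for a, b in EDGES:
            new_labels[a] = min(new_labels[a], labels[b])
            new_labels[b] = min(new_labels[b], labels[a])
        changed = new_labels != labels
        labels = new_labels
        if not changed:
            # In a Beam graph the remaining stages would still exist but the
            # per-iteration filter would leave them nothing to process; in
            # plain Python we can simply stop.
            break
    with open(out_path, "w") as f:
        json.dump(labels, f)
    return changed


def driver(max_launches=10, iterations_per_launch=2):
    """Relaunch the (simulated) pipeline on its own materialized state."""
    vertices = {v for edge in EDGES for v in edge}
    with tempfile.TemporaryDirectory() as workdir:
        state = os.path.join(workdir, "state_0.json")
        with open(state, "w") as f:
            json.dump({v: v for v in vertices}, f)  # each vertex labels itself
        for launch in range(1, max_launches + 1):
            out = os.path.join(workdir, f"state_{launch}.json")
            changed = run_unrolled_pipeline(state, out, iterations_per_launch)
            state = out  # the next launch consumes the materialized state
            if not changed:
                break
        with open(state) as f:
            labels = {int(k): v for k, v in json.load(f).items()}
    return labels, launch


labels, launches = driver()
print(labels, launches)
```

With two unrolled iterations per launch, the toy graph needs two launches; the trade-off is that more iterations per launch means fewer relaunches but more potentially idle stages in each pipeline graph.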


On Mon, May 28, 2018 at 11:26 PM Jan Callewaert wrote:

> Hello,
>
> I am investigating technologies for bulk processing. One of my required
> use cases is large-scale graph processing. This is supported by Apache
> Spark as GraphX, or by Apache Flink as iterative algorithms. However, this
> does not seem to be supported by Apache Beam. Are there any future plans
> for this, or is there an alternative approach for graph processing offered
> by Apache Beam?
>
> Regards,
>
> Jan
>
>
