Thanks everyone for confirming my understanding and suggesting alternatives. I guess this is the conclusion till this is available(which will take time, understandably) -
- Can fix the number of iterations? Use fixed loop(and check nuances of the runner's state preservation <http://stackoverflow.com/a/29819883> technique across iterations). - Can't fix the number of iterations? Use the other approach & an external state store(rather than the more desirable native store) for preserving state across iterations. -Eswar. On Thu, Jun 23, 2016 at 1:55 AM, Dan Halperin <[email protected]> wrote: > In addition to Frances' suggestion, you can also put a fixed loop *inside* > a pipeline by unrolling it, aka by running some iterative step N times. > This is a common approach in massively data parallel systems to e.g. > pagerank -- assume that it will converge after 100 iterations, and run your > pipeline 100 times. > > While definitely not as flexible as cycles, this is very useful. > > On Wed, Jun 22, 2016 at 1:10 PM, Frances Perry <[email protected]> wrote: > >> Correct -- Beam pipelines currently have to be DAGs currently and can't >> support cycles. >> >> There's a jira to look into iterations ( >> https://issues.apache.org/jira/browse/BEAM-106). The semantics get a >> little tricky given the unified model, so there's still quite a lot to >> figure out. >> >> In the meantime, you can maintain the loop outside your pipeline -- >> essentially creating a new pipeline for each iteration. >> >> >> >> On Wed, Jun 22, 2016 at 11:36 AM, Eswar Reddy <[email protected] >> > wrote: >> >>> Hi Beam Users, >>> >>> For my future projects, I anticipate having to create pipelines having >>> cycle(s).Could see some engines(Flink >>> <https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/index.html#iterations> >>> and Storm >>> <https://groups.google.com/forum/#!topic/storm-user/EjN1hU58Q_8>) >>> support this, they also provide flexibility to the user to ensure the loop >>> ends for a given message by splitting & filtering streams. Could someone >>> share any pointers to analogus of this in Beam Model? I did see that >>> documentation >>> <https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/Pipeline#class-pipeline> >>> says Pipeline is always a DAG though. >>> >>> Thanks, >>> Eswar. >>> >> >> >
