Thanks everyone for confirming my understanding and suggesting alternatives.
I guess this is the conclusion till this is available(which will take time,
understandably) -

   - Can fix the number of iterations? Use fixed loop(and check nuances of
   the runner's state preservation <http://stackoverflow.com/a/29819883>
   technique across iterations).
   - Can't fix the number of iterations? Use the other approach & an
   external state store(rather than the more desirable  native store) for
   preserving state across iterations.

-Eswar.

On Thu, Jun 23, 2016 at 1:55 AM, Dan Halperin <[email protected]> wrote:

> In addition to Frances' suggestion, you can also put a fixed loop *inside*
> a pipeline by unrolling it, aka by running some iterative step N times.
> This is a common approach in massively data parallel systems to e.g.
> pagerank -- assume that it will converge after 100 iterations, and run your
> pipeline 100 times.
>
> While definitely not as flexible as cycles, this is very useful.
>
> On Wed, Jun 22, 2016 at 1:10 PM, Frances Perry <[email protected]> wrote:
>
>> Correct -- Beam pipelines currently have to be DAGs currently and can't
>> support cycles.
>>
>> There's a jira to look into iterations (
>> https://issues.apache.org/jira/browse/BEAM-106). The semantics get a
>> little tricky given the unified model, so there's still quite a lot to
>> figure out.
>>
>> In the meantime, you can maintain the loop outside your pipeline --
>> essentially creating a new pipeline for each iteration.
>>
>>
>>
>> On Wed, Jun 22, 2016 at 11:36 AM, Eswar Reddy <[email protected]
>> > wrote:
>>
>>> Hi Beam Users,
>>>
>>> For my future projects, I anticipate having to create pipelines having
>>> cycle(s).Could see some engines(Flink
>>> <https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/index.html#iterations>
>>>  and Storm
>>> <https://groups.google.com/forum/#!topic/storm-user/EjN1hU58Q_8>)
>>> support this, they also provide flexibility to the user to ensure the loop
>>> ends for a given message by splitting & filtering streams. Could someone
>>> share any pointers to analogus of this in Beam Model? I did see that
>>> documentation
>>> <https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/Pipeline#class-pipeline>
>>> says Pipeline is always a DAG though.
>>>
>>> Thanks,
>>> Eswar.
>>>
>>
>>
>

Reply via email to