a) Yes, there is a general proposal for "update" support to existing
pipelines for all runners[1] which allows for some topology changes.
Dataflow has had support for this for a long time for streaming pipelines
which several customers rely on. Support for dynamic graph expansion during
runtime and loops has been in the idea gathering phase for a long time and
hasn't made too much progress. Others have asked for dynamic graph
expansion and loops in the past.

b) Do you think retractions[2] would address this use case?

1:
https://lists.apache.org/thread.html/3cfbd650a46327afc752a220b20a6081570000725c96541c21265e7b@%3Cdev.beam.apache.org%3E
2: https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102

On Fri, Aug 17, 2018 at 10:21 AM Raghu Angadi <[email protected]> wrote:

>
> On Thu, Aug 16, 2018 at 2:20 PM Thomas Browne <[email protected]> wrote:
>
>> I just watched the excellent presentation by Markku Leppisto in
>> Singapore. I consult for a financial broker in London.
>> Our use case is streaming financial data on which various analytics are
>> performed to find relative value trading opportunities in the fixed income
>> markets.
>>
>> I have two questions about Apache Beam:
>>
>> a) Does it, or will it, support dynamic topologies? We have many analysts
>> all of whom want slight variations of the topologies, and they may want to
>> change the topologies, or the function ("node") parameters, at runtime. Is
>> this possible, and if not is "second prize" of simply growing the DAG with
>> non-recombinant node branches at runtime, possible?
>>
>
> Topologies are fixed. Some form of "second prize" is probably doable, but
> I think the pipeline author needs to handle it at runtime.
>
> b) Does, or will Apache beam support "replays"? We find quite often that
>> historical inputs change, a long time later (for example, a bond price from
>> a few weeks ago is seen to have been erroneous, and is changed). We then
>> have to re-run from that point forward as all downstream calcs are
>> dependent on all prices. I believe Apache Flink supports this type of
>> functionality? Is this possilble with Beam?
>>
>
> A streaming pipeline can be restarted from earlier resume point as long as
> the sources support it. Other than that, I don't think there is much more
> explicit support (e.g. in-built support to combine updated historical
> aggregates to future updates etc). Flink's save-points should help with
> Beam on Flink as well, it might make it simpler to save multiple snapshots.
>
>
>> I am evaluating the various dataflow programming environments that are
>> springing up and will advise based partly on the above.
>>
>> Many thanks,
>>
>> Thomas
>>
>>

Reply via email to