a) Yes, there is a general proposal for "update" support to existing pipelines for all runners[1] which allows for some topology changes. Dataflow has had support for this for a long time for streaming pipelines which several customers rely on. Support for dynamic graph expansion during runtime and loops has been in the idea gathering phase for a long time and hasn't made too much progress. Others have asked for dynamic graph expansion and loops in the past.
b) Do you think retractions[2] would address this use case? 1: https://lists.apache.org/thread.html/3cfbd650a46327afc752a220b20a6081570000725c96541c21265e7b@%3Cdev.beam.apache.org%3E 2: https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102 On Fri, Aug 17, 2018 at 10:21 AM Raghu Angadi <[email protected]> wrote: > > On Thu, Aug 16, 2018 at 2:20 PM Thomas Browne <[email protected]> wrote: > >> I just watched the excellent presentation by Markku Leppisto in >> Singapore. I consult for a financial broker in London. >> Our use case is streaming financial data on which various analytics are >> performed to find relative value trading opportunities in the fixed income >> markets. >> >> I have two questions about Apache Beam: >> >> a) Does it, or will it, support dynamic topologies? We have many analysts >> all of whom want slight variations of the topologies, and they may want to >> change the topologies, or the function ("node") parameters, at runtime. Is >> this possible, and if not is "second prize" of simply growing the DAG with >> non-recombinant node branches at runtime, possible? >> > > Topologies are fixed. Some form of "second prize" is probably doable, but > I think the pipeline author needs to handle it at runtime. > > b) Does, or will Apache beam support "replays"? We find quite often that >> historical inputs change, a long time later (for example, a bond price from >> a few weeks ago is seen to have been erroneous, and is changed). We then >> have to re-run from that point forward as all downstream calcs are >> dependent on all prices. I believe Apache Flink supports this type of >> functionality? Is this possilble with Beam? >> > > A streaming pipeline can be restarted from earlier resume point as long as > the sources support it. Other than that, I don't think there is much more > explicit support (e.g. in-built support to combine updated historical > aggregates to future updates etc). Flink's save-points should help with > Beam on Flink as well, it might make it simpler to save multiple snapshots. > > >> I am evaluating the various dataflow programming environments that are >> springing up and will advise based partly on the above. >> >> Many thanks, >> >> Thomas >> >>
