Contributions in this space are welcome. Reach out to [email protected] once you're ready.
On Fri, Aug 17, 2018 at 3:02 AM Pascal Gula <[email protected]> wrote:

> Hi Lukasz,
> thanks for the proposed solution. This was also one of the alternative
> implementations that I thought of.
> When you talk about launching a job from another job, do I understand
> correctly that this means doing a system call from another Python job and
> getting the result by some means (synchronously reading the output of the
> child jobs)?
> I'll first test this with the DirectRunner calling other DirectRunner(s),
> and afterwards do it on GCP with Dataflow.
> Regarding nested pipelines, I can provide support to build a demonstrator
> if I can get some support from the community.
> Thanks again and very best regards,
> Pascal
>
> On Thu, Aug 16, 2018 at 8:43 PM, Lukasz Cwik <[email protected]> wrote:
>
>> You can launch another Dataflow job from within an existing Dataflow job.
>> For all intents and purposes, Dataflow won't know that the jobs are
>> related in any way, so they will only be "nested" because your outer
>> pipeline knows about the inner pipeline.
>>
>> You should be able to do this for all runners (granted, you need to
>> propagate all runner/pipeline configuration through), and you should be
>> able to take a job from one runner and launch a job on a different runner
>> (you'll have to deal with the complexities of having two runners and
>> their dependencies somehow, though).
>>
>> There was some work investigating support for nested graphs within Apache
>> Beam, and for dynamic graph expansion during execution as a general
>> concept. This was to support use cases such as recursion and loops, but it
>> didn't progress much beyond the idea-generation phase.
>>
>> On Thu, Aug 16, 2018 at 9:47 AM Pascal Gula <[email protected]> wrote:
>>
>>> Hi Robin,
>>> this is unfortunate news, but I had already anticipated such an answer
>>> with an alternative implementation.
>>> It would, however, be interesting to support such a feature, since I am
>>> probably not the first person asking for this.
>>> Best regards,
>>> Pascal
>>>
>>> On Thu, Aug 16, 2018 at 6:20 PM, Robin Qiu <[email protected]> wrote:
>>>
>>>> Hi Pascal,
>>>>
>>>> As far as I know, you can't create a sub-pipeline within a DoFn, i.e.
>>>> nested pipelines are not supported.
>>>>
>>>> Best,
>>>> Robin
>>>>
>>>> On Thu, Aug 16, 2018 at 7:03 AM Pascal Gula <[email protected]> wrote:
>>>>
>>>>> As a bonus, here is a simplified diagram view of the use case:
>>>>>
>>>>> Cheers,
>>>>> Pascal
>>>>>
>>>>> On Thu, Aug 16, 2018 at 3:12 PM, Pascal Gula <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>> I am currently evaluating Apache Beam (later executing on Google
>>>>>> Dataflow), and for the first use case I am working on, I have a
>>>>>> design question, in case any of you have had a similar one.
>>>>>> Namely, we have a DB describing dashboard views, and for each view,
>>>>>> we would like to perform some aggregation transform.
>>>>>> My first approach would be to create a higher-level pipeline that
>>>>>> fetches all view configurations from our MongoDB (by the way, we
>>>>>> released a MongoDB IO connector here:
>>>>>> https://pypi.org/project/beam-extended/).
>>>>>> With this views PCollection, the idea is to have a ParDo with a DoFn
>>>>>> that creates a sub-pipeline to perform the aggregation on data from
>>>>>> our plant database, with a query derived from the view configuration.
>>>>>> Afterwards, the idea is to save, for the higher-level pipeline, some
>>>>>> performance/data metrics related to the execution of the array of
>>>>>> sub-pipelines.
>>>>>> The main question is: are nested pipelines supported by the runners?
>>>>>> I hope that my description was clear enough. I will work on a diagram
>>>>>> view meanwhile.
>>>>>> Very best regards,
>>>>>> Pascal
>>>>>>
>>>>>> --
>>>>>> Pascal Gula
>>>>>> Senior Data Engineer / Scientist
>>>>>> +49 (0)176 34232684
>>>>>> www.plantix.net <http://plantix.net/>
>>>>>> PEAT GmbH
>>>>>> Kastanienallee 4
>>>>>> 10435 Berlin // Germany
>>>>>> Download the App!
>>>>>> <https://play.google.com/store/apps/details?id=com.peat.GartenBank>
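[Editor's note] The "system call" variant Pascal describes (launching a child Python job from the parent and synchronously reading its output) can be sketched as below. This is a minimal stand-alone illustration, not code from the thread: the child job is an inlined script that just sums its input, standing in for a script that would build and run its own Beam pipeline; the `run_child_job` helper and the JSON-over-stdout protocol are hypothetical choices made for self-containment.

```python
import json
import subprocess
import sys

# Stand-in for a child job script: reads a JSON list of values from stdin,
# performs the "aggregation" (here just a sum), and prints the result as
# JSON on stdout. In practice this would be a separate file that builds and
# runs an independent Beam pipeline.
CHILD_JOB = """
import json, sys
values = json.load(sys.stdin)
print(json.dumps({"total": sum(values)}))
"""

def run_child_job(values):
    """Launch a child Python job and synchronously read its stdout."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_JOB],
        input=json.dumps(values),
        capture_output=True,
        text=True,
        check=True,  # raise if the child job fails
    )
    return json.loads(proc.stdout)

# One child job per view configuration, executed sequentially.
results = [run_child_job(v) for v in ([1, 2, 3], [10, 20])]
print(results)  # [{'total': 6}, {'total': 30}]
```

Because the parent blocks on each `subprocess.run`, the jobs are only "nested" from the parent's point of view, matching Lukasz's observation that the runner itself sees unrelated jobs.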
