Pipeline design including (sub / nested)-pipelines

Pascal Gula Thu, 16 Aug 2018 06:13:09 -0700

Hello,
I am currently evaluating Apache Beam (to be executed later on Google Cloud
Dataflow), and for the first use case I am working on, I have a design
question. I would like to know whether any of you has already faced a similar one.
Namely, we have a DB describing dashboard views, and for each view we
would like to perform some aggregation transforms.
My first approach would be to create a higher-level pipeline that
fetches all view configurations from our MongoDB (BTW, we released a MongoDB
IO connector here: https://pypi.org/project/beam-extended/). With this
PCollection of views, the idea is to have a ParDo with a DoFn that creates a
sub-pipeline to perform the aggregation on data from our plant database,
using a query derived from the view configuration. Afterwards, the
higher-level pipeline would save some performance/data metrics
related to the execution of the array of sub-pipelines.
The main question is: are nested pipelines supported by the runner?
I hope my description is clear enough. I will work on a diagram in the
meantime.
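
For illustration only, here is a minimal sketch of a single-pipeline variant of
the design described above: each view configuration is expanded into keyed
records and aggregated per view, rather than spawning a sub-pipeline inside a
DoFn. It is not the connector's API nor a confirmed answer to the question.
The names derive_query, fetch_rows and build_pipeline, the view fields
("id", "crop"), and the in-memory PLANT_DB are all hypothetical stand-ins for
the real MongoDB collections.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical in-memory stand-in for the plant database.
PLANT_DB: List[Dict] = [
    {"crop": "tomato", "count": 3},
    {"crop": "tomato", "count": 5},
    {"crop": "wheat", "count": 2},
]


def derive_query(view: Dict) -> Dict:
    """Build a query (here: a simple equality filter) from a view config."""
    return {"crop": view["crop"]}


def fetch_rows(view: Dict) -> Iterable[Tuple[str, int]]:
    """Yield (view_id, value) pairs for the rows matching the view's query."""
    query = derive_query(view)
    for row in PLANT_DB:
        if all(row.get(key) == value for key, value in query.items()):
            yield view["id"], row["count"]


def build_pipeline(pipeline, views: List[Dict]):
    """Attach a flat, per-view aggregation to an existing Beam pipeline."""
    # Imported lazily so the helpers above can be tried without Beam installed.
    import apache_beam as beam

    return (
        pipeline
        | "Views" >> beam.Create(views)              # one element per view config
        | "FetchPerView" >> beam.FlatMap(fetch_rows)  # (view_id, value) pairs
        | "SumPerView" >> beam.CombinePerKey(sum)     # one total per view_id
    )
```

With apache-beam installed, calling build_pipeline(p, views) inside a
"with beam.Pipeline() as p:" block should yield one (view_id, total) pair per
view. Keying every record by its view id lets the runner parallelise across
views without any pipeline being constructed inside a DoFn.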
Very best regards,
Pascal


-- 

Pascal Gula
Senior Data Engineer / Scientist
+49 (0)176 34232684
www.plantix.net <http://plantix.net/>
PEAT GmbH
Kastanienallee 4
10435 Berlin // Germany
Download the App! <https://play.google.com/store/apps/details?id=com.peat.GartenBank>
