Probably the easiest thing to do here would be to implement your own
runner that wraps another runner, and overrides
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/runner.py#L154
to do whatever inspection you want of the graph before passing it on.
(You would also want to delegate calls to apply until that method
goes away.)
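
As a rough, untested sketch of such a wrapping runner: I'm assuming the
method behind the link above is run_pipeline, and that its signature (and
that of apply) differs between releases, hence the *args/**kwargs.
make_inspecting_runner, runner_base and inspect are names made up for
illustration; the base class is passed in only so the sketch carries no
import-time dependency.

```python
def make_inspecting_runner(runner_base, wrapped, inspect):
    """Return a runner that inspects the graph, then delegates to `wrapped`.

    runner_base: class to subclass. With Beam installed this would be
        apache_beam.runners.runner.PipelineRunner (hypothetical parameter).
    wrapped: the real runner instance, e.g. DirectRunner().
    inspect: any callable taking the portable pipeline representation.
    """

    class InspectingRunner(runner_base):

        def run_pipeline(self, pipeline, *args, **kwargs):
            # Assumption: run_pipeline is the hook to override. Extra
            # arguments (e.g. options in newer releases) pass through as-is.
            inspect(pipeline.to_runner_api())
            return wrapped.run_pipeline(pipeline, *args, **kwargs)

        def apply(self, transform, input, *args, **kwargs):
            # Delegate so the wrapped runner's own transform handling
            # still takes effect.
            return wrapped.apply(transform, input, *args, **kwargs)

    return InspectingRunner()


# Usage (assumes apache_beam is installed; not run against a real runner):
#   import apache_beam as beam
#   from apache_beam.runners.runner import PipelineRunner
#   from apache_beam.runners.direct.direct_runner import DirectRunner
#   runner = make_inspecting_runner(PipelineRunner, DirectRunner(), print)
#   with beam.Pipeline(runner=runner) as p:
#       _ = p | beam.Create([1, 2, 3]) | beam.Map(lambda x: x + 1)
```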

Note that the pipeline object itself isn't very amenable to
introspection (it is an internal detail that has changed a lot, and
likely will again). However, the portable representation
(Pipeline.to_runner_api) should be fine to build on (and can be
converted back into a pipeline via Pipeline.from_runner_api).
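
For traversing and printing, a minimal sketch that walks the portable
representation could look like this. It relies only on the
root_transform_ids and components.transforms fields of the pipeline proto,
and on unique_name, subtransforms, inputs and outputs of each transform.
format_transform_tree is a made-up helper name, not part of Beam.

```python
def format_transform_tree(pipeline_proto):
    """Return one indented line per transform, in depth-first order.

    pipeline_proto: the object returned by Pipeline.to_runner_api().
    """
    transforms = pipeline_proto.components.transforms
    lines = []

    def visit(transform_id, depth):
        transform = transforms[transform_id]
        # The root transform may carry an empty name.
        name = transform.unique_name or "<root>"
        # inputs/outputs map a local tag to a PCollection id; the ids
        # are what link one transform to the next.
        lines.append("{}{}  inputs={} outputs={}".format(
            "  " * depth,
            name,
            sorted(transform.inputs.values()),
            sorted(transform.outputs.values())))
        for child_id in transform.subtransforms:
            visit(child_id, depth + 1)

    for root_id in pipeline_proto.root_transform_ids:
        visit(root_id, 0)
    return lines


# Usage (assumes apache_beam is installed):
#   import apache_beam as beam
#   p = beam.Pipeline()
#   _ = p | beam.Create([1, 2, 3]) | beam.Map(lambda x: x * 2)
#   print("\n".join(format_transform_tree(p.to_runner_api())))
```

Composite transforms show up as parents of their expanded steps, so the
output is a tree rather than a flat list. The DAG edges can be recovered
by matching each output PCollection id against later inputs.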

On Mon, Jun 24, 2019 at 10:34 AM Germain TANGUY
<[email protected]> wrote:
>
> Hello,
>
>
>
> I would like to access the graph object of my Apache Beam pipeline to 
> traverse it myself and also print it locally before running. I thought I 
> could find the starting point of my DAG from the pipeline or runner 
> instantiation 
> (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py#L20).
>
>
>
> Do you know if there is an attribute or method I could call from the pipeline 
> to get the graph?
>
>
>
> I am using “apache-beam[gcp]==2.11.0” with a DirectRunner and DataflowRunner.
>
>
> Regards,
>
>
>
> Germain T.
>
>
