Probably the easiest thing to do here would be to implement your own runner that wraps another runner and overrides https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/runner.py#L154 to do whatever inspection you want of the graph before passing it on. (You would also want to delegate calls to apply until that method goes away.)
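A rough sketch of that idea, for illustration only. It assumes the linked method is run_pipeline, and it subclasses the runner you want to reuse rather than wrapping an instance, so apply and everything else are inherited instead of delegated by hand. make_inspecting_runner and the inspect callback are made-up names, not part of Beam.

```python
def make_inspecting_runner(base_runner_cls, inspect):
    """Return a subclass of base_runner_cls that calls inspect(pipeline) first.

    Assumption: the runner's entry point is run_pipeline(pipeline, ...).
    Extra arguments are forwarded untouched because the exact signature
    has varied between Beam releases.
    """

    class InspectingRunner(base_runner_cls):
        def run_pipeline(self, pipeline, *args, **kwargs):
            # Look at the pipeline, then hand it to the real runner unchanged.
            inspect(pipeline)
            return super(InspectingRunner, self).run_pipeline(
                pipeline, *args, **kwargs)

    return InspectingRunner


# Possible usage (not executed here; check the import path for your version):
#   import apache_beam as beam
#   from apache_beam.runners.direct.direct_runner import DirectRunner
#   Runner = make_inspecting_runner(DirectRunner, my_inspect_function)
#   p = beam.Pipeline(runner=Runner(), options=my_options)
```

The same factory could be applied to DataflowRunner, with the caveat that I have not checked how each runner uses run_pipeline internally in 2.11.0.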
Note that the pipeline object itself isn't very amenable to introspection (and is an internal detail that has changed a lot, and likely will again). However, the portable representation (Pipeline.to_runner_api) should be fine to build on (and can be converted back into a pipeline via Pipeline.from_runner_api).

On Mon, Jun 24, 2019 at 10:34 AM Germain TANGUY <[email protected]> wrote:
> Hello,
>
> I would like to access the graph object of my Apache Beam pipeline to traverse it myself and also print it locally before running. I thought I could find the starting point of my DAG from the pipeline or runner instantiation (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py#L20).
>
> Do you know if there is an attribute or method I could call from the pipeline to get the graph ?
>
> I am using “apache-beam[gcp]==2.11.0” with a DirectRunner and DataflowRunner.
>
> Regards,
>
> Germain T.
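For printing the graph locally, here is a minimal sketch that walks the portable representation returned by Pipeline.to_runner_api. The field names (root_transform_ids, components.transforms, unique_name, subtransforms, inputs, outputs) are the ones I believe the runner API proto uses; please verify them against your Beam version. transform_tree_lines is a made-up helper name.

```python
def transform_tree_lines(pipeline):
    """Return one indented text line per transform in the pipeline graph.

    Assumption: pipeline.to_runner_api() returns a proto exposing
    root_transform_ids and components.transforms as described above.
    """
    proto = pipeline.to_runner_api()
    transforms = proto.components.transforms
    lines = []

    def visit(transform_id, depth):
        t = transforms[transform_id]
        # inputs/outputs map local tags to PCollection ids; the ids are
        # what connect one transform to the next in the DAG.
        lines.append("%s%s inputs=%s outputs=%s" % (
            "  " * depth,
            t.unique_name,
            sorted(t.inputs.values()),
            sorted(t.outputs.values())))
        for child_id in t.subtransforms:
            visit(child_id, depth + 1)

    for root_id in proto.root_transform_ids:
        visit(root_id, 0)
    return lines


# Possible usage, before calling p.run():
#   print("\n".join(transform_tree_lines(p)))
```

Composite transforms appear as parents with their expanded steps indented beneath them, so the output is the transform hierarchy; the DAG edges can be recovered by matching PCollection ids between outputs and inputs.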
