A naive question about DirectPipelineRunner: is it possible to
execute DirectPipelineRunner with multiple threads/instances (across
machines), or is parallelism only supported by runners such as
SparkPipelineRunner?

My requirement is to run a pipeline in parallel, either with multiple threads
or on multiple machines. I have only just started investigating Apache Beam.

While reading the Google Dataflow documentation [1], I saw that the pipeline
options allow numWorkers to be configured to set the number of instances to
use (I understand Dataflow is still different from Apache Beam). However,
searching the Apache Beam source on GitHub for the keyword 'numWorkers'
doesn't turn up any related source snippet. So I am wondering: is the only
way to execute a pipeline in parallel to use SparkPipelineRunner/
FlinkPipelineRunner (meaning I have to use Apache Beam + Spark/Flink) or
to make use of Google Cloud Platform?

Thanks

[1] https://cloud.google.com/dataflow/pipelines/specifying-exec-params#setting-other-cloud-pipeline-options
