Hi David,

If you use the InProcessPipelineRunner (the "new" DirectPipelineRunner), then it can create several threads.
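To make "several threads" concrete, here is a plain-Python analogy. It is not Beam code, and `process_bundles` and its parameters are hypothetical names. The assumption is that an in-process runner splits the input into bundles and hands each bundle to a thread from a pool, while a single-threaded runner would process the bundles one after another:

```python
from concurrent.futures import ThreadPoolExecutor


def process_bundles(bundles, fn, num_threads=4):
    """Apply fn to every element, running each bundle on a pool thread.

    Hypothetical illustration only: this is NOT the InProcessPipelineRunner's
    implementation, just the general idea of evaluating independent bundles
    concurrently inside one process.
    """
    def run_bundle(bundle):
        # Elements within one bundle are processed sequentially on one thread.
        return [fn(x) for x in bundle]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map returns results in the same order as the input bundles.
        results = pool.map(run_bundle, bundles)
        # Flatten the per-bundle outputs into a single list.
        return [y for out in results for y in out]


print(process_bundles([[1, 2], [3, 4], [5]], lambda x: x * x))
```

A thread pool mirrors the idea of one process with several workers. In CPython, the GIL limits the speedup for CPU-bound functions, so treat this as a sketch of structure, not of performance. Spreading work across machines needs a distributed runner instead.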

Regards
JB

On 05/24/2016 04:38 PM, David Olsen wrote:
A naive question about DirectPipelineRunner: is it possible to
execute DirectPipelineRunner with multiple threads/instances (across
machines), or is parallelism only supported by runners such as
SparkPipelineRunner?

My requirement is to run a pipeline in parallel, either with multiple
threads or on multiple machines. I have just started investigating
Apache Beam.

When reading the Google Cloud Dataflow documentation [1], I saw that
the pipeline options mention numWorkers, which configures the number
of instances to use (I understand Dataflow is still different from
Apache Beam). However, searching the Apache Beam source on GitHub for
the keyword 'numWorkers' doesn't turn up any related code. So I am
wondering: is the only way to execute a pipeline in parallel to use
SparkPipelineRunner/FlinkPipelineRunner (meaning I have to use Apache
Beam + Spark/Flink) or Google Cloud Platform?

Thanks

[1] https://cloud.google.com/dataflow/pipelines/specifying-exec-params#setting-other-cloud-pipeline-options

--
Jean-Baptiste Onofré
http://blog.nanthrax.net
Talend - http://www.talend.com
