Hi Beam Community, I followed this doc (https://beam.apache.org/documentation/runners/spark/#running-on-dataproc-cluster-yarn-backed) and tried to use Dataproc to run my Beam pipeline.
I created a Dataproc cluster (1 master, 2 workers), but when I check the worker nodes with `docker ps`, only one worker is doing any work. I also increased the cluster to 4 workers and got the same result. Is there a parameter I should set, or am I doing something wrong? I also tried Dataproc (GCE) + Flink + Beam (Python); the `parallelism` option did not seem to help either (a sketch of those options is in the P.S. below). This is my code:

```
import logging

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    '--runner=SparkRunner',
    '--output_executable_path=job.jar',
    '--spark_version=3',
])


class SplitWords(beam.DoFn):
    def __init__(self, delimiter=','):
        self.delimiter = delimiter

    def process(self, text):
        import time  # imported inside process so it is available on the workers
        time.sleep(10)  # simulate slow per-element work
        for word in text.split(self.delimiter):
            yield word


logging.getLogger().setLevel(logging.INFO)

data = ['Strawberry,Carrot,Eggplant', 'Tomato,Potato'] * 800

with beam.Pipeline(options=options) as pipeline:
    plants = (
        pipeline
        | 'Gardening plants' >> beam.Create(data)
        | 'Split words' >> beam.ParDo(SplitWords(','))
        | beam.Map(print))
```

Thanks,
Yuwen Zhao
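P.S. For the Flink attempt the pipeline code was the same; only the runner options changed. This is roughly what I used (a sketch from memory; the `flink_master` address below is a placeholder for my cluster's JobManager endpoint):

```
from apache_beam.options.pipeline_options import PipelineOptions

# Same pipeline as above; only the runner options were different.
options = PipelineOptions([
    '--runner=FlinkRunner',
    '--flink_master=<jobmanager-host>:8081',  # placeholder for my endpoint
    '--parallelism=4',  # I expected this to fan the work out across workers
    '--environment_type=DOCKER',
])
```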
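P.P.S. One thing I wondered about: does `Create` produce so few initial bundles that everything gets fused onto a single worker? Would inserting a `beam.Reshuffle()` after `Create` be the right way to redistribute the elements? Something like this (just an idea, I have not verified that it helps):

```
with beam.Pipeline(options=options) as pipeline:
    plants = (
        pipeline
        | 'Gardening plants' >> beam.Create(data)
        | 'Break fusion' >> beam.Reshuffle()  # force a redistribution of elements
        | 'Split words' >> beam.ParDo(SplitWords(','))
        | beam.Map(print))
```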