How the work is parallelized depends on the runner, but you should strive to keep your DoFns as simple as possible and avoid additional complexity. Generally a runner will parallelize work automatically by splitting the source the data is coming from so that each worker stays busy; for example, it may use signals such as CPU utilization to increase the number of bundles of work processed in parallel per worker. This lets people write DoFns that are as simple as possible without having to worry about how to make them run faster.
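For concreteness, here is a minimal sketch of such a simple DoFn (the names are illustrative, and this assumes the Dataflow SDK 1.x style used in the snippet quoted below):

import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;

// A deliberately simple per-element DoFn: no threads, no shared state.
// The runner decides how many instances run in parallel and how the
// input is split into bundles.
class UppercaseFn extends DoFn<String, String> {
  @Override
  public void processElement(ProcessContext c) {
    c.output(c.element().toUpperCase());
  }
}

// Applied like any other ParDo, e.g.:
// PCollection<String> upper = words.apply(ParDo.of(new UppercaseFn()));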
But there are practical considerations: work can only be split as finely as the number of elements, and remote calls may be much more efficient when done in batches (see the sketch after the quoted message below). This is case by case, and different strategies work better for different problems.

Also, as a clarifying point: is it that generateAndStoreGraphData(), generateAndStoreTimeSerieseData1(), and generateAndStoreTimeSerieseData1() need to be called for the same T in parallel (because of some internal requirement), or is it good enough that different T's go through the generateAndStore... transforms in parallel?

On Thu, May 5, 2016 at 11:09 AM, kaniska Mandal <[email protected]> wrote:
> Inside a CompositeTransformation --
>
> is it OK to spawn threads / use a CountDownLatch to perform multiple
> ParDo on the same data item at the same time,
> or are all the calls to the individual ParDos inherently parallelized?
>
> For example, I need to execute generateAndStoreGraphData(),
> generateAndStoreTimeSerieseData1(), generateAndStoreTimeSerieseData1()
> in parallel.
>
> static class MultiOps extends PTransform<PCollection<T>, PCollection<T>> {
>
>   public PCollection<T> apply(PCollection<T> dataSet) {
>
>     dataSet.apply(ParDo.of(generateAndStoreGraphData()));
>     dataSet.apply(ParDo.of(generateAndStoreTimeSerieseData1()));
>     dataSet.apply(ParDo.of(generateAndStoreTimeSerieseData1()));
>
>     return dataSet;
>   }
> }
>
> =====================
>
> Thanks
>
> Kaniska
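If batched remote calls do help in your case, one common pattern is to buffer elements in processElement and flush in finishBundle, so each bundle issues far fewer RPCs. A minimal sketch (RemoteService.writeBatch() and the batch size are hypothetical placeholders, and this again assumes the Dataflow SDK 1.x DoFn API):

import java.util.ArrayList;
import java.util.List;
import com.google.cloud.dataflow.sdk.transforms.DoFn;

class BatchingWriteFn<T> extends DoFn<T, T> {
  private static final int MAX_BATCH_SIZE = 100;  // illustrative

  // Rebuilt per bundle in startBundle, so it is never serialized.
  private transient List<T> batch;

  @Override
  public void startBundle(Context c) {
    batch = new ArrayList<>();
  }

  @Override
  public void processElement(ProcessContext c) {
    batch.add(c.element());
    if (batch.size() >= MAX_BATCH_SIZE) {
      flush();
    }
    c.output(c.element());  // pass the element through unchanged
  }

  @Override
  public void finishBundle(Context c) {
    flush();  // write whatever is left over in this bundle
  }

  private void flush() {
    if (!batch.isEmpty()) {
      RemoteService.writeBatch(batch);  // hypothetical batched RPC
      batch.clear();
    }
  }
}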
