Hi Chesnay, hi Robert

Thank you for your explanations : - )
(And sorry for the late reply).

Regards,
Robert

Von: Robert Metzger [mailto:rmetz...@apache.org]
Gesendet: Dienstag, 21. Juni 2016 12:12
An: user@flink.apache.org
Betreff: Re: Getting the NumberOfParallelSubtask

Hi Robert,

the number of parallel subtasks is the parallelism of the job or the individual 
operator. Only when executing Flink locally, the parallelism is set to the CPU 
cores.
The number of groups generated by the groupBy() transformation doesn't affect 
the parallelism. Very often the number of groups is much higher than the 
parallelism, in those cases, each parallel instance will process multiple 
groups.

If you want to know the parallelism of your operators globally, you'll need to 
set it manually (say all operators to a parallelism of 8).

Regards,
Robert


On Mon, Jun 20, 2016 at 10:00 PM, Chesnay Schepler 
<ches...@apache.org<mailto:ches...@apache.org>> wrote:
Within the mapper you cannot access the parallelism of the following nor 
preceding operation.


On 20.06.2016 15:56, Paschek, Robert wrote:
Hi Mailing list,

using a RichMapPartitionFunction i can access the total number m of this mapper 
utilized in my job with
int m = getRuntimeContext().getNumberOfParallelSubtasks();

I think that would be - in general - the total number of CPU Cores used by 
Apache Flink among the cluster.

Is there a way to access the number of the following reducer?

In general i would assume that the number of the following reducers depends on 
the number of groups generated by the groupBy() transformation. So the number 
of the reducer r would be 1 <= r <= m.

My Job:
DataSet<?> output = input
                                .mapPartition(new MR_GPMRS_Mapper())
                                .groupBy(0)
                                .reduceGroup(new MR_GPMRS_Reducer());

Thank you in advance
Robert


Reply via email to