Keys distribution insights

2017-06-05 Thread Flavio Pompermaier
Hi everybody, in my job I have a groupReduce operator with parallelism 4, and one of the sub-tasks takes a huge amount of time compared to the others. My guess is that the objects assigned to that slot have much more data to reduce (and are thus somehow computationally heavy within the groupReduce operator) …

Re: Keys distribution insights

2017-06-06 Thread Aljoscha Krettek
Hi, There is no way of doing it with any Flink UI but you could try and do it manually: in your job, instead of doing the actual computation just count how many elements you have per key (in your GroupReduce). Then put a MapPartition right after the GroupReduce (which should preserve the same p…
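The recipe above can be sketched without Flink at all. The following is a minimal, framework-free Python simulation of the idea (not actual Flink code): count elements per key in place of the real GroupReduce computation, then aggregate those counts per parallel sub-task, mimicking the MapPartition step. The key-hash partitioning, the parallelism of 4, and all data are illustrative assumptions.

```python
# Hypothetical simulation of the skew diagnostic described in the thread.
# Partitioning by key hash stands in for what Flink's groupBy would do.
from collections import Counter, defaultdict

PARALLELISM = 4  # matches the parallelism mentioned in the original question

def partition_of(key):
    # Stand-in for Flink's key-hash partitioning; consistent within one run.
    return hash(key) % PARALLELISM

def diagnose_skew(records):
    """records: iterable of (key, value) pairs.

    Returns (per_key, per_partition): element counts per key, and those
    counts summed per simulated parallel sub-task.
    """
    # "GroupReduce" replacement: count elements per key instead of computing.
    per_key = Counter(key for key, _ in records)
    # "MapPartition" replacement: sum the per-key counts per sub-task.
    per_partition = defaultdict(int)
    for key, count in per_key.items():
        per_partition[partition_of(key)] += count
    return dict(per_key), dict(per_partition)

# Skewed sample data: key "a" carries almost all of the elements.
records = [("a", 1)] * 1000 + [("b", 2)] * 10 + [("c", 3)] * 5
per_key, per_partition = diagnose_skew(records)
# Whichever partition holds key "a" dominates, which would explain the
# one slow sub-task in the original question.
```

In an actual Flink job one would emit `(key, count)` pairs from the GroupReduceFunction and sum them in a MapPartitionFunction chained after it, so the counts are observed per parallel instance without a redistribution in between.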

Re: Keys distribution insights

2017-06-06 Thread Flavio Pompermaier
Thanks Aljoscha. As I was suspecting, currently there's no unobtrusive way to do that, but I can live with it. Best, Flavio