Thanks Aljoscha. As I suspected, there's currently no unobtrusive way to do that, but I can live with it.
Best,
Flavio

On Tue, Jun 6, 2017 at 3:48 PM, Aljoscha Krettek <aljos...@apache.org> wrote:
> Hi,
>
> There is no way of doing it with any Flink UI, but you could try and do it
> manually: in your job, instead of doing the actual computation, just count
> how many elements you have per key (in your GroupReduce). Then put a
> MapPartition right after the GroupReduce (which should preserve the same
> partitioning) and inside that see which keys you have and how many elements
> you had per key. With this you know which partition, i.e. which parallel
> instance, had which keys and how many elements each key had.
>
> Best,
> Aljoscha
>
> > On 5. Jun 2017, at 12:01, Flavio Pompermaier <pomperma...@okkam.it> wrote:
> >
> > Hi everybody,
> > in my job I have a groupReduce operator with parallelism 4, and one of
> > the sub-tasks takes a huge amount of time (compared to the others).
> > My guess is that the objects assigned to that slot have much more data
> > to reduce (and thus are somehow computationally heavy within the groupReduce
> > operator).
> > What I'm trying to understand is which keys are assigned to that slot: is
> > there any way (from the JobManager UI or from the logs) to investigate the
> > key distribution (which, from the plan visualization, is the result of a
> > hash partition)?
> >
> > Best,
> > Flavio
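For anyone landing on this thread: Aljoscha's suggestion amounts to instrumenting the job itself to surface the per-partition key distribution. A quick offline approximation is also possible. The sketch below is plain Java, not Flink code, and it assumes a simple `hashCode`-modulo assignment of keys to parallel instances; Flink's real partitioner works differently (it hashes keys internally before assigning them to parallel subtasks), so treat this only as a rough way to spot key skew, not as a reproduction of Flink's exact assignment.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Rough key-skew check: bucket each key into one of `parallelism` partitions
// by hashCode modulo. This approximates, but does NOT reproduce, how Flink's
// hash partitioner distributes keys across parallel GroupReduce instances.
public class KeySkewCheck {

    // Returns, per partition index, a map of key -> element count.
    static Map<Integer, Map<String, Long>> partitionCounts(
            List<String> keys, int parallelism) {
        Map<Integer, Map<String, Long>> result = new HashMap<>();
        for (String key : keys) {
            // floorMod keeps the partition index non-negative even for
            // negative hash codes.
            int partition = Math.floorMod(key.hashCode(), parallelism);
            result.computeIfAbsent(partition, p -> new HashMap<>())
                  .merge(key, 1L, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical key stream with an obviously heavy key "a".
        List<String> keys = List.of("a", "b", "a", "c", "a", "a", "d", "a");
        Map<Integer, Map<String, Long>> dist = partitionCounts(keys, 4);
        dist.forEach((p, counts) ->
                System.out.println("partition " + p + " -> " + counts));
    }
}
```

Running this over an extract of the real keys (e.g. dumped to a file from the job's input) gives a first impression of whether one bucket collects far more elements than the others, which matches the slow-subtask symptom described above.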