I forgot to mention a third scenario (#3) in my earlier mail.

1. If the ratio of REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS is
approximately 1.0, you can possibly increase the number of reducers for the
vertex.

2. If the ratio of REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS is well below
0.2 (~20%) and almost all of the records are processed by this reducer, it
could mean data skew. In this case, you might want to consider increasing
the amount of memory allocated (try increasing the container size to check
whether that helps).

3. In some cases, the REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS ratio might
be somewhere in between (e.g. 0.3 - 0.8). In such cases, if most of the
records are processed by this reducer, you might want to check the
partition logic.
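
As a rough illustration of the three cases above, here is a small Python
sketch that classifies a single reducer from its counters. The function
name, the returned labels, and the two cutoffs `near_one` (for
"approximately 1.0") and `dominant_share` (for "most of the records") are
my own assumptions for the example, not anything Tez defines; only the
0.2 and 0.3 - 0.8 figures come from the rules of thumb above.

```python
def classify_reducer(input_groups, input_records, share_of_vertex_records,
                     near_one=0.9, dominant_share=0.8):
    """Apply the rule-of-thumb checks to one reducer task.

    input_groups / input_records: the task's REDUCE_INPUT_GROUPS and
        REDUCE_INPUT_RECORDS counters.
    share_of_vertex_records: fraction of the vertex's reduce input records
        handled by this task (0.0 - 1.0).
    near_one, dominant_share: hypothetical cutoffs chosen for illustration.
    """
    if input_records <= 0:
        return "no-input"

    ratio = input_groups / input_records
    dominant = share_of_vertex_records >= dominant_share

    # Case 1: nearly every record is its own group.
    if ratio >= near_one:
        return "consider-more-reducers"
    # Case 2: few groups, and this reducer handles almost all records.
    if ratio < 0.2 and dominant:
        return "possible-skew-try-more-memory"
    # Case 3: ratio in between, and this reducer handles most records.
    if 0.3 <= ratio <= 0.8 and dominant:
        return "check-partition-logic"
    # Anything else (including ratios between 0.2 and 0.3) is not covered.
    return "inconclusive"
```

For example, `classify_reducer(50, 1000, 0.95)` falls into case #2, while a
ratio that matches none of the three cases is reported as "inconclusive"
rather than guessed at.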


To answer your question: yes, if the counters show that #2 is the case, you
might want to increase the memory and try it out.



On Mon, May 25, 2015 at 3:25 PM, David Ginzburg <[email protected]>
wrote:

> Thank you,
> It is my understanding that you suspect a skew in the data, and suggest an
> increase of heap for that single reducer ?
>
> On Mon, May 25, 2015 at 12:45 PM, Rajesh Balamohan <
> [email protected]> wrote:
>
>>
>> As of today, Tez autoparallelism can only decrease the number of reducers
>> allocated. It cannot increase the number of tasks at runtime (this could
>> be added in future releases).
>>
>> - If the ratio of REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS is
>> approximately 1.0, you can possibly increase the number of reducers for the
>> vertex.
>> - If the ratio of REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS is lot less
>> than 0.2 (~20%), this could potentially mean single reducer taking up most
>> of the records.  In this case, you might want to consider increasing the
>> amount of memory allocated (try increasing the container size to check if
>> it is helping the situation)
>>
>> ~Rajesh.B
>>
>> On Mon, May 25, 2015 at 2:41 PM, David Ginzburg <[email protected]>
>> wrote:
>>
>>> Thank you,
>>> Already tried this with no effect on number of reducers
>>>
>>> On Mon, May 25, 2015 at 3:51 AM, [email protected] <[email protected]>
>>> wrote:
>>>
>>>>
>>>> When one reducer processes too much data (skew join), can setting
>>>> hive.tez.auto.reducer.parallelism=true solve this problem?
>>>>
>>>> ------------------------------
>>>> [email protected]
>>>>
>>>
>>>
>>
>>
>> --
>> ~Rajesh.B
>>
>
>


-- 
~Rajesh.B
