Thank you again! The distribution over the partitions is quite uniform.
Regarding option #1, how can I increase the number of reducers for the vertex? (A sketch of the settings I am planning to try is at the bottom of this mail, below the quoted thread.)

On Mon, May 25, 2015 at 2:11 PM, Rajesh Balamohan <[email protected]> wrote:

> Forgot to mention another scenario #3 in the earlier mail.
>
> 1. If the ratio of REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS is
> approximately 1.0, you can possibly increase the number of reducers for
> the vertex.
>
> 2. If the ratio of REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS is a lot
> less than 0.2 (~20%) and almost all the records are processed by this
> reducer, it could mean data skew. In this case, you might want to
> consider increasing the amount of memory allocated (try increasing the
> container size to check if it helps the situation).
>
> 3. In some cases, the REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS ratio
> might be in between (i.e., 0.3 - 0.8). In such cases, if most of the
> records are processed by this reducer, you might want to check the
> partition logic.
>
> To answer your question: yes, if the counters show that #2 is the case,
> you might want to increase the memory and try it out.
>
> On Mon, May 25, 2015 at 3:25 PM, David Ginzburg <[email protected]> wrote:
>
>> Thank you,
>> It is my understanding that you suspect a skew in the data, and suggest
>> an increase of heap for that single reducer?
>>
>> On Mon, May 25, 2015 at 12:45 PM, Rajesh Balamohan <[email protected]> wrote:
>>
>>> As of today, Tez auto-parallelism can only decrease the number of
>>> reducers allocated. It cannot increase the number of tasks at runtime
>>> (this could come in future releases).
>>>
>>> - If the ratio of REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS is
>>> approximately 1.0, you can possibly increase the number of reducers
>>> for the vertex.
>>> - If the ratio of REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS is a lot
>>> less than 0.2 (~20%), this could potentially mean a single reducer is
>>> taking up most of the records. In this case, you might want to
>>> consider increasing the amount of memory allocated (try increasing the
>>> container size to check if it helps the situation).
>>>
>>> ~Rajesh.B
>>>
>>> On Mon, May 25, 2015 at 2:41 PM, David Ginzburg <[email protected]> wrote:
>>>
>>>> Thank you,
>>>> Already tried this, with no effect on the number of reducers.
>>>>
>>>> On Mon, May 25, 2015 at 3:51 AM, [email protected] <[email protected]> wrote:
>>>>
>>>>> When one reducer processes too much data (skew join), can setting
>>>>> hive.tez.auto.reducer.parallelism=true solve this problem?
>>>>>
>>>>> [email protected]
>>>>
>>>
>>> --
>>> ~Rajesh.B
>>
>
> --
> ~Rajesh.B
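
Following up on my question about option #1: below is a minimal sketch of the Hive-on-Tez settings that, as far as I understand, influence how many reducers a vertex gets. The values are placeholders for illustration, not tested recommendations for this job.

    -- Sketch only: settings I understand to influence the reducer count for a
    -- Hive-on-Tez vertex; the values below are placeholders, not recommendations.
    set hive.exec.reducers.bytes.per.reducer=67108864;  -- less data per reducer => more reducers estimated
    set hive.exec.reducers.max=999;                      -- upper bound on the estimated count
    set mapred.reduce.tasks=200;                         -- hard override of the reducer count, bypassing the estimate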
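
For the skew scenario (#2), this is the kind of memory increase I would try first; the sizes are assumptions for my cluster, not general advice.

    -- Sketch only: placeholder sizes for trying a larger container on the skewed reducer.
    set hive.tez.container.size=4096;    -- Tez task container size in MB
    set hive.tez.java.opts=-Xmx3276m;    -- task JVM heap, kept below the container size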
