For Hive, lower "hive.exec.reducers.bytes.per.reducer" (the default is around
256000000, i.e. ~256 MB); a smaller value yields more reducers for the same input.
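Hive sizes the reduce stage by dividing the total input bytes by this setting, so halving it roughly doubles the reducer count. A minimal sketch of that arithmetic (the helper name is mine; 1009 stands in for the `hive.exec.reducers.max` cap and is illustrative):

```python
import math

def estimated_reducers(total_input_bytes, bytes_per_reducer=256_000_000,
                       max_reducers=1009):
    """Rough Hive-style estimate: ceil(input / bytes.per.reducer), capped.

    256_000_000 is the hive.exec.reducers.bytes.per.reducer default noted
    above; max_reducers stands in for hive.exec.reducers.max.
    """
    return min(max_reducers,
               max(1, math.ceil(total_input_bytes / bytes_per_reducer)))

ten_gib = 10 * 1024 ** 3
print(estimated_reducers(ten_gib))               # default setting -> 42
print(estimated_reducers(ten_gib, 128_000_000))  # halved setting  -> 84
```

So to get more reducers for a vertex, shrink bytes.per.reducer rather than trying to raise the task count at runtime.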

~Rajesh.B

On Mon, May 25, 2015 at 5:19 PM, David Ginzburg <[email protected]>
wrote:

> Thank you again !
>
> The distribution over the partitions is quite uniform.
>
> Regarding option #1, how can I increase the number of reducers for the
> vertex?
>
> On Mon, May 25, 2015 at 2:11 PM, Rajesh Balamohan <
> [email protected]> wrote:
>
>>
>> Forgot to mention another scenario #3 in earlier mail.
>>
>> 1. If the ratio of REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS is
>> approximately 1.0, you can possibly increase the number of reducers for the
>> vertex.
>>
>> 2. If the ratio of REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS is a lot
>> less than 0.2 (~20%) and almost all the records are processed by this
>> reducer, it could mean data skew.  In this case, you might want to consider
>> increasing the amount of memory allocated (try increasing the container
>> size to see if it helps the situation).
>>
>> 3. In some cases, the REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS ratio
>> might be in between (i.e., 0.3 to 0.8). In such cases, if most of the
>> records are processed by this reducer, you might want to check the
>> partition logic.
>>
>>
>> To answer your question, yes, based on counters if you find that #2 is
>> the case, you might want to increase the memory and try it out.
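The three cases above can be sketched as a quick triage helper. The thresholds follow the ratios Rajesh lists; the function name and return strings are illustrative, not a Tez or Hive API:

```python
def diagnose_reduce_vertex(reduce_input_groups, reduce_input_records):
    """Map the GROUPS/RECORDS counter ratio to the three cases above."""
    ratio = reduce_input_groups / reduce_input_records
    if ratio >= 0.9:
        # Case 1, ratio ~1.0: many distinct keys -> more reducers can share the load
        return "increase reducers"
    if ratio <= 0.2:
        # Case 2: few keys hold most records -> likely skew; try a bigger container
        return "likely skew: increase memory"
    # Case 3, roughly 0.3 - 0.8: look at how keys are being partitioned
    return "check partition logic"

print(diagnose_reduce_vertex(9_800_000, 10_000_000))  # ratio 0.98
print(diagnose_reduce_vertex(1_000, 10_000_000))      # ratio 0.0001
```

Read both counters from the Tez UI or vertex counters for the slow reducer before applying the heuristic.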
>>
>>
>>
>> On Mon, May 25, 2015 at 3:25 PM, David Ginzburg <[email protected]>
>> wrote:
>>
>>> Thank you,
>>> It is my understanding that you suspect a skew in the data and suggest
>>> increasing the heap for that single reducer?
>>>
>>> On Mon, May 25, 2015 at 12:45 PM, Rajesh Balamohan <
>>> [email protected]> wrote:
>>>
>>>>
>>>> As of today, Tez auto-parallelism can only decrease the number of
>>>> reducers allocated; it cannot increase the number of tasks at runtime
>>>> (this may change in future releases).
>>>>
>>>> - If the ratio of REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS is
>>>> approximately 1.0, you can possibly increase the number of reducers for the
>>>> vertex.
>>>> - If the ratio of REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS is a lot
>>>> less than 0.2 (~20%), this could mean a single reducer is taking up
>>>> most of the records.  In this case, you might want to consider increasing
>>>> the amount of memory allocated (try increasing the container size to see
>>>> if it helps the situation).
>>>>
>>>> ~Rajesh.B
>>>>
>>>> On Mon, May 25, 2015 at 2:41 PM, David Ginzburg <
>>>> [email protected]> wrote:
>>>>
>>>>> Thank you,
>>>>> Already tried this with no effect on number of reducers
>>>>>
>>>>> On Mon, May 25, 2015 at 3:51 AM, [email protected] <
>>>>> [email protected]> wrote:
>>>>>
>>>>>>
>>>>>> When one reducer processes too much data (a skew join), can setting
>>>>>> hive.tez.auto.reducer.parallelism=true solve this problem?
>>>>>>
>>>>>> ------------------------------
>>>>>> [email protected]
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> ~Rajesh.B
>>>>
>>>
>>>
>>
>>
>> --
>> ~Rajesh.B
>>
>
>


-- 
~Rajesh.B
