For hive, "hive.exec.reducers.bytes.per.reducer" (default should be around 256000000).
~Rajesh.B

On Mon, May 25, 2015 at 5:19 PM, David Ginzburg <[email protected]> wrote:

> Thank you again!
>
> The distribution over the partitions is quite uniform.
>
> Regarding option #1, how can I increase the number of reducers for the
> vertex?
>
> On Mon, May 25, 2015 at 2:11 PM, Rajesh Balamohan <[email protected]> wrote:
>
>> Forgot to mention another scenario, #3, in the earlier mail.
>>
>> 1. If the ratio of REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS is
>> approximately 1.0, you can possibly increase the number of reducers
>> for the vertex.
>>
>> 2. If the ratio of REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS is a lot
>> less than 0.2 (~20%) and almost all the records are processed by this
>> reducer, it could mean data skew. In this case, you might want to
>> consider increasing the amount of memory allocated (try increasing the
>> container size to check whether it helps).
>>
>> 3. In some cases, the REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS ratio
>> might be in between (i.e. 0.3 - 0.8). In such cases, if most of the
>> records are processed by this reducer, you might want to check the
>> partition logic.
>>
>> To answer your question: yes, if the counters show that #2 is the
>> case, you might want to increase the memory and try it out.
>>
>> On Mon, May 25, 2015 at 3:25 PM, David Ginzburg <[email protected]> wrote:
>>
>>> Thank you,
>>> It is my understanding that you suspect a skew in the data, and
>>> suggest an increase of heap for that single reducer?
>>>
>>> On Mon, May 25, 2015 at 12:45 PM, Rajesh Balamohan <[email protected]> wrote:
>>>
>>>> As of today, Tez auto-parallelism can only decrease the number of
>>>> reducers allocated. It cannot increase the number of tasks at
>>>> runtime (that may come in future releases).
>>>>
>>>> - If the ratio of REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS is
>>>> approximately 1.0, you can possibly increase the number of reducers
>>>> for the vertex.
>>>> - If the ratio of REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS is a
>>>> lot less than 0.2 (~20%), this could mean a single reducer is taking
>>>> up most of the records. In this case, you might want to consider
>>>> increasing the amount of memory allocated (try increasing the
>>>> container size to check whether it helps).
>>>>
>>>> ~Rajesh.B
>>>>
>>>> On Mon, May 25, 2015 at 2:41 PM, David Ginzburg <[email protected]> wrote:
>>>>
>>>>> Thank you,
>>>>> Already tried this, with no effect on the number of reducers.
>>>>>
>>>>> On Mon, May 25, 2015 at 3:51 AM, [email protected] <[email protected]> wrote:
>>>>>
>>>>>> When one reducer processes too much data (skew join), can setting
>>>>>> hive.tez.auto.reducer.parallelism=true solve this problem?
>>>>>>
>>>>>> ------------------------------
>>>>>> [email protected]
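Rajesh's three scenarios can be collected into one small decision helper. This is only a sketch of the rule of thumb from the thread above: the function name, the `skewed` flag (meaning most of the vertex's records land on this one reducer), and the exact thresholds 0.9, 0.2, and 0.3-0.8 are illustrative assumptions, not anything from Hive or Tez itself. The inputs are the Tez counters REDUCE_INPUT_GROUPS and REDUCE_INPUT_RECORDS of the slow reducer.

```python
def diagnose_reducer(input_groups, input_records, skewed=True):
    """Rule-of-thumb diagnosis of a slow reducer from its Tez counters,
    following the three scenarios in the thread. Thresholds are
    illustrative approximations of "approximately 1.0", "a lot less
    than 0.2", and "0.3 - 0.8"."""
    ratio = input_groups / input_records
    if ratio >= 0.9 and not skewed:
        # Scenario 1: nearly every record is its own key group.
        return "increase the number of reducers for the vertex"
    if ratio < 0.2 and skewed:
        # Scenario 2: few key groups hold most records -> data skew.
        return "likely data skew: increase memory / container size"
    if 0.3 <= ratio <= 0.8 and skewed:
        # Scenario 3: in-between ratio, records concentrated here.
        return "check the partition logic"
    return "counters inconclusive: inspect the key distribution"

print(diagnose_reducer(990, 1000, skewed=False))
print(diagnose_reducer(100, 1000, skewed=True))
print(diagnose_reducer(500, 1000, skewed=True))
```

The counters themselves are visible per task in the Tez UI or in the application logs, so this check can be done by eye without any code; the function just makes the thresholds explicit.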
