Thank you. It is my understanding that you suspect skew in the data, and suggest increasing the heap for that single reducer?
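[Editor's note: if increasing per-reducer memory is the chosen lever, the session-level settings below sketch one way to do it. The property names are real Hive/Tez settings, but the sizes are illustrative placeholders, not tuned recommendations.]

```sql
-- Illustrative only: raise the Tez container size for this session (MB).
SET hive.tez.container.size=4096;
-- Give the JVM most of the container as heap (commonly ~80% of container).
SET hive.tez.java.opts=-Xmx3276m;
```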
On Mon, May 25, 2015 at 12:45 PM, Rajesh Balamohan <[email protected]> wrote:

> As of today, Tez auto-parallelism can only decrease the number of reducers
> allocated. It cannot increase the number of tasks at runtime (this could
> arrive in future releases).
>
> - If the ratio of REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS is
>   approximately 1.0, you can probably increase the number of reducers for
>   the vertex.
> - If the ratio of REDUCE_INPUT_GROUPS / REDUCE_INPUT_RECORDS is well below
>   0.2 (~20%), a single reducer is likely taking up most of the records. In
>   this case, consider increasing the amount of memory allocated (try
>   increasing the container size to see whether it helps).
>
> ~Rajesh.B
>
> On Mon, May 25, 2015 at 2:41 PM, David Ginzburg <[email protected]> wrote:
>
>> Thank you,
>> Already tried this, with no effect on the number of reducers.
>>
>> On Mon, May 25, 2015 at 3:51 AM, [email protected] <[email protected]> wrote:
>>
>>> When one reducer processes too much data (skew join), can setting
>>> hive.tez.auto.reducer.parallelism=true solve this problem?
>>>
>>> ------------------------------
>>> [email protected]
>
> --
> ~Rajesh.B
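[Editor's note: the counter heuristic described above can be sketched as a small helper. The function name `skew_hint` is hypothetical, and the thresholds are taken directly from the thread (~1.0 and ~0.2); real jobs would read the counters from the Tez/Hive counter output.]

```python
def skew_hint(reduce_input_groups: int, reduce_input_records: int) -> str:
    """Hypothetical helper applying the ratio heuristic from the thread."""
    ratio = reduce_input_groups / reduce_input_records
    if ratio >= 0.9:
        # Nearly one group per record: many distinct keys, so adding
        # reducers lets them share the work.
        return "increase reducer parallelism"
    if ratio < 0.2:
        # Few groups relative to records: one reducer likely receives
        # most of the data, so add memory rather than reducers.
        return "increase container memory"
    return "no clear signal"

print(skew_hint(98, 100))  # many distinct keys
print(skew_hint(3, 100))   # heavy skew
```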
