Hi, suppose I have a 5-worker-node cluster, and each worker node can allocate 40G of memory. I do not care about the map tasks, because the map tasks in my job finish within half a minute; from what I observe, the really slow part is the reduce phase. I allocate 12G to each reduce task, so each worker node can run 3 reduce tasks in parallel, and the whole cluster can support 15 reducers, and I run the job with all 15 reducers. What I do not know is whether increasing the reducer number from 15 to 30, allocating 6G of memory to each reducer, will speed up the job or not. The job runs in my production environment; it has been running for nearly a week and is still not finished.
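The slot arithmetic described above can be sketched in a few lines of plain Java; the node count and memory figures are the ones quoted in this thread and are illustrative only:

```java
// Back-of-the-envelope reducer-slot arithmetic from the numbers in this
// thread: 5 worker nodes, 40G allocatable per node.
public class ReducerSlots {
    // Slots per node is limited by memory (integer division), then
    // multiplied across the cluster.
    static int clusterSlots(int nodes, int memPerNodeGb, int memPerReducerGb) {
        int slotsPerNode = memPerNodeGb / memPerReducerGb;
        return nodes * slotsPerNode;
    }

    public static void main(String[] args) {
        System.out.println(clusterSlots(5, 40, 12)); // 15 reducers at 12G each
        System.out.println(clusterSlots(5, 40, 6));  // 30 reducers at 6G each
    }
}
```

Halving the per-reducer memory doubles the slot count, but as the replies below point out, more slots only help if there is enough data spread across enough reducer input groups to keep them busy.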
On Wed, Dec 11, 2013 at 9:50 PM, java8964 <java8...@hotmail.com> wrote:

> The whole job completion time depends on a lot of factors. Are you sure
> the reduce phase is the bottleneck?
>
> It also depends on how many reducer input groups your MR job has. If you
> only have 20 reducer input groups, then even if you bump your reducer
> count to 40, the reduce phase won't change much, because the additional
> 20 reduce tasks won't get any data to process.
>
> If you have a lot of reducer input groups, your cluster has spare
> capacity at that time, and you also have plenty of idle reducer slots,
> then increasing your reducer count should decrease the overall job
> completion time.
>
> Make sense?
>
> Yong
>
> ------------------------------
> Date: Wed, 11 Dec 2013 14:20:24 +0800
> Subject: Re: issue about Shuffled Maps in MR job summary
> From: justlo...@gmail.com
> To: user@hadoop.apache.org
>
> I read the doc and found that if I have 8 reducers, a map task will
> output 8 partitions, and each partition will be sent to a different
> reducer. So if I increase the reducer number, the partition count
> increases, but the total volume of network traffic stays the same. Why
> does increasing the reducer number sometimes not decrease the job
> completion time?
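Yong's point about reducer input groups can be simulated in plain Java. The partition formula below is the one used by Hadoop's default `HashPartitioner` (`(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`); the key set is made up for illustration:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of why extra reducers can sit idle: with only 20 distinct
// reducer input groups (keys), at most 20 partitions ever receive data,
// no matter how many reduce tasks are configured.
public class IdleReducers {
    // Count how many partitions receive at least one key, using the same
    // formula as Hadoop's default HashPartitioner.
    static int busyReducers(Set<String> keys, int numReduceTasks) {
        Set<Integer> partitionsWithData = new HashSet<>();
        for (String key : keys) {
            partitionsWithData.add((key.hashCode() & Integer.MAX_VALUE) % numReduceTasks);
        }
        return partitionsWithData.size();
    }

    public static void main(String[] args) {
        Set<String> keys = new HashSet<>();
        for (int i = 0; i < 20; i++) keys.add("group-" + i);
        // With 40 reduce tasks configured, at most 20 can be busy
        // (possibly fewer, if hash collisions land two keys in one partition).
        System.out.println(busyReducers(keys, 20));
        System.out.println(busyReducers(keys, 40));
    }
}
```

Going from 20 to 40 configured reducers cannot raise the busy-reducer count above the number of distinct keys, which is exactly why bumping the reducer count may not help.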
> On Wed, Dec 11, 2013 at 1:48 PM, Vinayakumar B <vinayakuma...@huawei.com> wrote:
>
> > It looks simple :)
> >
> > Shuffled Maps = Number of Map Tasks * Number of Reducers
> >
> > Thanks and Regards,
> > Vinayakumar B
> >
> > *From:* ch huang [mailto:justlo...@gmail.com]
> > *Sent:* 11 December 2013 10:56
> > *To:* user@hadoop.apache.org
> > *Subject:* issue about Shuffled Maps in MR job summary
> >
> > hi, maillist:
> >
> > I ran terasort with 16 reducers and with 8 reducers. When I doubled the
> > reducer number, Shuffled Maps also doubled. My question is: the job only
> > ran 20 map tasks (the input is 10 files, each 100M, and my block size is
> > 64M, so there are 20 splits), so why does it shuffle 160 maps in the
> > 8-reducer run and 320 maps in the 16-reducer run? How is the Shuffled
> > Maps number calculated?
> >
> > 16 reducer summary output:
> >
> > Shuffled Maps = 320
> >
> > 8 reducer summary output:
> >
> > Shuffled Maps = 160
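Vinayakumar's formula checks out against the terasort numbers in the thread: every reducer fetches one map-output segment from every map task, so the counter is just a product. A minimal check:

```java
// "Shuffled Maps" counter per Vinayakumar's reply in this thread:
// each reducer fetches one output segment from each map task.
public class ShuffledMaps {
    static int shuffledMaps(int mapTasks, int reducers) {
        return mapTasks * reducers;
    }

    public static void main(String[] args) {
        System.out.println(shuffledMaps(20, 8));  // 160, the 8-reducer run
        System.out.println(shuffledMaps(20, 16)); // 320, the 16-reducer run
    }
}
```

So the counter growing with the reducer count is expected; it reflects more (smaller) fetches, not more map work or more total data moved over the network.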