One important thing is that my input files are very small, each file less than 10M, and I have a huge number of files.
On Thu, Dec 12, 2013 at 9:58 AM, java8964 <java8...@hotmail.com> wrote:

> Assume the block size is 128M and each of your mappers finishes within half
> a minute; then there is not much logic in your mapper, as it can process
> 128M in around 30 seconds. If your reducers cannot finish within a week,
> then something is wrong.
>
> So you need to find out the following:
>
> 1) How many mappers were generated in your MR job?
> 2) Have they all finished? (Check them in the JobTracker through the web UI
> or the command line.)
> 3) How many reducers are in this job?
> 4) Have the reducers started? What stage are they in:
> copying, sorting, or reducing?
> 5) If they are in the reducing stage, check the reducers' userlogs. Is your
> code running now?
>
> All of this information is available in the JobTracker web UI.
>
> Yong
>
> ------------------------------
> Date: Thu, 12 Dec 2013 09:03:29 +0800
> Subject: Re: issue about Shuffled Maps in MR job summary
> From: justlo...@gmail.com
> To: user@hadoop.apache.org
>
> hi,
>     Suppose I have a 5-worker-node cluster, and each worker node can
> allocate 40G of memory. I do not worry about the map tasks, because the map
> tasks in my job finish within half a minute; from what I observe, the truly
> slow part is the reduce. I allocate 12G to each reduce task, so each worker
> node can run 3 reducers in parallel and the whole cluster can support 15
> reducers, and I run the job with all 15. What I do not know is whether
> increasing the reducer count from 15 to 30, giving each reducer 6G of
> memory, will speed the job up or not. The job runs in my production
> environment; it has been running for nearly a week and is still not
> finished.
>
> On Wed, Dec 11, 2013 at 9:50 PM, java8964 <java8...@hotmail.com> wrote:
>
> The overall job completion time depends on a lot of factors. Are you sure
> the reducer part is the bottleneck?
>
> It also depends on how many reducer input groups your MR job has.
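For reference, the reducer count and per-reducer memory traded off in the message above can be set through standard Hadoop 2.x (YARN) job properties. A minimal config sketch, assuming the scenario from the thread (30 reducers at 6G each); the property names are the stock YARN-era ones, not anything specific to this poster's cluster:

```
<!-- mapred-site.xml, or per-job -D overrides on the command line -->
<property>
  <name>mapreduce.job.reduces</name>
  <value>30</value>   <!-- default number of reduce tasks for the job -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>6144</value> <!-- container memory requested per reducer: 6G -->
</property>
```

The same values can be passed as `-Dmapreduce.job.reduces=30 -Dmapreduce.reduce.memory.mb=6144` when submitting the job.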
> If you only have 20 reducer input groups, then even if you bump your
> reducer count to 40, the duration of the reduce phase won't change much,
> because the additional 20 reducer tasks won't get any data to process.
>
> If you have a lot of reducer input groups, your cluster has spare capacity
> at this time, and you also have plenty of idle reducer slots, then
> increasing your reducer count should decrease your overall job completion
> time.
>
> Make sense?
>
> Yong
>
> ------------------------------
> Date: Wed, 11 Dec 2013 14:20:24 +0800
> Subject: Re: issue about Shuffled Maps in MR job summary
> From: justlo...@gmail.com
> To: user@hadoop.apache.org
>
> I read the doc and found that if I have 8 reducers, a map task will output
> 8 partitions, and each partition will be sent to a different reducer. So if
> I increase the reducer number, the partition count increases, but the total
> volume of network traffic stays the same. Why does increasing the reducer
> number sometimes not decrease the job completion time?
>
> On Wed, Dec 11, 2013 at 1:48 PM, Vinayakumar B
> <vinayakuma...@huawei.com> wrote:
>
> It looks simple :)
>
> Shuffled Maps = Number of Map Tasks * Number of Reducers
>
> Thanks and Regards,
> Vinayakumar B
>
> *From:* ch huang [mailto:justlo...@gmail.com]
> *Sent:* 11 December 2013 10:56
> *To:* user@hadoop.apache.org
> *Subject:* issue about Shuffled Maps in MR job summary
>
> hi, maillist:
>     I ran terasort with 16 reducers and with 8 reducers. When I doubled the
> reducer number, the Shuffled Maps count also doubled. My question is: the
> job only runs 20 map tasks (the total input is 10 files, each file is 100M,
> and my block size is 64M, so there are 20 splits), so why does it shuffle
> 160 maps in the 8-reducer run and 320 maps in the 16-reducer run? How is
> the Shuffled Maps number calculated?
>
> 16 reducer summary output:
>
> Shuffled Maps = 320
>
> 8 reducer summary output:
>
> Shuffled Maps = 160
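Vinayakumar's formula can be checked against the numbers from the original question. A standalone sketch (plain Java, no Hadoop dependencies; the class and method names are just illustrative) that recomputes the split count and the Shuffled Maps totals, since every reducer fetches one partition from every map task:

```java
public class ShuffledMapsCheck {

    // Splits for one file, assuming the default behavior of one split per
    // block: ceiling(fileSize / blockSize).
    static long splitsForFile(long fileSizeMb, long blockSizeMb) {
        return (fileSizeMb + blockSizeMb - 1) / blockSizeMb;
    }

    public static void main(String[] args) {
        long files = 10, fileSizeMb = 100, blockSizeMb = 64;

        // 10 files * ceil(100M / 64M) = 10 * 2 = 20 map tasks
        long mapTasks = files * splitsForFile(fileSizeMb, blockSizeMb);
        System.out.println("map tasks = " + mapTasks);

        // Shuffled Maps = map tasks * reducers
        System.out.println("8 reducers  -> shuffled maps = " + mapTasks * 8);
        System.out.println("16 reducers -> shuffled maps = " + mapTasks * 16);
    }
}
```

Running this prints 20 map tasks, 160 shuffled maps for 8 reducers, and 320 for 16, matching the job summaries quoted above.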