Hi Gaurav,

You can get the number of map tasks in the job from the JobTracker (JT) web UI itself.
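The same figure is also printed as the "Launched map tasks" counter in the JobClient console output quoted further down. A minimal sketch for pulling it out of a saved console log, assuming the log lines look like the ones in this thread (the function name `parse_counters` and the regular expression are illustrative, not part of Hadoop):

```python
import re

# Matches JobClient counter lines such as
# "12/08/28 04:34:35 INFO mapred.JobClient: Launched map tasks=201".
# Lines without "name=number" (e.g. "Counters: 28") are ignored.
COUNTER_RE = re.compile(
    r"mapred\.JobClient:\s+(?P<name>[^=]+?)\s*=\s*(?P<value>\d+)\s*$"
)


def parse_counters(log_text):
    """Return {counter name: int value} for every counter line in the log.

    If the log covers several jobs (e.g. -numRuns 3), later values
    overwrite earlier ones, so split the log per job first if needed.
    """
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters


sample = """\
12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
12/08/28 04:34:35 INFO mapred.JobClient: Launched reduce tasks=200
12/08/28 04:34:35 INFO mapred.JobClient: Launched map tasks=201
12/08/28 04:34:35 INFO mapred.JobClient: Data-local map tasks=64
"""

counters = parse_counters(sample)
print(counters["Launched map tasks"])  # 201
```

The JT web UI remains the more direct source; this is only convenient when all you have is the console output.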
Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Gaurav Dasgupta <gdsay...@gmail.com>
Date: Wed, 29 Aug 2012 13:14:11
To: <user@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Re: MRBench Maps strange behaviour

Hi Hemanth,

Thanks for the reply. Can you tell me how I can calculate or verify from the counters what the exact number of Maps should be?

Thanks,
Gaurav Dasgupta

On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala <yhema...@gmail.com> wrote:

> Hi,
>
> The number of maps specified to any map reduce program (including
> those that are part of MRBench) is generally only a hint, and the actual
> number of maps will be influenced in typical cases by the amount of data
> being processed. You can take a look at this wiki link to understand
> more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>
> In the examples below, since the data you've generated is different,
> the number of mappers is different. To be able to judge your
> benchmark results, you'd need to benchmark against the same data (or
> at least the same kind of data, i.e. the same size and type).
>
> The number of maps printed at the end is taken straight from the input
> specified and doesn't reflect what the job actually ran with. The
> information from the counters is the right one.
>
> Thanks
> Hemanth
>
> On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta <gdsay...@gmail.com> wrote:
> > Hi All,
> >
> > I executed the "MRBench" program from "hadoop-test.jar" in my 12 node CDH3
> > cluster. After executing, I had some strange observations regarding the
> > number of Maps it ran.
> >
> > First I ran the command:
> >
> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps 200 -reduces 200 -inputLines 1024 -inputType random
> >
> > And I could see that the actual number of Maps it ran was 201 (for all
> > 3 runs) instead of 200 (though the end report displays the number
> > launched as 200).
> > Here is the console report:
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient: Job complete: job_201208230144_0035
> > 12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
> > 12/08/28 04:34:35 INFO mapred.JobClient: Job Counters
> > 12/08/28 04:34:35 INFO mapred.JobClient: Launched reduce tasks=200
> > 12/08/28 04:34:35 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=617209
> > 12/08/28 04:34:35 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
> > 12/08/28 04:34:35 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
> > 12/08/28 04:34:35 INFO mapred.JobClient: Rack-local map tasks=137
> > 12/08/28 04:34:35 INFO mapred.JobClient: Launched map tasks=201
> > 12/08/28 04:34:35 INFO mapred.JobClient: Data-local map tasks=64
> > 12/08/28 04:34:35 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=1756882
> >
> > Again, I ran the MRBench for just 10 Maps and 10 Reduces:
> >
> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10
> >
> > This time the actual number of Maps was only 2, and again the end report
> > displays Maps Launched to be 10.
> > The console output:
> >
> > 12/08/28 05:05:35 INFO mapred.JobClient: Job complete: job_201208230144_0040
> > 12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
> > 12/08/28 05:05:35 INFO mapred.JobClient: Job Counters
> > 12/08/28 05:05:35 INFO mapred.JobClient: Launched reduce tasks=20
> > 12/08/28 05:05:35 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=6648
> > 12/08/28 05:05:35 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
> > 12/08/28 05:05:35 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
> > 12/08/28 05:05:35 INFO mapred.JobClient: Launched map tasks=2
> > 12/08/28 05:05:35 INFO mapred.JobClient: Data-local map tasks=2
> > 12/08/28 05:05:35 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=163257
> > 12/08/28 05:05:35 INFO mapred.JobClient: FileSystemCounters
> > 12/08/28 05:05:35 INFO mapred.JobClient: FILE_BYTES_READ=407
> > 12/08/28 05:05:35 INFO mapred.JobClient: HDFS_BYTES_READ=258
> > 12/08/28 05:05:35 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1072596
> > 12/08/28 05:05:35 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=3
> > 12/08/28 05:05:35 INFO mapred.JobClient: Map-Reduce Framework
> > 12/08/28 05:05:35 INFO mapred.JobClient: Map input records=1
> > 12/08/28 05:05:35 INFO mapred.JobClient: Reduce shuffle bytes=647
> > 12/08/28 05:05:35 INFO mapred.JobClient: Spilled Records=2
> > 12/08/28 05:05:35 INFO mapred.JobClient: Map output bytes=5
> > 12/08/28 05:05:35 INFO mapred.JobClient: CPU time spent (ms)=17070
> > 12/08/28 05:05:35 INFO mapred.JobClient: Total committed heap usage (bytes)=6218842112
> > 12/08/28 05:05:35 INFO mapred.JobClient: Map input bytes=2
> > 12/08/28 05:05:35 INFO mapred.JobClient: Combine input records=0
> > 12/08/28 05:05:35 INFO mapred.JobClient: SPLIT_RAW_BYTES=254
> > 12/08/28 05:05:35 INFO mapred.JobClient: Reduce input records=1
> > 12/08/28 05:05:35 INFO mapred.JobClient: Reduce input groups=1
> > 12/08/28 05:05:35 INFO mapred.JobClient: Combine output records=0
> > 12/08/28 05:05:35 INFO mapred.JobClient: Physical memory (bytes) snapshot=3348828160
> > 12/08/28 05:05:35 INFO mapred.JobClient: Reduce output records=1
> > 12/08/28 05:05:35 INFO mapred.JobClient: Virtual memory (bytes) snapshot=22955810816
> > 12/08/28 05:05:35 INFO mapred.JobClient: Map output records=1
> >
> > DataLines  Maps  Reduces  AvgTime (milliseconds)
> > 1          20    20       17451
> >
> > Can someone please help me understand this behaviour of Hadoop in this
> > case? My main purpose in running MRBench is to calculate the average
> > time for a certain number of Maps, Reduces, InputLines etc. If the number
> > of Maps is not what I submitted, then how can I judge my benchmark results?
> >
> > Thanks,
> >
> > Gaurav Dasgupta
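
Hemanth's point that -maps is only a hint can be illustrated with a rough estimate of how a file-based input format turns a requested map count into input splits. This is a sketch under stated assumptions, not the MRBench or CDH3 source: it assumes a single splittable input file and the split-size rule of the old `org.apache.hadoop.mapred.FileInputFormat` as I understand it (split size = max(min size, min(total size / requested maps, block size)), with a 1.1 slop factor on the last split). The function name, parameter names and the 64 MB default block size are hypothetical choices for illustration.

```python
def estimate_map_count(total_bytes, requested_maps,
                       block_size=64 * 1024 * 1024,
                       min_split_size=1, split_slop=1.1):
    """Estimate the number of input splits (maps) for one splittable file.

    Assumed rule (see the lead-in): the requested map count only sets a
    goal size per split; the block size and minimum split size bound it.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive for this sketch")

    # Integer division: a tiny input with many requested maps gives goal 0.
    goal_size = total_bytes // max(requested_maps, 1)
    split_size = max(min_split_size, min(goal_size, block_size))

    splits = 0
    remaining = total_bytes
    # Cut full-size splits while more than 1.1 splits' worth of data remains.
    while remaining / split_size > split_slop:
        splits += 1
        remaining -= split_size
    # Whatever is left over becomes one final split.
    if remaining > 0:
        splits += 1
    return splits


# The second run read 2 bytes of input ("Map input bytes=2"):
print(estimate_map_count(2, 10))      # 2

# A hypothetical 1001-byte file with 200 requested maps: the 1-byte
# remainder becomes an extra split, so 201 maps instead of 200.
print(estimate_map_count(1001, 200))  # 201
```

Under these assumptions the 2-byte input yields 2 maps regardless of the value passed to -maps, which matches the "Launched map tasks=2" counter above. The 1001-byte size is made up; the actual input size of the first run is not shown in the thread.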