Hi Gaurav

You can get the number of map tasks the job actually ran from the JobTracker (JT)
web UI itself.

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Gaurav Dasgupta <gdsay...@gmail.com>
Date: Wed, 29 Aug 2012 13:14:11 
To: <user@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Re: MRBench Maps strange behaviour

Hi Hemanth,

Thanks for the reply.
Can you tell me how I can calculate, or confirm from the counters, what the
exact number of maps should be?
Thanks,
Gaurav Dasgupta
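For reference, the arithmetic behind "the number of maps is only a hint" (explained in Hemanth's reply quoted below) can be sketched in plain Java. This is a simplified model, not Hadoop source. It assumes the old-API (org.apache.hadoop.mapred) FileInputFormat rule splitSize = max(minSize, min(totalSize / requestedMaps, blockSize)) with a 1.1 slop factor on the last split, a single splittable input file, mapred.min.split.size left at its default of 1, and a 64 MB block size. The class and method names are hypothetical.

```java
// Hypothetical helper: a simplified model of old-API FileInputFormat split
// counting. Assumes ONE splittable input file; not actual Hadoop code.
public class SplitEstimate {

    // The last split may be up to 10% larger than splitSize.
    private static final double SPLIT_SLOP = 1.1;

    // Estimated number of splits (and therefore map tasks) for one file.
    static long estimateSplits(long fileSize, int requestedMaps, long blockSize, long minSize) {
        if (fileSize == 0) {
            return 1; // a zero-length file still produces one (empty) split
        }
        // requestedMaps (the -maps value) only sets a goal size per split.
        long goalSize = fileSize / Math.max(requestedMaps, 1);
        long splitSize = Math.max(minSize, Math.min(goalSize, blockSize));
        long splits = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // leftover bytes become one final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long block = 64L * 1024 * 1024; // assumed 64 MB HDFS block size
        System.out.println(estimateSplits(2, 10, block, 1));       // prints 2
        System.out.println(estimateSplits(109, 10, block, 1));     // prints 11
        System.out.println(estimateSplits(1L << 30, 2, block, 1)); // prints 16
    }
}
```

Under these assumptions, the 2-byte case is consistent with the `Map input bytes=2` and `Launched map tasks=2` counters in the second run quoted below. The 109-byte case shows how integer division in the goal size plus the 1.1 slop factor can yield one split more than requested; whether that explains the 201 maps in the first run would depend on the actual input size, which the thread does not show. The 1 GB case shows the block size capping the split size, so more maps run than were asked for.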
On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala <yhema...@gmail.com> wrote:

> Hi,
>
> The number of maps specified to any MapReduce program (including
> those that are part of MRBench) is generally only a hint, and the actual number
> of maps will be influenced in typical cases by the amount of data
> being processed. You can take a look at this wiki link to understand
> more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>
> In the examples below, since the data you've generated is different,
> the number of mappers is different. To be able to judge your
> benchmark results, you'd need to benchmark against the same data (or
> at least the same kind of data, i.e. the same size and type).
>
> The number of maps printed at the end is taken straight from the
> parameters you specified and doesn't reflect what the job actually ran
> with. The counters are the right source for that information.
>
> Thanks
> Hemanth
>
> On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta <gdsay...@gmail.com>
> wrote:
> > Hi All,
> >
> > I executed the "MRBench" program from "hadoop-test.jar" on my 12-node CDH3
> > cluster. After running it, I noticed some strange behaviour regarding the
> > number of maps it ran.
> >
> > First I ran the command:
> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps 200 -reduces 200 -inputLines 1024 -inputType random
> > I could see that the actual number of maps it ran was 201 (for all 3
> > runs) instead of 200, though the end report displays the number launched
> > as 200. Here is the console report:
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient: Job complete: job_201208230144_0035
> > 12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
> > 12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1756882
> >
> > Again, I ran MRBench with just 10 maps and 10 reduces:
> >
> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10
> >
> > This time the actual number of maps was only 2, and again the end report
> > displays Maps Launched as 10. The console output:
> >
> > 12/08/28 05:05:35 INFO mapred.JobClient: Job complete: job_201208230144_0040
> > 12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
> > 12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20
> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6648
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=163257
> > 12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_READ=407
> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_READ=258
> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1072596
> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
> > 12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input records=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce shuffle bytes=647
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Spilled Records=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output bytes=5
> > 12/08/28 05:05:35 INFO mapred.JobClient:     CPU time spent (ms)=17070
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=6218842112
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input bytes=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine input records=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=254
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input records=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input groups=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine output records=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=3348828160
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce output records=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=22955810816
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output records=1
> > DataLines  Maps  Reduces  AvgTime (milliseconds)
> > 1          20    20       17451
> >
> > Can someone please help me understand Hadoop's behaviour in this case? My
> > main purpose in running MRBench is to calculate the average time for a
> > given number of maps, reduces, input lines, etc. If the number of maps is
> > not what I submitted, how can I judge my benchmark results?
> >
> > Thanks,
> >
> > Gaurav Dasgupta
>
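Hemanth's point that the counters, not the MRBench summary table, show what the job actually ran with can also be checked mechanically. The sketch below is a hypothetical helper (not part of Hadoop or MRBench) that scrapes name=value counters from JobClient console lines in the format quoted above and compares "Launched map tasks" with the -maps value. It assumes each counter sits on a single, unwrapped line of that format.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts "name=value" counters from JobClient console
// output. Assumes one counter per line, e.g.
// "12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2".
public class CounterScraper {

    // Counter name is everything after the JobClient prefix up to "=<digits>".
    private static final Pattern COUNTER =
        Pattern.compile("INFO mapred\\.JobClient:\\s+(.+?)=(\\d+)\\s*$");

    static Map<String, Long> parse(String consoleOutput) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : consoleOutput.split("\\R")) {
            Matcher m = COUNTER.matcher(line);
            if (m.find()) { // group headers such as "Job Counters" have no "=" and are skipped
                counters.put(m.group(1).trim(), Long.parseLong(m.group(2)));
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        String log = String.join("\n",
            "12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters",
            "12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20",
            "12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2",
            "12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2");
        Map<String, Long> counters = parse(log);
        int requestedMaps = 10; // the -maps value from the second command
        long launched = counters.getOrDefault("Launched map tasks", 0L);
        System.out.println("requested=" + requestedMaps + " launched=" + launched);
    }
}
```

Scraping console text is fragile; it is shown only because that output is what the thread has to hand. The JobTracker web UI mentioned at the top of the thread displays the same counters.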
