Hi Sai,

What Viji indicated is that the default Apache Hadoop setting for any input is 2 maps. If the input is larger than one block, the regular splitting policies, such as those Shekhar described, apply. For smaller inputs, however, Hadoop ships with a default that forces a 2-map split (mapred.map.tasks=2), just to give an out-of-the-box "parallelism experience".
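To make the arithmetic concrete, here is a rough Python sketch of how the old-API FileInputFormat sizes its splits, as I understand it. It is a simplified model and not the Hadoop source: compute_splits and its parameter names are made up for illustration, the 88-byte file size is only a guess at a 5-line input like yours, the 64 MB block size and minimum split size of 1 are assumed defaults, and empty files are ignored.

```python
# Simplified model of old-API split sizing; not the actual Hadoop code.
SPLIT_SLOP = 1.1  # a tail up to 10% larger than split_size is kept as one split


def compute_splits(file_size, num_map_tasks_hint,
                   block_size=64 * 1024 * 1024, min_size=1):
    """Return (offset, length) pairs for one file (hypothetical helper)."""
    if file_size == 0:
        return []  # simplification: empty files are ignored in this sketch
    # The mapred.map.tasks value acts as a hint that sets a per-split goal size.
    goal_size = file_size // max(num_map_tasks_hint, 1)
    # A split is never larger than a block and never smaller than min_size.
    split_size = max(min_size, min(goal_size, block_size))
    splits = []
    remaining = file_size
    # Carve off full-size splits while clearly more than one split remains.
    while remaining / split_size > SPLIT_SLOP:
        splits.append((file_size - remaining, split_size))
        remaining -= split_size
    # Whatever is left becomes the final split.
    if remaining > 0:
        splits.append((file_size - remaining, remaining))
    return splits


print(compute_splits(88, 2))  # [(0, 44), (44, 44)] -> 2 map tasks
print(compute_splits(88, 1))  # [(0, 88)]           -> 1 map task
```

With the hint at 2, the goal size is half the file, so even a tiny file yields two splits. With the hint at 1, the goal size is the whole file and only one split comes out.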
This means your 5 lines are probably divided 2:3 (or in some other ratio) and processed by 2 different tasks. As Viji also indicated, to turn off this behavior you can set mapred.map.tasks to 1 in your configs, and then you'll see only one map task process all 5 lines.

On Thu, Sep 26, 2013 at 4:59 PM, Sai Sai <saigr...@yahoo.in> wrote:
> Thanks Viji.
> I am confused a little when the data is small y would there b 2 tasks.
> U will use the min as 2 if u need it but in this case it is not needed due
> to size of the data being small
> so y would 2 map tasks exec.
> Since it results in 1 block with 5 lines of data in it
> i am assuming this results in 5 map computations 1 per each line
> and all of em in 1 process/node since i m using a pseudo vm.
> Where is the second task coming from.
> The 5 computations of map on each line is 1 task.
> Is this right.
> Please help.
> Thanks
>
>
> ________________________________
> From: Viji R <v...@cloudera.com>
> To: user@hadoop.apache.org; Sai Sai <saigr...@yahoo.in>
> Sent: Thursday, 26 September 2013 5:09 PM
> Subject: Re: 2 Map tasks running for a small input file
>
> Hi,
>
> Default number of map tasks is 2. You can set mapred.map.tasks to 1 to
> avoid this.
>
> Regards,
> Viji
>
> On Thu, Sep 26, 2013 at 4:28 PM, Sai Sai <saigr...@yahoo.in> wrote:
>> Hi
>> Here is the input file for the wordcount job:
>> ******************
>> Hi This is a simple test.
>> Hi Hadoop how r u.
>> Hello Hello.
>> Hi Hi.
>> Hadoop Hadoop Welcome.
>> ******************
>>
>> After running the wordcount successfully
>> here r the counters info:
>>
>> ***************
>> Job Counters SLOTS_MILLIS_MAPS 0 0 8,386
>> Launched reduce tasks 0 0 1
>> Total time spent by all reduces waiting after reserving slots (ms) 0 0 0
>> Total time spent by all maps waiting after reserving slots (ms) 0 0 0
>> Launched map tasks 0 0 2
>> Data-local map tasks 0 0 2
>> SLOTS_MILLIS_REDUCES 0 0 9,199
>> ***************
>> My question why r there 2 launched map tasks when i have only a small
>> file.
>> Per my understanding it is only 1 block.
>> and should be only 1 split.
>> Then for each line a map computation should occur
>> but it shows 2 map tasks.
>> Please let me know.
>> Thanks
>> Sai
>>
>
>

--
Harsh J