Re: 2 Map tasks running for a small input file

Harsh J Thu, 26 Sep 2013 04:56:36 -0700

Hi Sai,

What Viji indicated is that the default Apache Hadoop setting for any
input is 2 maps. If the input is larger than one block, regular
policies of splitting such as those stated by Shekhar would apply. But
for smaller inputs, just for an out-of-box "parallelism experience",
Hadoop ships with a 2-maps forced splitting default
(mapred.map.tasks=2).


This means your 5 lines are probably divided 2:3, or in some other ratio,
and processed by 2 different tasks. As Viji also indicated, to turn off
this behavior you can set mapred.map.tasks to 1 in your configs,
and then you'll see a single map task process all 5 lines.
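
To make the arithmetic concrete, here is a rough sketch (not the actual
Hadoop source) of how the old-API FileInputFormat, as I understand it, turns
the mapred.map.tasks hint into splits for a single file. The class and method
names, the 90-byte file size and the 64 MB block size are assumptions made up
for illustration; only the general shape of the calculation is the point.

```java
// Simplified sketch of old-API split sizing for ONE file.
// Hypothetical names; assumes fileSize > 0 and minSize >= 1.
public class SplitSketch {
    // The last split is allowed to run up to 10% over the split size.
    static final double SPLIT_SLOP = 1.1;

    // numMapsHint stands in for mapred.map.tasks,
    // minSize for mapred.min.split.size.
    static long splitSize(long fileSize, int numMapsHint, long blockSize, long minSize) {
        long goalSize = fileSize / (numMapsHint == 0 ? 1 : numMapsHint);
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Counts how many splits (and therefore map tasks) the file would get.
    static int countSplits(long fileSize, int numMapsHint, long blockSize, long minSize) {
        long size = splitSize(fileSize, numMapsHint, blockSize, minSize);
        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / size > SPLIT_SLOP) {
            splits++;
            remaining -= size;
        }
        if (remaining != 0) {
            splits++; // whatever is left becomes the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long block = 64L * 1024 * 1024; // assumed 64 MB block size
        long tinyFile = 90;             // assumed size of a 5-line file, in bytes
        System.out.println(countSplits(tinyFile, 2, block, 1)); // hint of 2 maps
        System.out.println(countSplits(tinyFile, 1, block, 1)); // hint of 1 map
    }
}
```

With a hint of 2 the goal size is half the file, so the tiny file is cut in
two; with a hint of 1 the goal size equals the file size and it stays whole.
Once a file is larger than a block, the block size caps the split size and
the hint stops mattering much.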

On Thu, Sep 26, 2013 at 4:59 PM, Sai Sai <saigr...@yahoo.in> wrote:
> Thanks Viji.
> I am a little confused: when the data is small, why would there be 2 tasks?
> You would use a minimum of 2 if you need it, but in this case it is not
> needed because the data is so small,
> so why would 2 map tasks execute?
> Since the input results in 1 block with 5 lines of data in it,
> I am assuming this results in 5 map computations, 1 per line,
> all of them in 1 process/node since I am using a pseudo VM.
> Where is the second task coming from?
> The 5 map computations, one per line, are 1 task.
> Is this right?
> Please help.
> Thanks
>
>
> ________________________________
> From: Viji R <v...@cloudera.com>
> To: user@hadoop.apache.org; Sai Sai <saigr...@yahoo.in>
> Sent: Thursday, 26 September 2013 5:09 PM
> Subject: Re: 2 Map tasks running for a small input file
>
> Hi,
>
> Default number of map tasks is 2. You can set mapred.map.tasks to 1 to
> avoid this.
>
> Regards,
> Viji
>
> On Thu, Sep 26, 2013 at 4:28 PM, Sai Sai <saigr...@yahoo.in> wrote:
>> Hi
>> Here is the input file for the wordcount job:
>> ******************
>> Hi This is a simple test.
>> Hi Hadoop how r u.
>> Hello Hello.
>> Hi Hi.
>> Hadoop Hadoop Welcome.
>> ******************
>>
>> After running the wordcount job successfully,
>> here is the counter info:
>>
>> ***************
>> Job Counters
>>   SLOTS_MILLIS_MAPS                                                    0  0  8,386
>>   Launched reduce tasks                                                0  0  1
>>   Total time spent by all reduces waiting after reserving slots (ms)  0  0  0
>>   Total time spent by all maps waiting after reserving slots (ms)     0  0  0
>>   Launched map tasks                                                   0  0  2
>>   Data-local map tasks                                                 0  0  2
>>   SLOTS_MILLIS_REDUCES                                                 0  0  9,199
>> ***************
>> My question: why are there 2 launched map tasks when I have only a small
>> file?
>> Per my understanding it is only 1 block
>> and should be only 1 split.
>> Then a map computation should occur for each line,
>> but it shows 2 map tasks.
>> Please let me know.
>> Thanks
>> Sai
>>
>
>



-- 
Harsh J
