Re: Running terasort with 1 map task

Bertrand Dechoux Tue, 26 Feb 2013 03:26:10 -0800

http://wiki.apache.org/hadoop/HowManyMapsAndReduces


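It is possible to have a single mapper if the input is not splittable, but
that is rarely seen as a feature. Note that mapred.map.tasks is only a hint
to the InputFormat; the number of maps actually launched follows the number
of input splits.
One could ask why you want to use a platform for distributed computing for
a job that shouldn't be distributed.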
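
For what it's worth, the 24 maps in the quoted history are consistent with
simple split arithmetic. teragen writes 100-byte rows, so 32000000 rows is
about 3.2 GB rather than 32 MB, and at one map per HDFS block that comes to
24 splits if the block size is 128 MB. The block size is an assumption (the
thread does not state it); the sketch below only reproduces the arithmetic
and does not query the cluster.

```shell
# Estimate the number of map tasks for the teragen output in this thread.
rows=32000000                       # row count passed to teragen
row_bytes=100                       # teragen writes 100-byte rows
block_bytes=$((128 * 1024 * 1024))  # ASSUMPTION: 128 MiB HDFS block size

total_bytes=$((rows * row_bytes))

# Ceiling division: roughly one split (and so one map) per block of input.
splits=$(( (total_bytes + block_bytes - 1) / block_bytes ))

echo "input bytes: $total_bytes"
echo "estimated maps: $splits"
```

If that is what is happening, raising mapred.min.split.size to at least the
input size (for example -Dmapred.min.split.size=3200000000 on the terasort
command line) is the usual Hadoop 1.x way to ask for larger splits. I have
not checked it against terasort's own input format, so treat it as something
to try rather than a confirmed fix.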

Regards

Bertrand


On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <
arindamchoudhu...@gmail.com> wrote:

> Hi all,
>
> I am trying to run terasort using one map and one reduce. So I generated
> the input data using:
>
> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>
> Then I launched the hadoop terasort job using:
>
> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1
> -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>
> I thought it would run the job using 1 map and 1 reduce, but when I
> inspected the job statistics I found:
>
> hadoop job -history /user/hadoop/output1
>
> Task Summary
> ============================
> Kind     Total  Successful  Failed  Killed  StartTime             FinishTime
>
> Setup    1      1           0       0       26-Feb-2013 10:57:47  26-Feb-2013 10:57:55 (8sec)
> Map      24     24          0       0       26-Feb-2013 10:57:57  26-Feb-2013 11:05:37 (7mins, 40sec)
> Reduce   1      1           0       0       26-Feb-2013 10:58:21  26-Feb-2013 11:08:31 (10mins, 10sec)
> Cleanup  1      1           0       0       26-Feb-2013 11:08:32  26-Feb-2013 11:08:36 (4sec)
> ============================
>
> So, although I asked for one map task, there are 24 of them.
>
> How can I solve this problem? How do I tell Hadoop to launch only one map?
>
> Thanks,
>
