+1 for Wangda's comment. My 2 cents: there are two aspects to the problem: 1. How many map tasks are in a job. 2. How many map tasks can run concurrently.
For #1, see Wangda's comments. For #2, it depends on the cluster's resources. In your case, the cluster can run at most 24 map tasks concurrently.

On Wed, Apr 2, 2014 at 10:45 AM, Wangda Tan <wheele...@gmail.com> wrote:

> More specifically, the number of map tasks in a job depends on
> InputFormat.getSplits(...): the job gets one map task per split returned by
> InputFormat.getSplits(...). You can read the source code of FileInputFormat
> to understand this in more detail.
>
> Regards,
> Wangda Tan
>
> On Wed, Apr 2, 2014 at 10:23 AM, Stanley Shi <s...@gopivotal.com> wrote:
>
>> The number of map tasks is not decided by the resources you request;
>> it is decided by something else.
>>
>> Regards,
>> *Stanley Shi,*
>>
>> On Wed, Apr 2, 2014 at 9:08 AM, Libo Yu <yu_l...@hotmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I am using mostly the default YARN settings to run a word-count example
>>> on a 3-node cluster. Here are my settings:
>>>
>>> yarn.nodemanager.resource.memory-mb 8192
>>> yarn.scheduler.minimum-allocation-mb 1024
>>> yarn.scheduler.maximum-allocation-vcores 32
>>>
>>> I would expect to see 8192/1024 * 3 = 24 map tasks.
>>> However, I see 32 map tasks. Does anybody know why? Thanks.
>>>
>>> Libo

-- Cheers -MJ
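To make Wangda's point about #1 concrete, here is a minimal sketch of the split-size rule that Hadoop's FileInputFormat applies (the real getSplits also handles multiple files, non-splittable compression, and a slack factor for the last split, so this is only an approximation). The 4 GB input size below is a hypothetical value chosen for illustration; with a 128 MB block size it happens to yield exactly 32 splits, which is one possible explanation for seeing 32 map tasks.

```java
public class SplitCountEstimate {
    // Same shape as FileInputFormat's split-size computation:
    // the block size, clamped between the configured min and max split sizes.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long fileSize  = 4L * 1024 * 1024 * 1024;  // hypothetical 4 GB input file
        long blockSize = 128L * 1024 * 1024;       // 128 MB HDFS block size

        // Default-ish min/max split sizes: effectively "use the block size".
        long splitSize = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);

        // One map task per split (ceiling division over the file size).
        long splits = (fileSize + splitSize - 1) / splitSize;
        System.out.println(splits);  // 32 -> 32 map tasks for this input
    }
}
```

The key takeaway: the map count is a property of the input data and the split configuration, not of the cluster's memory settings.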