Hi Tomek,

You have 9.26 GB across 4 nodes, which is 2.315 GB on average. What is your value of yarn.nodemanager.resource.memory-mb?

You consume 1 GB of RAM per container (8 containers running = 8 GB of memory used). My idea is that, after running 8 containers (1 AM + 7 map tasks), you have only about 315 MB of available memory left on each NodeManager. Therefore, when you request 1 GB to get a container for the 8th map task, there is no NodeManager that can give you a whole 1 GB (despite there being more than 1 GB of aggregated memory on the cluster). To verify this, please check the value of yarn.nodemanager.resource.memory-mb.

Thanks,
Adam

PS1. Just out of curiosity, what are your values of:
*yarn.nodemanager.resource.cpu-vcores* (isn't it 2?)
*yarn.resourcemanager.scheduler.class* (I assume the Fair Scheduler, but just to confirm. Could you have any non-default settings in your scheduler's configuration that limit the amount of resources per user?)
*yarn.nodemanager.linux-container-executor.resources-handler.class*

PS2. "I am comparing the M/R implementation with a custom one, where one node is dedicated to coordination and I utilize 4 slaves fully for computation." Note that this might not work on a larger scale, because the one node dedicated to coordination might become the bottleneck. This is one of a couple of reasons why YARN and the original MapReduce at Google decided to run coordination processes on slave nodes.

2014-07-09 9:47 GMT+02:00 Tomasz Guziałek <tom...@guzialek.info>:

> Thank you for your assistance, Adam.
>
> Containers running | Memory used | Memory total | Memory reserved
> 8 | 8 GB | 9.26 GB | 0 B
>
> It seems you are right: the ApplicationMaster is occupying one slot, as I have 8 containers running but only 7 map tasks.
>
> Again, I revised my information about the m1.large instance on EC2. There are only 2 cores available per node, giving 4 computing units (ECU units introduced by Amazon). So 8 slots at a time is expected. However, scheduling the AM on a slave node ruins my experiment.
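The arithmetic behind the table above (9.26 GB across 4 nodes, 1 GB per container) can be sketched as follows; the class below is purely illustrative and not a YARN API:

```java
// Sketch: why the 8th map task starves, assuming each of the 4 NodeManagers
// advertises ~2.315 GB (9.26 GB / 4) and every container (AM included)
// requests 1 GB. The numbers come from the thread; the class is illustrative.
public class ContainerMath {
    public static void main(String[] args) {
        int nodes = 4;
        double totalGb = 9.26;                 // "Memory total" in the RM Web UI
        double perNodeGb = totalGb / nodes;    // ~2.315 GB per NodeManager
        int containerGb = 1;                   // request size per container

        int perNodeSlots = (int) (perNodeGb / containerGb); // 2 per node
        int clusterSlots = perNodeSlots * nodes;            // 8 in total

        System.out.printf("Per-node capacity: %.3f GB -> %d containers%n",
                perNodeGb, perNodeSlots);
        System.out.println("Cluster-wide containers: " + clusterSlots);
        // 8 slots = 1 ApplicationMaster + 7 map tasks; the 8th map task waits
        // because the ~0.315 GB left on each node cannot hold a whole 1 GB.
    }
}
```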
> I am comparing the M/R implementation with a custom one, where one node is dedicated to coordination and I utilize 4 slaves fully for computation. This one core for the AM is extending the execution time by a factor of 2. Does anyone have an idea how to get 8 map tasks running?
>
> Pozdrawiam / Regards / Med venlig hilsen
> Tomasz Guziałek
>
>
> 2014-07-09 0:56 GMT+02:00 Adam Kawa <kawa.a...@gmail.com>:
>
>> If you run an application (e.g. a MapReduce job) on a YARN cluster, first the Application Master is started on some slave node to coordinate the execution of all tasks within the job. The ApplicationMaster and the tasks that belong to its application run in containers controlled by the NodeManagers.
>>
>> Maybe you simply run 8 containers on your YARN cluster, and 1 container is consumed by the MapReduce AppMaster while 7 containers are consumed by map tasks. But that seems not to be the root cause of your problem, because according to your settings you should be able to run 16 containers at most.
>>
>> Another idea might be that you are bottlenecked by the amount of memory on the cluster (each container consumes memory), and despite having vcore(s) available, you cannot launch new tasks. When you go to the ResourceManager Web UI, do you see that you utilize the whole cluster memory?
>>
>>
>> 2014-07-08 21:06 GMT+02:00 Tomasz Guziałek <tom...@guzialek.info>:
>>
>>> I was not precise when describing my cluster. I have 4 slave nodes and a separate master node. The master has the ResourceManager role (along with the JobHistory role) and the rest have NodeManager roles. If this really is an ApplicationMaster, is it possible to schedule it on the master node? This single waiting map task is doubling my execution time.
>>>
>>> Pozdrawiam / Regards / Med venlig hilsen
>>> Tomasz Guziałek
>>>
>>>
>>> 2014-07-08 18:42 GMT+02:00 Adam Kawa <kawa.a...@gmail.com>:
>>>
>>>> Is your MapReduce AppMaster not occupying one slot?
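For reference, the properties Adam asks about live in yarn-site.xml on each node. The sketch below uses illustrative values inferred from the thread (roughly 2.37 GB, i.e. 9.26 GB / 4 nodes, and 2 vcores per NodeManager); none of these values is confirmed by the posters:

```xml
<!-- yarn-site.xml (per NodeManager). Illustrative values only: the thread
     suggests ~2.37 GB and 2 vcores per node, but neither is confirmed. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2370</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>2</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```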
>>>>
>>>> Sent from my iPhone
>>>>
>>>> > On 8 Jul 2014, at 13:01, Tomasz Guziałek <tomaszguzia...@gmail.com> wrote:
>>>> >
>>>> > Hello all,
>>>> >
>>>> > I am running a 4-node CDH5 cluster on Amazon EC2. The instances used are m1.large, so I have 4 cores (2 cores x 2 units) per node. My HBase table has 8 regions, so I expected at least 8 (if not 16) mapper tasks to run simultaneously. However, only 7 are running and 1 is waiting for an empty slot. Why did this surprising number come up? I have checked that the regions are equally distributed across the region servers (2 per node).
>>>> >
>>>> > My properties in the job:
>>>> > Configuration mapReduceConfiguration = HBaseConfiguration.create();
>>>> > mapReduceConfiguration.set("hbase.client.max.perregion.tasks", "4");
>>>> > mapReduceConfiguration.set("mapreduce.tasktracker.map.tasks.maximum", "16");
>>>> >
>>>> > My properties in the CDH:
>>>> > yarn.scheduler.minimum-allocation-vcores = 1
>>>> > yarn.scheduler.maximum-allocation-vcores = 4
>>>> >
>>>> > Am I missing some property? Please share your experience.
>>>> >
>>>> > Best regards
>>>> > Tomasz
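One gotcha in the original question: mapreduce.tasktracker.map.tasks.maximum is an MRv1 (TaskTracker) property and has no effect on a YARN cluster, where the number of concurrent map tasks follows from container resources instead. A hedged sketch of the YARN-era knobs; the property names are real, but the values are illustrative assumptions, not settings confirmed in the thread:

```xml
<!-- mapred-site.xml (or set per job). Under YARN, concurrent map tasks are
     roughly (cluster memory / map container size), minus one container for
     the ApplicationMaster. Values below are illustrative. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value> <!-- memory per map-task container -->
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value> <!-- memory for the ApplicationMaster container -->
</property>
```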