Hi Tomek,

You have 9.26 GB across 4 nodes, which is 2.315 GB on average. What is your value of yarn.nodemanager.resource.memory-mb?

You consume 1 GB of RAM per container (8 containers running = 8 GB of memory used). My idea is that, after running 8 containers (1 AM + 7 map tasks), you have only about 315 MB of available memory left on each NodeManager. Therefore, when you request 1 GB to get a container for the 8th map task, there is no NodeManager that can give you a whole 1 GB (despite there being more than 1 GB of aggregated memory on the cluster). To verify this, please check the value of yarn.nodemanager.resource.memory-mb.

Thanks,
Adam

PS1. Just out of curiosity, what are your values of:
*yarn.nodemanager.resource.cpu-vcores* (isn't it 2?)
*yarn.resourcemanager.scheduler.class* (I assume the Fair Scheduler, but just to confirm. Could you have any non-default settings in your scheduler's configuration that limit the amount of resources per user?)
*yarn.nodemanager.linux-container-executor.resources-handler.class*

PS2. "I am comparing the M/R implementation with a custom one, where one node is dedicated to coordination and I utilize 4 slaves fully for computation." Note that this might not work on a larger scale, because the one node dedicated to coordination might become the bottleneck. This is one of a couple of reasons why YARN and the original MapReduce at Google decided to run coordination processes on slave nodes.

2014-07-09 9:47 GMT+02:00 Tomasz Guziałek <tom...@guzialek.info>:

> Thank you for your assistance, Adam.
>
> Containers running | Memory used | Memory total | Memory reserved
> 8 | 8 GB | 9.26 GB | 0 B
>
> It seems you are right: the ApplicationMaster is occupying one slot, as I have 8 containers running but only 7 map tasks.
>
> Again, I revised my information about the m1.large instance on EC2. There are only 2 cores available per node, giving 4 computing units (ECU units introduced by Amazon). So 8 slots at a time is expected. However, scheduling the AM on a slave node ruins my experiment.
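The arithmetic behind the table above (9.26 GB across 4 nodes, 1 GB per container) can be sketched as follows; the class below is purely illustrative and not a YARN API:

```java
// Sketch: why the 8th map task starves, assuming each of the 4 NodeManagers
// advertises ~2.315 GB (9.26 GB / 4) and every container (AM included)
// requests 1 GB. The numbers come from the thread; the class is illustrative.
public class ContainerMath {
    public static void main(String[] args) {
        int nodes = 4;
        double totalGb = 9.26;                 // "Memory total" in the RM Web UI
        double perNodeGb = totalGb / nodes;    // ~2.315 GB per NodeManager
        int containerGb = 1;                   // request size per container

        int perNodeSlots = (int) (perNodeGb / containerGb); // 2 per node
        int clusterSlots = perNodeSlots * nodes;            // 8 in total

        System.out.printf("Per-node capacity: %.3f GB -> %d containers%n",
                perNodeGb, perNodeSlots);
        System.out.println("Cluster-wide containers: " + clusterSlots);
        // 8 slots = 1 ApplicationMaster + 7 map tasks; the 8th map task waits
        // because the ~0.315 GB left on each node cannot hold a whole 1 GB.
    }
}
```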
> I am comparing the M/R implementation with a custom one, where one node is dedicated to coordination and I utilize 4 slaves fully for computation. This one core for the AM is extending the execution time by a factor of 2. Does anyone have an idea how to get 8 map tasks running?
>
> Pozdrawiam / Regards / Med venlig hilsen
> Tomasz Guziałek
>
>
> 2014-07-09 0:56 GMT+02:00 Adam Kawa <kawa.a...@gmail.com>:
>
>> If you run an application (e.g. a MapReduce job) on a YARN cluster, first the Application Master is started on some slave node to coordinate the execution of all tasks within the job. The ApplicationMaster and the tasks that belong to its application run in containers controlled by the NodeManagers.
>>
>> Maybe you simply run 8 containers on your YARN cluster, and 1 container is consumed by the MapReduce AppMaster while 7 containers are consumed by map tasks. But that seems not to be the root cause of your problem, because according to your settings you should be able to run 16 containers at most.
>>
>> Another idea might be that you are bottlenecked by the amount of memory on the cluster (each container consumes memory), and despite having vcore(s) available, you cannot launch new tasks. When you go to the ResourceManager Web UI, do you see that you utilize the whole cluster memory?
>>
>>
>> 2014-07-08 21:06 GMT+02:00 Tomasz Guziałek <tom...@guzialek.info>:
>>
>>> I was not precise when describing my cluster. I have 4 slave nodes and a separate master node. The master has the ResourceManager role (along with the JobHistory role) and the rest have NodeManager roles. If this really is an ApplicationMaster, is it possible to schedule it on the master node? This single waiting map task is doubling my execution time.
>>>
>>> Pozdrawiam / Regards / Med venlig hilsen
>>> Tomasz Guziałek
>>>
>>>
>>> 2014-07-08 18:42 GMT+02:00 Adam Kawa <kawa.a...@gmail.com>:
>>>
>>>> Is your MapReduce AppMaster not occupying one slot?
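For reference, the properties Adam asks about live in yarn-site.xml on each node. The sketch below uses illustrative values inferred from the thread (roughly 2.37 GB, i.e. 9.26 GB / 4 nodes, and 2 vcores per NodeManager); none of these values is confirmed by the posters:

```xml
<!-- yarn-site.xml (per NodeManager). Illustrative values only: the thread
     suggests ~2.37 GB and 2 vcores per node, but neither is confirmed. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2370</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>2</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```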
>>>>
>>>> Sent from my iPhone
>>>>
>>>> > On 8 Jul 2014, at 13:01, Tomasz Guziałek <tomaszguzia...@gmail.com> wrote:
>>>> >
>>>> > Hello all,
>>>> >
>>>> > I am running a 4-node CDH5 cluster on Amazon EC2. The instances used are m1.large, so I have 4 cores (2 cores x 2 units) per node. My HBase table has 8 regions, so I expected at least 8 (if not 16) mapper tasks to run simultaneously. However, only 7 are running and 1 is waiting for an empty slot. Why did this surprising number come up? I have checked that the regions are equally distributed across the region servers (2 per node).
>>>> >
>>>> > My properties in the job:
>>>> > Configuration mapReduceConfiguration = HBaseConfiguration.create();
>>>> > mapReduceConfiguration.set("hbase.client.max.perregion.tasks", "4");
>>>> > mapReduceConfiguration.set("mapreduce.tasktracker.map.tasks.maximum", "16");
>>>> >
>>>> > My properties in the CDH:
>>>> > yarn.scheduler.minimum-allocation-vcores = 1
>>>> > yarn.scheduler.maximum-allocation-vcores = 4
>>>> >
>>>> > Am I missing some property? Please share your experience.
>>>> >
>>>> > Best regards
>>>> > Tomasz
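One gotcha in the original question: mapreduce.tasktracker.map.tasks.maximum is an MRv1 (TaskTracker) property and has no effect on a YARN cluster, where the number of concurrent map tasks follows from container resources instead. A hedged sketch of the YARN-era knobs; the property names are real, but the values are illustrative assumptions, not settings confirmed in the thread:

```xml
<!-- mapred-site.xml (or set per job). Under YARN, concurrent map tasks are
     roughly (cluster memory / map container size), minus one container for
     the ApplicationMaster. Values below are illustrative. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value> <!-- memory per map-task container -->
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value> <!-- memory for the ApplicationMaster container -->
</property>
```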