I'm not aware of any consequences of setting it to a higher value. You probably only need more than 1GB for very large jobs with thousands of tasks.
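For reference, a minimal sketch of the two knobs discussed here, assuming the Capacity Scheduler is in use: the AM share cap lives in capacity-scheduler.xml and the MapReduce ApplicationMaster size in mapred-site.xml. The 0.5 and -Xmx768m values below are illustrative assumptions rather than recommendations; yarn.app.mapreduce.am.command-opts is shown only so the AM heap stays inside a 1024MB container.

<!-- capacity-scheduler.xml: cap on the share of queue/cluster memory that ApplicationMasters may occupy (default 0.1) -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>

<!-- mapred-site.xml: size of the MapReduce ApplicationMaster container; the 1.5GB default corresponds to 1536MB -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value>
</property>

<!-- mapred-site.xml: AM JVM heap; -Xmx768m is an assumed value chosen to fit within the 1024MB container -->
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx768m</value>
</property>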
On Fri, Jun 21, 2013 at 6:07 PM, Siddhi Mehta <smehtau...@gmail.com> wrote:
> That solved the problem. Thanks, Sandy!!
>
> What is the optimal setting for
> yarn.scheduler.capacity.maximum-am-resource-percent in terms of the
> NodeManager? What are the consequences of setting it to a higher value?
> Also, I noticed that by default the application master needs 1.5GB. Are
> there any side effects we will face if I lower that to 1GB?
>
> Siddhi
>
>
> On Fri, Jun 21, 2013 at 4:28 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>
>> Hi Siddhi,
>>
>> Moving this question to the CDH list.
>>
>> Does setting yarn.scheduler.capacity.maximum-am-resource-percent to .5
>> help?
>>
>> Have you tried using the Fair Scheduler?
>>
>> -Sandy
>>
>>
>> On Fri, Jun 21, 2013 at 4:21 PM, Siddhi Mehta <smehtau...@gmail.com> wrote:
>>
>>> Hey All,
>>>
>>> I am running a Hadoop 2.0 (CDH 4.2.1) cluster on a single node with one
>>> NodeManager.
>>>
>>> We have a map-only job that launches a Pig job on the cluster (similar
>>> to what Oozie does).
>>>
>>> We are seeing that the map-only job launches the Pig script, but the
>>> Pig job is stuck in the ACCEPTED state with no tracking UI assigned.
>>>
>>> I don't see any errors in the NodeManager logs or the ResourceManager
>>> logs as such.
>>>
>>> On the NodeManager I see these logs:
>>>
>>> 2013-06-21 15:05:13,084 INFO capacity.ParentQueue - assignedContainer queue=root usedCapacity=0.4 absoluteUsedCapacity=0.4 used=memory: 2048 cluster=memory: 5120
>>>
>>> 2013-06-21 15:05:38,898 INFO capacity.CapacityScheduler - Application Submission: appattempt_1371850881510_0003_000001, user: smehta queue: default: capacity=1.0, absoluteCapacity=1.0, usedResources=2048MB, usedCapacity=0.4, absoluteUsedCapacity=0.4, numApps=2, numContainers=2, currently active: 2
>>>
>>> This suggests that the cluster has capacity, but still no application
>>> master is assigned to the job.
>>> What am I missing? Any help is appreciated.
>>>
>>> I keep seeing these logs on the NodeManager:
>>>
>>> 2013-06-21 16:19:37,675 INFO monitor.ContainersMonitorImpl - Memory usage of ProcessTree 12484 for container-id container_1371850881510_0002_01_000002: 157.1mb of 1.0gb physical memory used; 590.1mb of 2.1gb virtual memory used
>>> 2013-06-21 16:19:37,696 INFO monitor.ContainersMonitorImpl - Memory usage of ProcessTree 12009 for container-id container_1371850881510_0002_01_000001: 181.0mb of 1.0gb physical memory used; 1.4gb of 2.1gb virtual memory used
>>> 2013-06-21 16:19:37,946 INFO nodemanager.NodeStatusUpdaterImpl - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
>>> 2013-06-21 16:19:37,946 INFO nodemanager.NodeStatusUpdaterImpl - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 2, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
>>> 2013-06-21 16:19:38,948 INFO nodemanager.NodeStatusUpdaterImpl - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
>>> 2013-06-21 16:19:38,948 INFO nodemanager.NodeStatusUpdaterImpl - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 2, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
>>> 2013-06-21 16:19:39,950 INFO nodemanager.NodeStatusUpdaterImpl - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
>>> 2013-06-21 16:19:39,950 INFO nodemanager.NodeStatusUpdaterImpl - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 2, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
>>>
>>> Here are my memory configurations:
>>>
>>> <property>
>>>   <name>yarn.nodemanager.resource.memory-mb</name>
>>>   <value>5120</value>
>>>   <source>yarn-site.xml</source>
>>> </property>
>>>
>>> <property>
>>>   <name>mapreduce.map.memory.mb</name>
>>>   <value>512</value>
>>>   <source>mapred-site.xml</source>
>>> </property>
>>>
>>> <property>
>>>   <name>mapreduce.reduce.memory.mb</name>
>>>   <value>512</value>
>>>   <source>mapred-site.xml</source>
>>> </property>
>>>
>>> <property>
>>>   <name>mapred.child.java.opts</name>
>>>   <value>
>>>     -Xmx512m -Djava.net.preferIPv4Stack=true -XX:+UseCompressedOops
>>>     -XX:+HeapDumpOnOutOfMemoryError
>>>     -XX:HeapDumpPath=/home/sfdc/logs/hadoop/userlogs/@taskid@/
>>>   </value>
>>>   <source>mapred-site.xml</source>
>>> </property>
>>>
>>> <property>
>>>   <name>yarn.app.mapreduce.am.resource.mb</name>
>>>   <value>1024</value>
>>>   <source>mapred-site.xml</source>
>>> </property>
>>>
>>> Regards,
>>> Siddhi
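The behavior described above appears consistent with the AM share cap: with roughly 5GB of NodeManager memory and the default maximum-am-resource-percent of 0.1, the Capacity Scheduler will run the first 1GB ApplicationMaster but hold additional ones in ACCEPTED, which is presumably why raising the cap to 0.5 resolved it. For completeness, since the Fair Scheduler was suggested earlier in the thread as an alternative, here is a minimal sketch of switching the ResourceManager over to it in yarn-site.xml; this only swaps the scheduler class and assumes no custom fair-scheduler.xml queue configuration.

<!-- yarn-site.xml: use the Fair Scheduler instead of the Capacity Scheduler -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>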