I'm not aware of any consequences of setting it to a higher value. You probably only need more than 1GB for very large jobs with thousands of tasks.
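For reference, a minimal sketch of the two knobs discussed here, assuming the Capacity Scheduler is in use: the AM share cap lives in capacity-scheduler.xml and the MapReduce ApplicationMaster size in mapred-site.xml. The 0.5 and -Xmx768m values below are illustrative assumptions rather than recommendations; yarn.app.mapreduce.am.command-opts is shown only so the AM heap stays inside a 1024MB container.

<!-- capacity-scheduler.xml: cap on the share of queue/cluster memory that ApplicationMasters may occupy (default 0.1) -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>

<!-- mapred-site.xml: size of the MapReduce ApplicationMaster container; the 1.5GB default corresponds to 1536MB -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value>
</property>

<!-- mapred-site.xml: AM JVM heap; -Xmx768m is an assumed value chosen to fit within the 1024MB container -->
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx768m</value>
</property>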
On Fri, Jun 21, 2013 at 6:07 PM, Siddhi Mehta <smehtau...@gmail.com> wrote:
> That solved the problem. Thanks, Sandy!!
>
> What is the optimal setting for
> yarn.scheduler.capacity.maximum-am-resource-percent in terms of the
> NodeManager? What are the consequences of setting it to a higher value?
> Also, I noticed that by default the application master needs 1.5GB. Are
> there any side effects we will face if I lower that to 1GB?
>
> Siddhi
>
>
> On Fri, Jun 21, 2013 at 4:28 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>
>> Hi Siddhi,
>>
>> Moving this question to the CDH list.
>>
>> Does setting yarn.scheduler.capacity.maximum-am-resource-percent to .5
>> help?
>>
>> Have you tried using the Fair Scheduler?
>>
>> -Sandy
>>
>>
>> On Fri, Jun 21, 2013 at 4:21 PM, Siddhi Mehta <smehtau...@gmail.com> wrote:
>>
>>> Hey All,
>>>
>>> I am running a Hadoop 2.0 (CDH 4.2.1) cluster on a single node with one
>>> NodeManager.
>>>
>>> We have a map-only job that launches a Pig job on the cluster (similar
>>> to what Oozie does).
>>>
>>> We are seeing that the map-only job launches the Pig script, but the
>>> Pig job is stuck in the ACCEPTED state with no tracking UI assigned.
>>>
>>> I don't see any errors in the NodeManager logs or the ResourceManager
>>> logs as such.
>>>
>>> On the NodeManager I see these logs:
>>>
>>> 2013-06-21 15:05:13,084 INFO capacity.ParentQueue - assignedContainer queue=root usedCapacity=0.4 absoluteUsedCapacity=0.4 used=memory: 2048 cluster=memory: 5120
>>>
>>> 2013-06-21 15:05:38,898 INFO capacity.CapacityScheduler - Application Submission: appattempt_1371850881510_0003_000001, user: smehta queue: default: capacity=1.0, absoluteCapacity=1.0, usedResources=2048MB, usedCapacity=0.4, absoluteUsedCapacity=0.4, numApps=2, numContainers=2, currently active: 2
>>>
>>> This suggests that the cluster has capacity, but still no application
>>> master is assigned to the job.
>>> What am I missing? Any help is appreciated.
>>>
>>> I keep seeing these logs on the NodeManager:
>>>
>>> 2013-06-21 16:19:37,675 INFO monitor.ContainersMonitorImpl - Memory usage of ProcessTree 12484 for container-id container_1371850881510_0002_01_000002: 157.1mb of 1.0gb physical memory used; 590.1mb of 2.1gb virtual memory used
>>> 2013-06-21 16:19:37,696 INFO monitor.ContainersMonitorImpl - Memory usage of ProcessTree 12009 for container-id container_1371850881510_0002_01_000001: 181.0mb of 1.0gb physical memory used; 1.4gb of 2.1gb virtual memory used
>>> 2013-06-21 16:19:37,946 INFO nodemanager.NodeStatusUpdaterImpl - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
>>> 2013-06-21 16:19:37,946 INFO nodemanager.NodeStatusUpdaterImpl - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 2, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
>>> 2013-06-21 16:19:38,948 INFO nodemanager.NodeStatusUpdaterImpl - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
>>> 2013-06-21 16:19:38,948 INFO nodemanager.NodeStatusUpdaterImpl - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 2, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
>>> 2013-06-21 16:19:39,950 INFO nodemanager.NodeStatusUpdaterImpl - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
>>> 2013-06-21 16:19:39,950 INFO nodemanager.NodeStatusUpdaterImpl - Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 2, cluster_timestamp: 1371850881510, }, attemptId: 1, }, id: 2, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
>>>
>>> Here are my memory configurations:
>>>
>>> <property>
>>>   <name>yarn.nodemanager.resource.memory-mb</name>
>>>   <value>5120</value>
>>>   <source>yarn-site.xml</source>
>>> </property>
>>>
>>> <property>
>>>   <name>mapreduce.map.memory.mb</name>
>>>   <value>512</value>
>>>   <source>mapred-site.xml</source>
>>> </property>
>>>
>>> <property>
>>>   <name>mapreduce.reduce.memory.mb</name>
>>>   <value>512</value>
>>>   <source>mapred-site.xml</source>
>>> </property>
>>>
>>> <property>
>>>   <name>mapred.child.java.opts</name>
>>>   <value>
>>>     -Xmx512m -Djava.net.preferIPv4Stack=true -XX:+UseCompressedOops
>>>     -XX:+HeapDumpOnOutOfMemoryError
>>>     -XX:HeapDumpPath=/home/sfdc/logs/hadoop/userlogs/@taskid@/
>>>   </value>
>>>   <source>mapred-site.xml</source>
>>> </property>
>>>
>>> <property>
>>>   <name>yarn.app.mapreduce.am.resource.mb</name>
>>>   <value>1024</value>
>>>   <source>mapred-site.xml</source>
>>> </property>
>>>
>>> Regards,
>>> Siddhi
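The behavior described above appears consistent with the AM share cap: with roughly 5GB of NodeManager memory and the default maximum-am-resource-percent of 0.1, the Capacity Scheduler will run the first 1GB ApplicationMaster but hold additional ones in ACCEPTED, which is presumably why raising the cap to 0.5 resolved it. For completeness, since the Fair Scheduler was suggested earlier in the thread as an alternative, here is a minimal sketch of switching the ResourceManager over to it in yarn-site.xml; this only swaps the scheduler class and assumes no custom fair-scheduler.xml queue configuration.

<!-- yarn-site.xml: use the Fair Scheduler instead of the Capacity Scheduler -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>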