[jira] [Updated] (YARN-2935) ResourceManager UI shows some stale values for "Decommissioned Nodes" field
     [ https://issues.apache.org/jira/browse/YARN-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty updated YARN-2935:
--------------------------------
    Affects Version/s: 2.4.1

> ResourceManager UI shows some stale values for "Decommissioned Nodes" field
> ----------------------------------------------------------------------------
>
>                 Key: YARN-2935
>                 URL: https://issues.apache.org/jira/browse/YARN-2935
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.4.1
>            Reporter: Nishan Shetty
>            Assignee: Rohith
>            Priority: Minor
>         Attachments: screenshot-1.png
>
> 1. Decommission NodeManager
> 2. Switch RM
> 3. Recommission NodeManager
>
> "Decommissioned Nodes" field in RM shows some value even though there is no decommissioned node.
[jira] [Updated] (YARN-2935) ResourceManager UI shows some stale values for "Decommissioned Nodes" field
     [ https://issues.apache.org/jira/browse/YARN-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty updated YARN-2935:
--------------------------------
    Attachment: screenshot-1.png

> ResourceManager UI shows some stale values for "Decommissioned Nodes" field
> ----------------------------------------------------------------------------
>
>                 Key: YARN-2935
>                 URL: https://issues.apache.org/jira/browse/YARN-2935
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Nishan Shetty
>            Assignee: Rohith
>            Priority: Minor
>         Attachments: screenshot-1.png
>
> 1. Decommission NodeManager
> 2. Switch RM
> 3. Recommission NodeManager
>
> "Decommissioned Nodes" field in RM shows some value even though there is no decommissioned node.
[jira] [Updated] (YARN-2935) ResourceManager UI shows some stale values for "Decommissioned Nodes" field
     [ https://issues.apache.org/jira/browse/YARN-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty updated YARN-2935:
--------------------------------
    Attachment: (was: screenshot-1.png)

> ResourceManager UI shows some stale values for "Decommissioned Nodes" field
> ----------------------------------------------------------------------------
>
>                 Key: YARN-2935
>                 URL: https://issues.apache.org/jira/browse/YARN-2935
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Nishan Shetty
>            Assignee: Rohith
>            Priority: Minor
>
> 1. Decommission NodeManager
> 2. Switch RM
> 3. Recommission NodeManager
>
> "Decommissioned Nodes" field in RM shows some value even though there is no decommissioned node.
[jira] [Updated] (YARN-2935) ResourceManager UI shows some stale values for "Decommissioned Nodes" field
     [ https://issues.apache.org/jira/browse/YARN-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty updated YARN-2935:
--------------------------------
    Attachment: screenshot-1.png

> ResourceManager UI shows some stale values for "Decommissioned Nodes" field
> ----------------------------------------------------------------------------
>
>                 Key: YARN-2935
>                 URL: https://issues.apache.org/jira/browse/YARN-2935
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Nishan Shetty
>            Priority: Minor
>         Attachments: screenshot-1.png
>
> 1. Decommission NodeManager
> 2. Switch RM
> 3. Recommission NodeManager
>
> "Decommissioned Nodes" field in RM shows some value even though there is no decommissioned node.
[jira] [Created] (YARN-2935) ResourceManager UI shows some stale values for "Decommissioned Nodes" field
Nishan Shetty created YARN-2935:
-----------------------------------

             Summary: ResourceManager UI shows some stale values for "Decommissioned Nodes" field
                 Key: YARN-2935
                 URL: https://issues.apache.org/jira/browse/YARN-2935
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Nishan Shetty
            Priority: Minor

1. Decommission NodeManager
2. Switch RM
3. Recommission NodeManager

"Decommissioned Nodes" field in RM shows some value even though there is no decommissioned node.
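A gauge like this typically goes stale because it is maintained incrementally and the standby RM never observes the decrement, so after a switch the new active RM keeps reporting the old count. A minimal Java sketch of that pattern and one possible remedy, with purely hypothetical names (this is not Hadoop's actual ClusterMetrics code): resynchronize the value from authoritative state after failover instead of trusting the counter.

{code}
// Illustrative sketch only -- not the actual Hadoop ClusterMetrics code.
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

class DecommissionGauge {
    private final AtomicInteger decommissionedNodes = new AtomicInteger();

    // Incremental updates are correct only if every transition is observed.
    void onDecommission()  { decommissionedNodes.incrementAndGet(); }
    void onRecommission()  { decommissionedNodes.decrementAndGet(); }

    // After a failover, derive the value from authoritative state
    // (the current exclude list) rather than the possibly stale counter.
    void resyncAfterFailover(Set<String> excludedHosts) {
        decommissionedNodes.set(excludedHosts.size());
    }

    int get() { return decommissionedNodes.get(); }
}
{code}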
[jira] [Updated] (YARN-2655) AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
     [ https://issues.apache.org/jira/browse/YARN-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty updated YARN-2655:
--------------------------------
    Description:
        AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
        Screenshot attached

    was:
        AllocatedGB/AvailableGB in nodemanager JMX showing only integer values

> AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
> -----------------------------------------------------------------------
>
>                 Key: YARN-2655
>                 URL: https://issues.apache.org/jira/browse/YARN-2655
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.4.1
>            Reporter: Nishan Shetty
>            Priority: Minor
>         Attachments: screenshot-1.png, screenshot-2.png
>
> AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
> Screenshot attached
[jira] [Updated] (YARN-2655) AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
     [ https://issues.apache.org/jira/browse/YARN-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty updated YARN-2655:
--------------------------------
    Attachment: screenshot-2.png

> AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
> -----------------------------------------------------------------------
>
>                 Key: YARN-2655
>                 URL: https://issues.apache.org/jira/browse/YARN-2655
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.4.1
>            Reporter: Nishan Shetty
>            Priority: Minor
>         Attachments: screenshot-1.png, screenshot-2.png
>
> AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
[jira] [Updated] (YARN-2655) AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
     [ https://issues.apache.org/jira/browse/YARN-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty updated YARN-2655:
--------------------------------
    Attachment: screenshot-1.png

> AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
> -----------------------------------------------------------------------
>
>                 Key: YARN-2655
>                 URL: https://issues.apache.org/jira/browse/YARN-2655
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.4.1
>            Reporter: Nishan Shetty
>            Priority: Minor
>         Attachments: screenshot-1.png
>
> AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
[jira] [Created] (YARN-2655) AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
Nishan Shetty created YARN-2655:
-----------------------------------

             Summary: AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
                 Key: YARN-2655
                 URL: https://issues.apache.org/jira/browse/YARN-2655
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
    Affects Versions: 2.4.1
            Reporter: Nishan Shetty
            Priority: Minor

AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
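The symptom is consistent with the GB metric being derived from megabytes with integer division, which truncates everything below a whole gigabyte. A small self-contained demo of that arithmetic (the variable names are illustrative, not the actual NodeManagerMetrics members):

{code}
// Minimal sketch of the reported symptom, assuming integer division.
public class GbMetricDemo {
    public static void main(String[] args) {
        int allocatedMB = 1536; // 1.5 GB actually allocated

        int allocatedGBInt = allocatedMB / 1024;        // truncates to 1
        float allocatedGBFloat = allocatedMB / 1024.0f; // keeps 1.5

        System.out.println("integer GB metric: " + allocatedGBInt);   // 1
        System.out.println("float GB metric:   " + allocatedGBFloat); // 1.5
    }
}
{code}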
[jira] [Commented] (YARN-2595) NullPointerException is thrown while RM shutdown
    [ https://issues.apache.org/jira/browse/YARN-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147320#comment-14147320 ]

Nishan Shetty commented on YARN-2595:
-------------------------------------
Hi [~devaraj.k]
Sorry, I could not attach the logs since the cluster has been deleted. I will try to reproduce the issue.

> NullPointerException is thrown while RM shutdown
> -------------------------------------------------
>
>                 Key: YARN-2595
>                 URL: https://issues.apache.org/jira/browse/YARN-2595
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.1
>            Reporter: Nishan Shetty
>            Priority: Minor
>
> 2014-08-03 09:45:55,110 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.writeAuditLog(RMAppManager.java:221)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:213)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:480)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:71)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>         at java.lang.Thread.run(Thread.java:662)
> 2014-08-03 09:45:55,111 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos OPERATION=refreshAdminAcls TARGET=AdminService RESULT=SUCCESS
[jira] [Created] (YARN-2595) NullPointerException is thrown while RM shutdown
Nishan Shetty created YARN-2595:
-----------------------------------

             Summary: NullPointerException is thrown while RM shutdown
                 Key: YARN-2595
                 URL: https://issues.apache.org/jira/browse/YARN-2595
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 2.4.1
            Reporter: Nishan Shetty
            Priority: Minor

2014-08-03 09:45:55,110 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.writeAuditLog(RMAppManager.java:221)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:213)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:480)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:71)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
        at java.lang.Thread.run(Thread.java:662)
2014-08-03 09:45:55,111 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos OPERATION=refreshAdminAcls TARGET=AdminService RESULT=SUCCESS
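The stack suggests a shutdown race: events still queued in the AsyncDispatcher are handled after a dependency of the handler has already been stopped and its reference cleared. A hypothetical minimal model of the pattern and a defensive fix (not the real RMAppManager code):

{code}
// Sketch of the shutdown race: tolerate a null dependency mid-shutdown
// instead of letting the dispatcher thread die on an NPE.
class AuditingHandler {
    interface AuditLogger { void logSuccess(String user, String op); }

    private volatile AuditLogger auditLogger; // nulled on serviceStop()

    void handleFinish(String user) {
        AuditLogger logger = auditLogger; // read once; may be null mid-shutdown
        if (logger == null) {
            return; // shutting down: drop the audit record rather than crash
        }
        logger.logSuccess(user, "Application Finished");
    }
}
{code}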
[jira] [Updated] (YARN-2523) ResourceManager UI showing negative value for "Decommissioned Nodes" field
     [ https://issues.apache.org/jira/browse/YARN-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty updated YARN-2523:
--------------------------------
    Affects Version/s: (was: 2.4.1)
                       3.0.0

> ResourceManager UI showing negative value for "Decommissioned Nodes" field
> ---------------------------------------------------------------------------
>
>                 Key: YARN-2523
>                 URL: https://issues.apache.org/jira/browse/YARN-2523
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, webapp
>    Affects Versions: 3.0.0
>            Reporter: Nishan Shetty
>            Assignee: Rohith
>
> 1. Decommission one NodeManager by configuring ip in excludehost file
> 2. Remove ip from excludehost file
> 3. Execute -refreshNodes command and restart Decommissioned NodeManager
>
> Observe that in RM UI negative value for "Decommissioned Nodes" field is shown
[jira] [Updated] (YARN-2523) ResourceManager UI showing negative value for "Decommissioned Nodes" field
     [ https://issues.apache.org/jira/browse/YARN-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty updated YARN-2523:
--------------------------------
    Priority: Major  (was: Minor)

> ResourceManager UI showing negative value for "Decommissioned Nodes" field
> ---------------------------------------------------------------------------
>
>                 Key: YARN-2523
>                 URL: https://issues.apache.org/jira/browse/YARN-2523
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, webapp
>    Affects Versions: 2.4.1
>            Reporter: Nishan Shetty
>            Assignee: Rohith
>
> 1. Decommission one NodeManager by configuring ip in excludehost file
> 2. Remove ip from excludehost file
> 3. Execute -refreshNodes command and restart Decommissioned NodeManager
>
> Observe that in RM UI negative value for "Decommissioned Nodes" field is shown
[jira] [Resolved] (YARN-2524) ResourceManager UI shows negative value for "Decommissioned Nodes" field
     [ https://issues.apache.org/jira/browse/YARN-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty resolved YARN-2524.
---------------------------------
    Resolution: Invalid

Two issues were created by mistake; resolving this one as invalid.

> ResourceManager UI shows negative value for "Decommissioned Nodes" field
> -------------------------------------------------------------------------
>
>                 Key: YARN-2524
>                 URL: https://issues.apache.org/jira/browse/YARN-2524
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Nishan Shetty
>
> 1. Decommission one NodeManager by configuring ip in excludehost file
> 2. Remove ip from excludehost file
> 3. Execute -refreshNodes command and restart Decommissioned NodeManager
>
> Observe that in RM UI negative value for "Decommissioned Nodes" field is shown
[jira] [Created] (YARN-2524) ResourceManager UI shows negative value for "Decommissioned Nodes" field
Nishan Shetty created YARN-2524:
-----------------------------------

             Summary: ResourceManager UI shows negative value for "Decommissioned Nodes" field
                 Key: YARN-2524
                 URL: https://issues.apache.org/jira/browse/YARN-2524
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
            Reporter: Nishan Shetty

1. Decommission one NodeManager by configuring ip in excludehost file
2. Remove ip from excludehost file
3. Execute -refreshNodes command and restart Decommissioned NodeManager

Observe that in RM UI negative value for "Decommissioned Nodes" field is shown
[jira] [Created] (YARN-2523) ResourceManager UI showing negative value for "Decommissioned Nodes" field
Nishan Shetty created YARN-2523:
-----------------------------------

             Summary: ResourceManager UI showing negative value for "Decommissioned Nodes" field
                 Key: YARN-2523
                 URL: https://issues.apache.org/jira/browse/YARN-2523
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager, webapp
    Affects Versions: 2.4.1
            Reporter: Nishan Shetty
            Priority: Minor

1. Decommission one NodeManager by configuring ip in excludehost file
2. Remove ip from excludehost file
3. Execute -refreshNodes command and restart Decommissioned NodeManager

Observe that in RM UI negative value for "Decommissioned Nodes" field is shown
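A plausible way such a gauge goes negative is a decrement that has no matching increment, for example when -refreshNodes and the subsequent NM restart both try to "recommission" the same node. A sketch that counts only real state changes, with hypothetical names (not the RM's actual node-tracking code):

{code}
// Illustrative sketch: pair every decrement with a recorded increment
// so repeated recommission events cannot drive the counter below zero.
import java.util.HashSet;
import java.util.Set;

class DecommissionedCounter {
    private final Set<String> decommissioned = new HashSet<>();
    private int count;

    void markDecommissioned(String host) {
        if (decommissioned.add(host)) count++;    // count only state changes
    }

    void markRecommissioned(String host) {
        if (decommissioned.remove(host)) count--; // no-op if never counted
    }

    int getDecommissionedNodes() { return count; } // can never go below 0
}
{code}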
[jira] [Updated] (YARN-2283) RM failed to release the AM container
     [ https://issues.apache.org/jira/browse/YARN-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty updated YARN-2283:
--------------------------------
    Affects Version/s: (was: 2.5.0)
                       2.4.0

> RM failed to release the AM container
> --------------------------------------
>
>                 Key: YARN-2283
>                 URL: https://issues.apache.org/jira/browse/YARN-2283
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>         Environment: NM1: AM running
>                      NM2: Map task running
>                      mapreduce.map.maxattempts=1
>            Reporter: Nishan Shetty
>            Priority: Critical
>
> During a container stability test I faced this problem: while the job was running, a map task got killed.
> Observe that even though the application is FAILED, the MRAppMaster process keeps running until timeout because the RM did not release the AM container.
> {code}
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1405318134611_0002_01_05 Container Transitioned from RUNNING to COMPLETED
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Completed container: container_1405318134611_0002_01_05 in state: COMPLETED event:FINISHED
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1405318134611_0002 CONTAINERID=container_1405318134611_0002_01_05
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore: Finish information of container container_1405318134611_0002_01_05 is written
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: Stored the finish data of container container_1405318134611_0002_01_05
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode: Released container container_1405318134611_0002_01_05 of capacity on host HOST-10-18-40-153:45026, which currently has 1 containers, used and available, release resources=true
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default used= numContainers=1 user=testos user-resources=
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_1405318134611_0002_01_05, NodeId: HOST-10-18-40-153:45026, NodeHttpAddress: HOST-10-18-40-153:45025, Resource: , Priority: 5, Token: Token { kind: ContainerToken, service: 10.18.40.153:45026 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=, usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1 cluster= vCores:8>
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 used= cluster=
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.default stats: default: capacity=1.0, absoluteCapacity=1.0, usedResources=, usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1405318134611_0002_01 released container container_1405318134611_0002_01_05 on node: host: HOST-10-18-40-153:45026 #containers=1 available=6144 used=2048 with event: FINISHED
> 2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1405318134611_0002_01 with final state: FINISHING
> 2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1405318134611_0002_01 State change from RUNNING to FINAL_SAVING
> 2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1405318134611_0002 with final state: FINISHING
> 2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: NodeDataChanged with state:SyncConnected for path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1405318134611_0002/appattempt_1405318134611_0002_01 for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourceman
[jira] [Updated] (YARN-2283) RM failed to release the AM container
     [ https://issues.apache.org/jira/browse/YARN-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty updated YARN-2283:
--------------------------------
    Affects Version/s: (was: 2.4.0)
                       2.5.0

> RM failed to release the AM container
> --------------------------------------
>
>                 Key: YARN-2283
>                 URL: https://issues.apache.org/jira/browse/YARN-2283
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.5.0
>         Environment: NM1: AM running
>                      NM2: Map task running
>                      mapreduce.map.maxattempts=1
>            Reporter: Nishan Shetty
>            Priority: Critical
>
> During a container stability test I faced this problem: while the job was running, a map task got killed.
> Observe that even though the application is FAILED, the MRAppMaster process keeps running until timeout because the RM did not release the AM container.
> {code}
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1405318134611_0002_01_05 Container Transitioned from RUNNING to COMPLETED
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Completed container: container_1405318134611_0002_01_05 in state: COMPLETED event:FINISHED
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1405318134611_0002 CONTAINERID=container_1405318134611_0002_01_05
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore: Finish information of container container_1405318134611_0002_01_05 is written
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: Stored the finish data of container container_1405318134611_0002_01_05
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode: Released container container_1405318134611_0002_01_05 of capacity on host HOST-10-18-40-153:45026, which currently has 1 containers, used and available, release resources=true
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default used= numContainers=1 user=testos user-resources=
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_1405318134611_0002_01_05, NodeId: HOST-10-18-40-153:45026, NodeHttpAddress: HOST-10-18-40-153:45025, Resource: , Priority: 5, Token: Token { kind: ContainerToken, service: 10.18.40.153:45026 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=, usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1 cluster= vCores:8>
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 used= cluster=
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.default stats: default: capacity=1.0, absoluteCapacity=1.0, usedResources=, usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1405318134611_0002_01 released container container_1405318134611_0002_01_05 on node: host: HOST-10-18-40-153:45026 #containers=1 available=6144 used=2048 with event: FINISHED
> 2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1405318134611_0002_01 with final state: FINISHING
> 2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1405318134611_0002_01 State change from RUNNING to FINAL_SAVING
> 2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1405318134611_0002 with final state: FINISHING
> 2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: NodeDataChanged with state:SyncConnected for path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1405318134611_0002/appattempt_1405318134611_0002_01 for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourceman
[jira] [Updated] (YARN-2441) NPE in nodemanager after restart
     [ https://issues.apache.org/jira/browse/YARN-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty updated YARN-2441:
--------------------------------
    Priority: Major  (was: Minor)

> NPE in nodemanager after restart
> --------------------------------
>
>                 Key: YARN-2441
>                 URL: https://issues.apache.org/jira/browse/YARN-2441
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: Nishan Shetty
>
> {code}
> 2014-08-22 16:43:19,640 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Blocking new container-requests as container manager rpc server is still starting.
> 2014-08-22 16:43:19,658 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> 2014-08-22 16:43:19,675 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 45026: starting
> 2014-08-22 16:43:20,029 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Updating node address : host-10-18-40-95:45026
> 2014-08-22 16:43:20,029 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager started at /10.18.40.95:45026
> 2014-08-22 16:43:20,030 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager bound to host-10-18-40-95/10.18.40.95:45026
> 2014-08-22 16:43:20,073 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
> 2014-08-22 16:43:20,098 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 45027
> 2014-08-22 16:43:20,158 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB to the server
> 2014-08-22 16:43:20,178 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> 2014-08-22 16:43:20,192 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 45027: starting
> 2014-08-22 16:43:20,210 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 45026: readAndProcess from client 10.18.40.84 threw exception [java.lang.NullPointerException]
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
>         at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:43)
>         at org.apache.hadoop.security.token.SecretManager.retriableRetrievePassword(SecretManager.java:91)
>         at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:278)
>         at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:305)
>         at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585)
>         at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>         at org.apache.hadoop.ipc.Server$Connection.processSaslToken(Server.java:1384)
>         at org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1361)
>         at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1275)
>         at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1238)
>         at org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1878)
>         at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1755)
>         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1519)
>         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:750)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:624)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:595)
> 2014-08-22 16:43:20,227 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 45026: readAndProcess from client 10.18.40.84 threw exception [java.lang.NullPointerException]
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
> {code}
[jira] [Commented] (YARN-2441) NPE in nodemanager after restart
     [ https://issues.apache.org/jira/browse/YARN-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106987#comment-14106987 ]

Nishan Shetty commented on YARN-2441:
-------------------------------------
[~jlowe] Sorry, I mentioned the wrong Affects Version; it is branch-2. Work-preserving NM restart is not enabled; this is just a plain restart.

> NPE in nodemanager after restart
> --------------------------------
>
>                 Key: YARN-2441
>                 URL: https://issues.apache.org/jira/browse/YARN-2441
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: Nishan Shetty
>            Priority: Minor
>
> {code}
> 2014-08-22 16:43:19,640 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Blocking new container-requests as container manager rpc server is still starting.
> 2014-08-22 16:43:19,658 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> 2014-08-22 16:43:19,675 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 45026: starting
> 2014-08-22 16:43:20,029 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Updating node address : host-10-18-40-95:45026
> 2014-08-22 16:43:20,029 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager started at /10.18.40.95:45026
> 2014-08-22 16:43:20,030 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager bound to host-10-18-40-95/10.18.40.95:45026
> 2014-08-22 16:43:20,073 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
> 2014-08-22 16:43:20,098 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 45027
> 2014-08-22 16:43:20,158 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB to the server
> 2014-08-22 16:43:20,178 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> 2014-08-22 16:43:20,192 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 45027: starting
> 2014-08-22 16:43:20,210 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 45026: readAndProcess from client 10.18.40.84 threw exception [java.lang.NullPointerException]
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
>         at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:43)
>         at org.apache.hadoop.security.token.SecretManager.retriableRetrievePassword(SecretManager.java:91)
>         at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:278)
>         at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:305)
>         at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585)
>         at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>         at org.apache.hadoop.ipc.Server$Connection.processSaslToken(Server.java:1384)
>         at org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1361)
>         at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1275)
>         at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1238)
>         at org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1878)
>         at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1755)
>         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1519)
>         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:750)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:624)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:595)
> 2014-08-22 16:43:20,227 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 45026: readAndProcess from client 10.18.40.84 threw exception [java.lang.NullPointerException]
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
> {code}
[jira] [Updated] (YARN-2442) ResourceManager JMX UI does not give HA State
     [ https://issues.apache.org/jira/browse/YARN-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty updated YARN-2442:
--------------------------------
    Affects Version/s: (was: 3.0.0)
                       2.5.0

> ResourceManager JMX UI does not give HA State
> ----------------------------------------------
>
>                 Key: YARN-2442
>                 URL: https://issues.apache.org/jira/browse/YARN-2442
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.5.0
>            Reporter: Nishan Shetty
>            Priority: Trivial
>
> ResourceManager JMX UI can show the haState (INITIALIZING, ACTIVE, STANDBY, STOPPED)
[jira] [Updated] (YARN-2441) NPE in nodemanager after restart
     [ https://issues.apache.org/jira/browse/YARN-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nishan Shetty updated YARN-2441:
--------------------------------
    Affects Version/s: (was: 3.0.0)
                       2.5.0

> NPE in nodemanager after restart
> --------------------------------
>
>                 Key: YARN-2441
>                 URL: https://issues.apache.org/jira/browse/YARN-2441
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: Nishan Shetty
>            Priority: Minor
>
> {code}
> 2014-08-22 16:43:19,640 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Blocking new container-requests as container manager rpc server is still starting.
> 2014-08-22 16:43:19,658 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> 2014-08-22 16:43:19,675 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 45026: starting
> 2014-08-22 16:43:20,029 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Updating node address : host-10-18-40-95:45026
> 2014-08-22 16:43:20,029 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager started at /10.18.40.95:45026
> 2014-08-22 16:43:20,030 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager bound to host-10-18-40-95/10.18.40.95:45026
> 2014-08-22 16:43:20,073 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
> 2014-08-22 16:43:20,098 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 45027
> 2014-08-22 16:43:20,158 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB to the server
> 2014-08-22 16:43:20,178 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> 2014-08-22 16:43:20,192 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 45027: starting
> 2014-08-22 16:43:20,210 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 45026: readAndProcess from client 10.18.40.84 threw exception [java.lang.NullPointerException]
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
>         at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:43)
>         at org.apache.hadoop.security.token.SecretManager.retriableRetrievePassword(SecretManager.java:91)
>         at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:278)
>         at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:305)
>         at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585)
>         at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>         at org.apache.hadoop.ipc.Server$Connection.processSaslToken(Server.java:1384)
>         at org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1361)
>         at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1275)
>         at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1238)
>         at org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1878)
>         at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1755)
>         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1519)
>         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:750)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:624)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:595)
> 2014-08-22 16:43:20,227 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 45026: readAndProcess from client 10.18.40.84 threw exception [java.lang.NullPointerException]
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
> {code}
[jira] [Created] (YARN-2442) ResourceManager JMX UI does not give HA State
Nishan Shetty created YARN-2442:
-----------------------------------

             Summary: ResourceManager JMX UI does not give HA State
                 Key: YARN-2442
                 URL: https://issues.apache.org/jira/browse/YARN-2442
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 3.0.0
            Reporter: Nishan Shetty
            Priority: Trivial

ResourceManager JMX UI can show the haState (INITIALIZING, ACTIVE, STANDBY, STOPPED)
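For reference, publishing such a state over JMX only needs a small MXBean. The sketch below uses the standard javax.management API; the bean name, object name, and hard-coded state are assumptions about how the field could be exposed, not the ResourceManager's actual JMX wiring.

{code}
// Standalone JMX sketch: expose an HA state string as an MXBean attribute.
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class HaStateJmxDemo {
    public interface HaStateMXBean {
        String getHAState(); // INITIALIZING, ACTIVE, STANDBY or STOPPED
    }

    static class HaState implements HaStateMXBean {
        @Override
        public String getHAState() { return "STANDBY"; } // would delegate to the RM context
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(new HaState(),
                new ObjectName("Hadoop:service=ResourceManager,name=HAState"));
        System.out.println("HAState MBean registered; inspect it with jconsole.");
        Thread.sleep(60_000); // keep the JVM alive briefly for inspection
    }
}
{code}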
[jira] [Created] (YARN-2441) NPE in nodemanager after restart
Nishan Shetty created YARN-2441:
-----------------------------------

             Summary: NPE in nodemanager after restart
                 Key: YARN-2441
                 URL: https://issues.apache.org/jira/browse/YARN-2441
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
    Affects Versions: 3.0.0
            Reporter: Nishan Shetty
            Priority: Minor

{code}
2014-08-22 16:43:19,640 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Blocking new container-requests as container manager rpc server is still starting.
2014-08-22 16:43:19,658 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2014-08-22 16:43:19,675 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 45026: starting
2014-08-22 16:43:20,029 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Updating node address : host-10-18-40-95:45026
2014-08-22 16:43:20,029 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager started at /10.18.40.95:45026
2014-08-22 16:43:20,030 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager bound to host-10-18-40-95/10.18.40.95:45026
2014-08-22 16:43:20,073 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2014-08-22 16:43:20,098 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 45027
2014-08-22 16:43:20,158 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB to the server
2014-08-22 16:43:20,178 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2014-08-22 16:43:20,192 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 45027: starting
2014-08-22 16:43:20,210 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 45026: readAndProcess from client 10.18.40.84 threw exception [java.lang.NullPointerException]
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
        at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:43)
        at org.apache.hadoop.security.token.SecretManager.retriableRetrievePassword(SecretManager.java:91)
        at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:278)
        at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:305)
        at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585)
        at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
        at org.apache.hadoop.ipc.Server$Connection.processSaslToken(Server.java:1384)
        at org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1361)
        at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1275)
        at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1238)
        at org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1878)
        at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1755)
        at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1519)
        at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:750)
        at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:624)
        at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:595)
2014-08-22 16:43:20,227 INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 45026: readAndProcess from client 10.18.40.84 threw exception [java.lang.NullPointerException]
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM.retrievePassword(NMTokenSecretManagerInNM.java:167)
{code}
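A hypothetical minimal model of this crash (not the real NMTokenSecretManagerInNM): after a plain, non-work-preserving restart the NM has no current master key until it re-registers with the RM, so a token arriving in that window should be rejected cleanly instead of dereferencing a null key.

{code}
// Sketch under assumed names: guard the null master key right after restart.
class TokenSecretManagerSketch {
    static class MasterKey {
        byte[] password(byte[] id) { return id; } // placeholder derivation
    }

    private volatile MasterKey currentMasterKey; // null right after restart

    byte[] retrievePassword(byte[] tokenIdentifier) {
        MasterKey key = currentMasterKey;
        if (key == null) {
            // Fail with a retriable auth error, not a NullPointerException
            // that kills the IPC reader's SASL negotiation.
            throw new SecurityException("This NM has restarted and has no "
                + "master key yet; the client should retry.");
        }
        return key.password(tokenIdentifier);
    }
}
{code}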
[jira] [Created] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery
Nishan Shetty created YARN-2409:
-----------------------------------

             Summary: InvalidStateTransitonException in ResourceManager after job recovery
                 Key: YARN-2409
                 URL: https://issues.apache.org/jira/browse/YARN-2409
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 3.0.0
            Reporter: Nishan Shetty

{code}
        at java.lang.Thread.run(Thread.java:662)
2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
        at java.lang.Thread.run(Thread.java:662)
2014-08-12 07:03:00,839 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
        at java.lang.Thread.run(Thread.java:662)
2014-08-12 07:03:00,839 ERROR org.apache.hadoop.ya
{code}
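The exception marks events that arrive in a state with no transition defined for them, which is common right after recovery when delayed or replayed events catch the attempt in an unexpected state. A toy state machine showing the tolerate-and-log alternative (hypothetical names, not Hadoop's StateMachineFactory):

{code}
// Toy model of the failure class: drop stale events instead of throwing.
class AttemptStateMachineSketch {
    enum State { LAUNCHED, RUNNING }
    enum Event { REGISTERED, STATUS_UPDATE, CONTAINER_ALLOCATED }

    private State state = State.LAUNCHED;

    void handle(Event event) {
        if (state == State.LAUNCHED && event == Event.REGISTERED) {
            state = State.RUNNING;           // the only legal transition here
        } else if (state == State.RUNNING && event == Event.STATUS_UPDATE) {
            // normal heartbeat handling would go here
        } else {
            // Stale or replayed event after recovery: log and ignore it
            // rather than failing on an undefined transition.
            System.err.println("Ignoring " + event + " at state " + state);
        }
    }
}
{code}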
[jira] [Commented] (YARN-2382) Resource Manager throws InvalidStateTransitonException
     [ https://issues.apache.org/jira/browse/YARN-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087458#comment-14087458 ]

Nishan Shetty commented on YARN-2382:
-------------------------------------
Hi [~ywskycn], this issue occurred when the RM was restarted while a job was in progress. Which configuration details do you need? Please specify.

> Resource Manager throws InvalidStateTransitonException
> -------------------------------------------------------
>
>                 Key: YARN-2382
>                 URL: https://issues.apache.org/jira/browse/YARN-2382
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.0.0
>            Reporter: Nishan Shetty
>
> {code}
> 2014-08-05 03:44:47,882 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to 10.18.40.26/10.18.40.26:11578, initiating session
> 2014-08-05 03:44:47,888 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server 10.18.40.26/10.18.40.26:11578, sessionid = 0x347a051fda60035, negotiated timeout = 1
> 2014-08-05 03:44:47,889 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:664)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:104)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:764)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:745)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>         at java.lang.Thread.run(Thread.java:662)
> 2014-08-05 03:44:47,890 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:664)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:104)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:764)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:745)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>         at java.lang.Thread.run(Thread.java:662)
> 2014-08-05 03:44:47,890 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:664)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:104)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:764)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManage
[jira] [Created] (YARN-2382) Resource Manager throws InvalidStateTransitonException
Nishan Shetty created YARN-2382:
-----------------------------------

             Summary: Resource Manager throws InvalidStateTransitonException
                 Key: YARN-2382
                 URL: https://issues.apache.org/jira/browse/YARN-2382
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 3.0.0
            Reporter: Nishan Shetty

{code}
2014-08-05 03:44:47,882 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to 10.18.40.26/10.18.40.26:11578, initiating session
2014-08-05 03:44:47,888 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server 10.18.40.26/10.18.40.26:11578, sessionid = 0x347a051fda60035, negotiated timeout = 1
2014-08-05 03:44:47,889 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:664)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:104)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:764)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:745)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
        at java.lang.Thread.run(Thread.java:662)
2014-08-05 03:44:47,890 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:664)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:104)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:764)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:745)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
        at java.lang.Thread.run(Thread.java:662)
2014-08-05 03:44:47,890 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:664)
        at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:104)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:764)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:745)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
        at java.lang.Thread.run(Thread.java:662)
2014-08-05 03:44:47,890 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x147a051fd93002e
{code}
[jira] [Commented] (YARN-2283) RM failed to release the AM container
     [ https://issues.apache.org/jira/browse/YARN-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080480#comment-14080480 ]

Nishan Shetty commented on YARN-2283:
-------------------------------------
I checked this issue; it does not occur on trunk. It is reproducible in 2.4.*.

> RM failed to release the AM container
> --------------------------------------
>
>                 Key: YARN-2283
>                 URL: https://issues.apache.org/jira/browse/YARN-2283
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>         Environment: NM1: AM running
>                      NM2: Map task running
>                      mapreduce.map.maxattempts=1
>            Reporter: Nishan Shetty
>            Priority: Critical
>
> During a container stability test I faced this problem: while the job was running, a map task got killed.
> Observe that even though the application is FAILED, the MRAppMaster process keeps running until timeout because the RM did not release the AM container.
> {code}
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1405318134611_0002_01_05 Container Transitioned from RUNNING to COMPLETED
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Completed container: container_1405318134611_0002_01_05 in state: COMPLETED event:FINISHED
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1405318134611_0002 CONTAINERID=container_1405318134611_0002_01_05
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore: Finish information of container container_1405318134611_0002_01_05 is written
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: Stored the finish data of container container_1405318134611_0002_01_05
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode: Released container container_1405318134611_0002_01_05 of capacity on host HOST-10-18-40-153:45026, which currently has 1 containers, used and available, release resources=true
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default used= numContainers=1 user=testos user-resources=
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_1405318134611_0002_01_05, NodeId: HOST-10-18-40-153:45026, NodeHttpAddress: HOST-10-18-40-153:45025, Resource: , Priority: 5, Token: Token { kind: ContainerToken, service: 10.18.40.153:45026 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=, usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1 cluster= vCores:8>
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 used= cluster=
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.default stats: default: capacity=1.0, absoluteCapacity=1.0, usedResources=, usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1405318134611_0002_01 released container container_1405318134611_0002_01_05 on node: host: HOST-10-18-40-153:45026 #containers=1 available=6144 used=2048 with event: FINISHED
> 2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1405318134611_0002_01 with final state: FINISHING
> 2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1405318134611_0002_01 State change from RUNNING to FINAL_SAVING
> 2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1405318134611_0002 with final state: FINISHING
> 2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: NodeDataChanged with state:SyncConnected for path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1405318134611_0002/appattempt_1405318134611_0002_01 for Service org.apache.hadoop.yarn.server.resourcemanager.re
[jira] [Created] (YARN-2349) InvalidStateTransitonException after RM switch
Nishan Shetty created YARN-2349: --- Summary: InvalidStateTransitonException after RM switch Key: YARN-2349 URL: https://issues.apache.org/jira/browse/YARN-2349 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Nishan Shetty {code} 2014-07-23 19:22:28,272 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting 2014-07-23 19:22:28,273 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 45018: starting 2014-07-23 19:22:28,266 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APP_REJECTED at ACCEPTED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:635) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:83) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:706) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:690) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-07-23 19:22:28,283 INFO org.mortbay.log: Stopped SelectChannelConnector@10.18.40.84:45020 2014-07-23 19:22:28,291 ERROR org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore: Error when openning history file of application application_1406116264351_0007 {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
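Why the APP_REJECTED event at ACCEPTED throws: YARN's StateMachineFactory dispatches events only for (state, event) pairs that were explicitly registered, and anything else surfaces as InvalidStateTransitonException, as in the trace above. Below is a minimal sketch of that table-driven pattern, assuming a reduced two-state model with hypothetical state and event names; this is not RMAppImpl's actual transition table.
{code}
import java.util.EnumMap;
import java.util.Map;

// Minimal sketch of a table-driven state machine; YARN's real RMAppImpl
// registers many more states, events, and transitions.
public class AppStateMachineSketch {
  enum State { ACCEPTED, FAILED }
  enum Event { APP_REJECTED, ATTEMPT_FAILED }

  // Transition table: current state -> (event -> next state)
  private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
  private State current = State.ACCEPTED;

  void addTransition(State from, Event on, State to) {
    table.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, to);
  }

  void handle(Event event) {
    Map<Event, State> byEvent = table.get(current);
    State next = (byEvent == null) ? null : byEvent.get(event);
    if (next == null) {
      // Mirrors the failure mode in the log above: no transition was
      // registered for APP_REJECTED while in ACCEPTED.
      throw new IllegalStateException("Invalid event: " + event + " at " + current);
    }
    current = next;
  }

  public static void main(String[] args) {
    AppStateMachineSketch sm = new AppStateMachineSketch();
    sm.addTransition(State.ACCEPTED, Event.ATTEMPT_FAILED, State.FAILED);
    sm.handle(Event.APP_REJECTED); // throws: no registered transition
  }
}
{code}
After an RM switch, replayed or late events can reach an application whose recovered state never registered them, which is the situation this trace suggests.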
[jira] [Commented] (YARN-2262) Few fields displaying wrong values in Timeline server after RM restart
[ https://issues.apache.org/jira/browse/YARN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071944#comment-14071944 ] Nishan Shetty commented on YARN-2262: - [~zjshen] Attached logs. Application id is application_1406114813957_0002. > Few fields displaying wrong values in Timeline server after RM restart > -- > > Key: YARN-2262 > URL: https://issues.apache.org/jira/browse/YARN-2262 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.4.0 >Reporter: Nishan Shetty >Assignee: Naganarasimha G R > Attachments: Capture.PNG, Capture1.PNG, > yarn-testos-historyserver-HOST-10-18-40-95.log, > yarn-testos-resourcemanager-HOST-10-18-40-84.log, > yarn-testos-resourcemanager-HOST-10-18-40-95.log > > > Few fields displaying wrong values in Timeline server after RM restart > State:null > FinalStatus: UNDEFINED > Started: 8-Jul-2014 14:58:08 > Elapsed: 2562047397789hrs, 44mins, 47sec -- This message was sent by Atlassian JIRA (v6.2#6252)
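A note on the Elapsed value: 2562047397789 hrs is almost exactly Long.MAX_VALUE milliseconds minus a mid-2014 start timestamp, converted to hours, which points at a sentinel/unset finish time flowing into the elapsed-time calculation. A back-of-the-envelope check follows (illustrative arithmetic only; the start timestamp below approximates the Started value in UTC, and timezone offsets shift the result by a few hours either way):
{code}
// Back-of-the-envelope check: Elapsed ~ Long.MAX_VALUE ms minus a mid-2014
// start time. Illustrative arithmetic, not Timeline server code.
public class ElapsedSanityCheck {
  public static void main(String[] args) {
    long startMs = 1404831488000L;        // ~8-Jul-2014 14:58:08 UTC (approx.)
    long sentinelFinishMs = Long.MAX_VALUE; // unset/sentinel finish time
    long elapsedHrs = (sentinelFinishMs - startMs) / 3_600_000L;
    System.out.println(elapsedHrs);       // ~2562047397784, matching the UI value
  }
}
{code}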
[jira] [Updated] (YARN-2262) Few fields displaying wrong values in Timeline server after RM restart
[ https://issues.apache.org/jira/browse/YARN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishan Shetty updated YARN-2262: Attachment: yarn-testos-resourcemanager-HOST-10-18-40-84.log yarn-testos-historyserver-HOST-10-18-40-95.log Capture1.PNG Capture.PNG yarn-testos-resourcemanager-HOST-10-18-40-95.log > Few fields displaying wrong values in Timeline server after RM restart > -- > > Key: YARN-2262 > URL: https://issues.apache.org/jira/browse/YARN-2262 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.4.0 >Reporter: Nishan Shetty >Assignee: Naganarasimha G R > Attachments: Capture.PNG, Capture1.PNG, > yarn-testos-historyserver-HOST-10-18-40-95.log, > yarn-testos-resourcemanager-HOST-10-18-40-84.log, > yarn-testos-resourcemanager-HOST-10-18-40-95.log > > > Few fields displaying wrong values in Timeline server after RM restart > State:null > FinalStatus: UNDEFINED > Started: 8-Jul-2014 14:58:08 > Elapsed: 2562047397789hrs, 44mins, 47sec -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Reopened] (YARN-2340) NPE thrown when RM restart after queue is STOPPED
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishan Shetty reopened YARN-2340: - > NPE thrown when RM restart after queue is STOPPED > - > > Key: YARN-2340 > URL: https://issues.apache.org/jira/browse/YARN-2340 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.4.1 > Environment: Capacityscheduler with Queue a, b >Reporter: Nishan Shetty >Priority: Critical > > While a job is in progress, make the Queue state STOPPED and then restart the RM > Observe that the standby RM fails to come up as active, throwing the below NPE > 2014-07-23 18:43:24,432 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED > 2014-07-23 18:43:24,433 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type APP_ATTEMPT_ADDED to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602) > at java.lang.Thread.run(Thread.java:662) > 2014-07-23 18:43:24,434 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-2340) NPE thrown when RM restart after queue is STOPPED
[ https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishan Shetty resolved YARN-2340. - Resolution: Unresolved > NPE thrown when RM restart after queue is STOPPED > - > > Key: YARN-2340 > URL: https://issues.apache.org/jira/browse/YARN-2340 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.4.1 > Environment: Capacityscheduler with Queue a, b >Reporter: Nishan Shetty >Priority: Critical > > While a job is in progress, make the Queue state STOPPED and then restart the RM > Observe that the standby RM fails to come up as active, throwing the below NPE > 2014-07-23 18:43:24,432 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED > 2014-07-23 18:43:24,433 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type APP_ATTEMPT_ADDED to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602) > at java.lang.Thread.run(Thread.java:662) > 2014-07-23 18:43:24,434 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2340) NPE thrown when RM restart after queue is STOPPED
Nishan Shetty created YARN-2340: --- Summary: NPE thrown when RM restart after queue is STOPPED Key: YARN-2340 URL: https://issues.apache.org/jira/browse/YARN-2340 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.4.1 Environment: Capacityscheduler with Queue a, b Reporter: Nishan Shetty Priority: Critical While a job is in progress, make the Queue state STOPPED and then restart the RM Observe that the standby RM fails to come up as active, throwing the below NPE 2014-07-23 18:43:24,432 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED 2014-07-23 18:43:24,433 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602) at java.lang.Thread.run(Thread.java:662) 2014-07-23 18:43:24,434 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. -- This message was sent by Atlassian JIRA (v6.2#6252)
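A plausible reading of the NPE at CapacityScheduler.addApplicationAttempt is that, during recovery, the application's queue lookup returns null (the queue being STOPPED/unusable at restart) and is dereferenced without a guard, taking the whole RM down via the scheduler event dispatcher. Below is a sketch of the defensive pattern under that assumption; class and method names are simplified stand-ins, not the actual CapacityScheduler code.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of guarding a queue lookup during app-attempt recovery, assuming a
// simplified scheduler model; names here are hypothetical.
public class AttemptRecoverySketch {
  static class Queue { final String name; Queue(String n) { name = n; } }

  private final Map<String, Queue> queues = new ConcurrentHashMap<>();

  void addApplicationAttempt(String appId, String queueName) {
    Queue queue = queues.get(queueName);
    if (queue == null) {
      // Without this guard, dereferencing queue below would NPE and (as in
      // the log) crash the RM instead of failing one application cleanly.
      System.err.println("Queue " + queueName + " unavailable; rejecting "
          + appId + " instead of crashing");
      return;
    }
    System.out.println("Added attempt of " + appId + " to " + queue.name);
  }

  public static void main(String[] args) {
    new AttemptRecoverySketch().addApplicationAttempt("app_1", "a"); // rejected, no NPE
  }
}
{code}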
[jira] [Commented] (YARN-2330) Jobs are not displaying in timeline server after RM restart
[ https://issues.apache.org/jira/browse/YARN-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070081#comment-14070081 ] Nishan Shetty commented on YARN-2330: - Here the RM has gone down abruptly > Jobs are not displaying in timeline server after RM restart > --- > > Key: YARN-2330 > URL: https://issues.apache.org/jira/browse/YARN-2330 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.4.1 > Environment: Nodemanagers 3 (3*8GB) > Queues A = 70% > Queues B = 30% >Reporter: Nishan Shetty > > Submit jobs to queue a > While job is running Restart RM > Observe that those jobs are not displayed in timelineserver > {code} > 2014-07-22 10:11:32,084 ERROR > org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore: > History information of application application_1406002968974_0003 is not > included into the result due to the exception > java.io.IOException: Cannot seek to negative offset > at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1381) > at > org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:63) > at org.apache.hadoop.io.file.tfile.BCFile$Reader.(BCFile.java:624) > at org.apache.hadoop.io.file.tfile.TFile$Reader.(TFile.java:804) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileReader.(FileSystemApplicationHistoryStore.java:683) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getHistoryFileReader(FileSystemApplicationHistoryStore.java:661) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getApplication(FileSystemApplicationHistoryStore.java:146) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getAllApplications(FileSystemApplicationHistoryStore.java:199) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getAllApplications(ApplicationHistoryManagerImpl.java:103) > at > org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:75) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > at > org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) > at > org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) > at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845) > at > org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56) > at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) > at org.apache.hadoop.yarn.webapp.Dispatcher.render(Dispatcher.java:197) > at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:156) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) > at > 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > at > com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) > at > com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) > at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1192) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at org.apache.hadoop.http.NoCacheFilter.doFil
[jira] [Commented] (YARN-2262) Few fields displaying wrong values in Timeline server after RM restart
[ https://issues.apache.org/jira/browse/YARN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070073#comment-14070073 ] Nishan Shetty commented on YARN-2262: - [~zjshen] {quote}How's your setup for RM restarting?{quote} RM HA setup where the active RM was restarted gracefully {quote}does the application continue after RM restarting?{quote} Yes, the application continues after the RM restart and finally the application will be SUCCEEDED {quote}If so, will the timeline server converge to show the missing fields correctly?{quote} No, the timeline server does not show the correct fields even after the application is SUCCEEDED. Thanks > Few fields displaying wrong values in Timeline server after RM restart > -- > > Key: YARN-2262 > URL: https://issues.apache.org/jira/browse/YARN-2262 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.4.0 >Reporter: Nishan Shetty >Assignee: Naganarasimha G R > > Few fields displaying wrong values in Timeline server after RM restart > State:null > FinalStatus: UNDEFINED > Started: 8-Jul-2014 14:58:08 > Elapsed: 2562047397789hrs, 44mins, 47sec -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2330) Jobs are not displaying in timeline server after RM restart
Nishan Shetty created YARN-2330: --- Summary: Jobs are not displaying in timeline server after RM restart Key: YARN-2330 URL: https://issues.apache.org/jira/browse/YARN-2330 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.1 Environment: Nodemanagers 3 (3*8GB) Queues A = 70% Queues B = 30% Reporter: Nishan Shetty Submit jobs to queue a While job is running Restart RM Observe that those jobs are not displayed in timelineserver {code} 2014-07-22 10:11:32,084 ERROR org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore: History information of application application_1406002968974_0003 is not included into the result due to the exception java.io.IOException: Cannot seek to negative offset at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1381) at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:63) at org.apache.hadoop.io.file.tfile.BCFile$Reader.(BCFile.java:624) at org.apache.hadoop.io.file.tfile.TFile$Reader.(TFile.java:804) at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileReader.(FileSystemApplicationHistoryStore.java:683) at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getHistoryFileReader(FileSystemApplicationHistoryStore.java:661) at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getApplication(FileSystemApplicationHistoryStore.java:146) at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getAllApplications(FileSystemApplicationHistoryStore.java:199) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getAllApplications(ApplicationHistoryManagerImpl.java:103) at org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:75) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Dispatcher.render(Dispatcher.java:197) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:156) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1192) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
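The "Cannot seek to negative offset" above is straightforward arithmetic: a TFile/BCFile reader positions itself relative to the end of the file (file length minus a fixed-size trailer), so a history file left empty or truncated by an abrupt RM death produces a negative seek target. A sketch of the failing computation follows (illustrative only; TAIL_LEN below is a made-up constant, not BCFile's real trailer size):
{code}
// Sketch: end-relative positioning on a truncated file goes negative.
// TAIL_LEN is a made-up constant for illustration, not BCFile's actual value.
public class NegativeSeekSketch {
  static final long TAIL_LEN = 16;

  static long tailOffset(long fileLength) {
    long offset = fileLength - TAIL_LEN;
    if (offset < 0) {
      // The condition the stack trace hits: an empty/partial history file
      // left behind by an abrupt RM shutdown.
      throw new IllegalArgumentException("Cannot seek to negative offset: " + offset);
    }
    return offset;
  }

  public static void main(String[] args) {
    System.out.println(tailOffset(1024)); // fine: 1008
    tailOffset(0);                        // throws, like the truncated file
  }
}
{code}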
[jira] [Commented] (YARN-2262) Few fields displaying wrong values in Timeline server after RM restart
[ https://issues.apache.org/jira/browse/YARN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069805#comment-14069805 ] Nishan Shetty commented on YARN-2262: - Hi [~zjshen] While the application is in progress I have restarted the RM > Few fields displaying wrong values in Timeline server after RM restart > -- > > Key: YARN-2262 > URL: https://issues.apache.org/jira/browse/YARN-2262 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.4.0 >Reporter: Nishan Shetty >Assignee: Naganarasimha G R > > Few fields displaying wrong values in Timeline server after RM restart > State:null > FinalStatus: UNDEFINED > Started: 8-Jul-2014 14:58:08 > Elapsed: 2562047397789hrs, 44mins, 47sec -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2309) NPE during RM-Restart test scenario
Nishan Shetty created YARN-2309: --- Summary: NPE during RM-Restart test scenario Key: YARN-2309 URL: https://issues.apache.org/jira/browse/YARN-2309 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Nishan Shetty Priority: Minor During RM-Restart test scenarios we encountered the exception below. A point to note here is that Zookeeper was also not stable during this testing; we could see many Zookeeper exceptions before getting this NPE {code} 2014-07-10 10:49:46,817 WARN org.apache.hadoop.service.AbstractService: When stopping the service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService : java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceStop(EmbeddedElectorService.java:108) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:125) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:232) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1039) {code} Zookeeper Exception {code} 2014-07-10 10:49:46,816 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService failed in state INITED; cause: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1046) at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1017) at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:632) at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:766) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
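The shape of this trace is the classic init-failure cleanup problem: AbstractService.init stops the service when serviceInit throws (here, the ZooKeeper ConnectionLoss), and serviceStop then dereferences a field that was never assigned. The usual defence is a null guard in the stop path; a minimal sketch under that assumption follows (Elector is a stand-in class, not Hadoop's ActiveStandbyElector):
{code}
// Sketch of the init-fails-then-stop-NPEs pattern and its null-guard fix.
public class ElectorServiceSketch {
  static class Elector { void quit() { System.out.println("quit"); } }

  private Elector elector; // stays null if init fails before assignment

  void serviceInit() throws Exception {
    // Simulate the ZooKeeper connection loss seen in the log:
    throw new Exception("KeeperErrorCode = ConnectionLoss");
    // elector = new Elector(); // never reached on the failure path
  }

  void serviceStop() {
    if (elector != null) { // guard: without it, elector.quit() would NPE
      elector.quit();
    }
  }

  public static void main(String[] args) {
    ElectorServiceSketch s = new ElectorServiceSketch();
    try {
      s.serviceInit();
    } catch (Exception e) {
      s.serviceStop(); // cleanup path like AbstractService.init; no NPE now
    }
  }
}
{code}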
[jira] [Created] (YARN-2283) RM failed to release the AM container
Nishan Shetty created YARN-2283: --- Summary: RM failed to release the AM container Key: YARN-2283 URL: https://issues.apache.org/jira/browse/YARN-2283 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Environment: NM1: AM running NM2: Map task running mapreduce.map.maxattempts=1 Reporter: Nishan Shetty Priority: Critical During container stability testing I faced this problem While the job was running a map task got killed Observe that even though the application is FAILED the MRAppMaster process keeps running till timeout because the RM did not release the AM container {code} 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1405318134611_0002_01_05 Container Transitioned from RUNNING to COMPLETED 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Completed container: container_1405318134611_0002_01_05 in state: COMPLETED event:FINISHED 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=testos OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1405318134611_0002 CONTAINERID=container_1405318134611_0002_01_05 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore: Finish information of container container_1405318134611_0002_01_05 is written 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: Stored the finish data of container container_1405318134611_0002_01_05 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode: Released container container_1405318134611_0002_01_05 of capacity on host HOST-10-18-40-153:45026, which currently has 1 containers, used and available, release resources=true 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default used= numContainers=1 user=testos user-resources= 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_1405318134611_0002_01_05, NodeId: HOST-10-18-40-153:45026, NodeHttpAddress: HOST-10-18-40-153:45025, Resource: , Priority: 5, Token: Token { kind: ContainerToken, service: 10.18.40.153:45026 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=, usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1 cluster= 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 used= cluster= 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.default stats: default: capacity=1.0, absoluteCapacity=1.0, usedResources=, usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1405318134611_0002_01 released container container_1405318134611_0002_01_05 on node: host: HOST-10-18-40-153:45026 #containers=1 available=6144 used=2048 with event: FINISHED 2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt 
appattempt_1405318134611_0002_01 with final state: FINISHING 2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1405318134611_0002_01 State change from RUNNING to FINAL_SAVING 2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1405318134611_0002 with final state: FINISHING 2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Watcher event type: NodeDataChanged with state:SyncConnected for path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1405318134611_0002/appattempt_1405318134611_0002_01 for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED 2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1405318134611_0002 State change from RUNNING to FINAL_SAVING 2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1405318134611_0002 2014-07-14 14:43:34,947 INFO
[jira] [Commented] (YARN-2258) Aggregation of MR job logs failing when Resourcemanager switches
[ https://issues.apache.org/jira/browse/YARN-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057460#comment-14057460 ] Nishan Shetty commented on YARN-2258: - Thanks [~vinodkv] and [~leftnoteasy] for looking into the issue > Aggregation of MR job logs failing when Resourcemanager switches > > > Key: YARN-2258 > URL: https://issues.apache.org/jira/browse/YARN-2258 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation, nodemanager >Affects Versions: 2.4.0 >Reporter: Nishan Shetty >Assignee: Wangda Tan > > 1.Install RM in HA mode > 2.Run a job with more tasks > 3.Induce RM switchover while job is in progress > Observe that log aggregation fails for the job which is running when > Resourcemanager switchover is induced. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-2272) UI issues in timeline server
[ https://issues.apache.org/jira/browse/YARN-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishan Shetty resolved YARN-2272. - Resolution: Duplicate > UI issues in timeline server > > > Key: YARN-2272 > URL: https://issues.apache.org/jira/browse/YARN-2272 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.4.0 >Reporter: Nishan Shetty >Priority: Minor > > Links to nodemanager are not working in the timeline server -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2272) UI issues in timeline server
[ https://issues.apache.org/jira/browse/YARN-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057454#comment-14057454 ] Nishan Shetty commented on YARN-2272: - Thanks [~zjshen] for looking into the issue. I will close this as a duplicate of YARN-1884. > UI issues in timeline server > > > Key: YARN-2272 > URL: https://issues.apache.org/jira/browse/YARN-2272 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.4.0 >Reporter: Nishan Shetty >Priority: Minor > > Links to nodemanager are not working in the timeline server -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2272) UI issues in timeline server
[ https://issues.apache.org/jira/browse/YARN-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishan Shetty updated YARN-2272: Priority: Minor (was: Major) > UI issues in timeline server > > > Key: YARN-2272 > URL: https://issues.apache.org/jira/browse/YARN-2272 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.4.0 >Reporter: Nishan Shetty >Priority: Minor > > Links to nodemanager are not working in the timeline server -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2272) UI issues in timeline server
Nishan Shetty created YARN-2272: --- Summary: UI issues in timeline server Key: YARN-2272 URL: https://issues.apache.org/jira/browse/YARN-2272 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0 Reporter: Nishan Shetty Links to nodemanager are not working in the timeline server -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2262) Few fields displaying wrong values in Timeline server after RM restart
[ https://issues.apache.org/jira/browse/YARN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishan Shetty updated YARN-2262: Description: Few fields displaying wrong values in Timeline server after RM restart State: null FinalStatus:UNDEFINED Started:8-Jul-2014 14:58:08 Elapsed:2562047397789hrs, 44mins, 47sec was: Few fields displaying wrong values in Timeline server after RM restart > Few fields displaying wrong values in Timeline server after RM restart > -- > > Key: YARN-2262 > URL: https://issues.apache.org/jira/browse/YARN-2262 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.4.0 >Reporter: Nishan Shetty > > Few fields displaying wrong values in Timeline server after RM restart > State:null > FinalStatus: UNDEFINED > Started: 8-Jul-2014 14:58:08 > Elapsed: 2562047397789hrs, 44mins, 47sec -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2262) Few fields displaying wrong values in Timeline server after RM restart
Nishan Shetty created YARN-2262: --- Summary: Few fields displaying wrong values in Timeline server after RM restart Key: YARN-2262 URL: https://issues.apache.org/jira/browse/YARN-2262 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0 Reporter: Nishan Shetty Few fields displaying wrong values in Timeline server after RM restart -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2259) NM-Local dir cleanup failing when Resourcemanager switches
[ https://issues.apache.org/jira/browse/YARN-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishan Shetty updated YARN-2259: Attachment: Capture.PNG > NM-Local dir cleanup failing when Resourcemanager switches > -- > > Key: YARN-2259 > URL: https://issues.apache.org/jira/browse/YARN-2259 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.0 > Environment: >Reporter: Nishan Shetty > Attachments: Capture.PNG > > > Induce RM switchover while job is in progress > Observe that NM-Local dir cleanup failing when Resourcemanager switches. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2259) NM-Local dir cleanup failing when Resourcemanager switches
[ https://issues.apache.org/jira/browse/YARN-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054653#comment-14054653 ] Nishan Shetty commented on YARN-2259: - Hi [~vinodkv] "cleanup failing" means the intermediate files/folders created by tasks while running are not deleted after task/job completion. Attached the screenshot > NM-Local dir cleanup failing when Resourcemanager switches > -- > > Key: YARN-2259 > URL: https://issues.apache.org/jira/browse/YARN-2259 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.0 > Environment: >Reporter: Nishan Shetty > > Induce RM switchover while job is in progress > Observe that NM-Local dir cleanup failing when Resourcemanager switches. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2259) NM-Local dir cleanup failing when Resourcemanager switches
[ https://issues.apache.org/jira/browse/YARN-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishan Shetty updated YARN-2259: Affects Version/s: (was: 2.4.1) 2.4.0 > NM-Local dir cleanup failing when Resourcemanager switches > -- > > Key: YARN-2259 > URL: https://issues.apache.org/jira/browse/YARN-2259 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.0 > Environment: >Reporter: Nishan Shetty > > Induce RM switchover while job is in progress > Observe that NM-Local dir cleanup failing when Resourcemanager switches. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2258) Aggregation of MR job logs failing when Resourcemanager switches
[ https://issues.apache.org/jira/browse/YARN-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishan Shetty updated YARN-2258: Affects Version/s: (was: 2.4.1) 2.4.0 > Aggregation of MR job logs failing when Resourcemanager switches > > > Key: YARN-2258 > URL: https://issues.apache.org/jira/browse/YARN-2258 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation, nodemanager >Affects Versions: 2.4.0 >Reporter: Nishan Shetty > > 1.Install RM in HA mode > 2.Run a job with more tasks > 3.Induce RM switchover while job is in progress > Observe that log aggregation fails for the job which is running when > Resourcemanager switchover is induced. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2258) Aggregation of MR job logs failing when Resourcemanager switches
[ https://issues.apache.org/jira/browse/YARN-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054500#comment-14054500 ] Nishan Shetty commented on YARN-2258: - Successful flow {code} "ftp(1):/home/testos/install/hadoop/logs/yarn-testos-nodemanager-HOST-10-18-40-153.log.1"(1032483,114):2014-07-06 22:01:52,928 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1404662892762_0022 transitioned from NEW to INITING "ftp(1):/home/testos/install/hadoop/logs/yarn-testos-nodemanager-HOST-10-18-40-153.log.1"(1032499,114):2014-07-06 22:01:52,974 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1404662892762_0022 transitioned from INITING to RUNNING "ftp(1):/home/testos/install/hadoop/logs/yarn-testos-nodemanager-HOST-10-18-40-153.log.1"(1033850,114):2014-07-06 22:02:56,905 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1404662892762_0022 transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP "ftp(1):/home/testos/install/hadoop/logs/yarn-testos-nodemanager-HOST-10-18-40-153.log.1"(1033853,114):2014-07-06 22:02:57,048 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1404662892762_0022 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED {code} Failed flow {code} "ftp(1):/home/testos/install/hadoop/logs/yarn-testos-nodemanager-HOST-10-18-40-153.log.1"(1074500,114):2014-07-06 22:37:03,775 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1404662892762_0056 transitioned from NEW to INITING "ftp(1):/home/testos/install/hadoop/logs/yarn-testos-nodemanager-HOST-10-18-40-153.log.1"(1074502,114):2014-07-06 22:37:03,860 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1404662892762_0056 transitioned from INITING to RUNNING {code} > Aggregation of MR job logs failing when Resourcemanager switches > > > Key: YARN-2258 > URL: https://issues.apache.org/jira/browse/YARN-2258 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation, nodemanager >Affects Versions: 2.4.1 >Reporter: Nishan Shetty > > 1.Install RM in HA mode > 2.Run a job with more tasks > 3.Induce RM switchover while job is in progress > Observe that log aggregation fails for the job which is running when > Resourcemanager switchover is induced. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1113) Job failing when one of the NM local dir got filled
Nishan Shetty created YARN-1113: --- Summary: Job failing when one of the NM local dir got filled Key: YARN-1113 URL: https://issues.apache.org/jira/browse/YARN-1113 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.1-alpha Reporter: Nishan Shetty 1.In the NodeManager only one disk is configured for the NM local dir 2.Make that disk full 3.Run a job Problems -Tasks assigned to the disk-filled NM wait for the container expiry time (10 min) -After the expiry time those containers are killed and new task attempts spawned -All the other task attempts get assigned to the same node only, failing the task 4 times; in turn the job fails -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-880) Configuring map/reduce memory equal to nodemanager's memory, hangs the job execution
[ https://issues.apache.org/jira/browse/YARN-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13744860#comment-13744860 ] Nishan Shetty commented on YARN-880: [~ojoshi] I am using the capacity scheduler number of maps: 3 number of reduces: 1 Slow start configured to 1 > Configuring map/reduce memory equal to nodemanager's memory, hangs the job > execution > > > Key: YARN-880 > URL: https://issues.apache.org/jira/browse/YARN-880 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.1-alpha >Reporter: Nishan Shetty >Assignee: Omkar Vinit Joshi >Priority: Critical > > Scenario: > = > Cluster is installed with 2 Nodemanagers > Configuration: > NM memory (yarn.nodemanager.resource.memory-mb): 8 gb > map and reduce memory : 8 gb > Appmaster memory: 2 gb > If a map task is reserved on the same nodemanager where the appmaster of the same > job is running then job execution hangs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (YARN-995) "WebAppException" is thrown in appmaster logs if any task got failed
[ https://issues.apache.org/jira/browse/YARN-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishan Shetty moved MAPREDUCE-5434 to YARN-995: --- Component/s: (was: applicationmaster) Affects Version/s: (was: 2.0.5-alpha) 2.0.5-alpha Key: YARN-995 (was: MAPREDUCE-5434) Project: Hadoop YARN (was: Hadoop Map/Reduce) > "WebAppException" is thrown in appmaster logs if any task got failed > > > Key: YARN-995 > URL: https://issues.apache.org/jira/browse/YARN-995 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.0.5-alpha >Reporter: Nishan Shetty > > {code} > 2013-07-30 14:49:42,521 INFO [IPC Server handler 13 on 31627] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: MapCompletionEvents request > from attempt_1375161249957_0022_r_000279_0. startIndex 1002 maxEvents 3000 > 2013-07-30 14:49:42,865 INFO [IPC Server handler 19 on 31627] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: MapCompletionEvents request > from attempt_1375161249957_0022_r_000278_0. startIndex 1002 maxEvents 3000 > 2013-07-30 14:49:42,876 INFO [IPC Server handler 23 on 31627] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: MapCompletionEvents request > from attempt_1375161249957_0022_r_000280_0. startIndex 1002 maxEvents 3000 > 2013-07-30 14:49:42,904 ERROR [2065027703@qtp-1099518668-8] > org.apache.hadoop.mapreduce.v2.app.webapp.AppController: Failed to render > attempts page with task type : r for job id : job_1375161249957_0022 > org.apache.hadoop.yarn.webapp.WebAppException: Error rendering block: > nestLevel=6 expected 5 > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:74) > at org.apache.hadoop.yarn.webapp.View.render(View.java:233) > at > org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:47) > at > org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) > at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:843) > at > org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:54) > at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:80) > at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:210) > at > org.apache.hadoop.mapreduce.v2.app.webapp.AppController.attempts(AppController.java:290) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:150) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > at > com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) > at > 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) > at > com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) > at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:123) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.util.IpValidationFilter.doFilter(IpValidationFilter.java:77) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1085) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at org.apache.hadoop
[jira] [Created] (YARN-923) Application is "FAILED" when multiple appmaster attempts are spawned
Nishan Shetty created YARN-923: -- Summary: Application is "FAILED" when multiple appmaster attempts are spawned Key: YARN-923 URL: https://issues.apache.org/jira/browse/YARN-923 Project: Hadoop YARN Issue Type: Bug Components: applications, resourcemanager Affects Versions: 2.0.5-alpha Reporter: Nishan Shetty 1.Run a job with 142 maps 2.After some map tasks have executed, kill the NM where the appmaster is running (using the kill -9 cmd) 3.Now observe that the appmaster keeps running until the NM expiry interval; after the NM expiry interval that appmaster is killed and a new appmaster is launched Observations: --- 1.The first appmaster, while going down, deletes the staging dir of the job 2.While the new appmaster is running it kills all the tasks running in it and fails the application saying the files in the staging dir are not present -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
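The sequence described in the observations is the shared staging directory being deleted by the first AM on its way down while a later attempt still needs it; a common remedy is to clean the staging dir only on the last permitted AM retry. A sketch of that guard follows (hypothetical names; not MRAppMaster's actual shutdown code):
{code}
// Sketch: only the final AM attempt may delete the shared staging dir,
// otherwise a subsequent attempt finds its job files missing and fails.
public class StagingCleanupSketch {
  static void onAppMasterExit(int attemptId, int maxAttempts, boolean appDone) {
    boolean isLastRetry = appDone || attemptId >= maxAttempts;
    if (isLastRetry) {
      System.out.println("deleting staging dir");
    } else {
      System.out.println("keeping staging dir for attempt " + (attemptId + 1));
    }
  }

  public static void main(String[] args) {
    onAppMasterExit(1, 2, false); // first attempt killed: keep staging dir
    onAppMasterExit(2, 2, false); // last allowed attempt: safe to delete
  }
}
{code}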
[jira] [Commented] (YARN-902) "Used Resources" field in Resourcemanager scheduler UI not displaying any values
[ https://issues.apache.org/jira/browse/YARN-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701252#comment-13701252 ] Nishan Shetty commented on YARN-902: [~sandyr] I am getting this with Capacity Scheduler > "Used Resources" field in Resourcemanager scheduler UI not displaying any > values > > > Key: YARN-902 > URL: https://issues.apache.org/jira/browse/YARN-902 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.0.5-alpha >Reporter: Nishan Shetty >Priority: Minor > > "Used Resources" field in Resourcemanager scheduler UI not displaying any > values -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-902) "Used Resources" field in Resourcemanager scheduler UI not displaying any values
Nishan Shetty created YARN-902: -- Summary: "Used Resources" field in Resourcemanager scheduler UI not displaying any values Key: YARN-902 URL: https://issues.apache.org/jira/browse/YARN-902 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.5-alpha Reporter: Nishan Shetty Priority: Minor "Used Resources" field in Resourcemanager scheduler UI not displaying any values -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-901) "Active users" field in Resourcemanager scheduler UI gives negative values
Nishan Shetty created YARN-901: -- Summary: "Active users" field in Resourcemanager scheduler UI gives negative values Key: YARN-901 URL: https://issues.apache.org/jira/browse/YARN-901 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.5-alpha Reporter: Nishan Shetty Priority: Minor "Active users" field in Resourcemanager scheduler UI gives negative values on Resourcemanager restart when job is in progress -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-880) Configuring map/reduce memory equal to nodemanager's memory, hangs the job execution
Nishan Shetty created YARN-880: -- Summary: Configuring map/reduce memory equal to nodemanager's memory, hangs the job execution Key: YARN-880 URL: https://issues.apache.org/jira/browse/YARN-880 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.1-alpha Reporter: Nishan Shetty Priority: Critical Scenario: = Cluster is installed with 2 Nodemanagers Configuration: NM memory (yarn.nodemanager.resource.memory-mb): 8 gb map and reduce memory : 8 gb Appmaster memory: 2 gb If a map task is reserved on the same nodemanager where the appmaster of the same job is running, then job execution hangs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
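The hang follows from headroom arithmetic: an 8 GB node already hosting the 2 GB AM has only 6 GB free, so an 8 GB map container reserved on that node can never be satisfied, and without logic to break or relocate the reservation the job stalls. A two-line check of the numbers (illustrative arithmetic, not scheduler code):
{code}
// Arithmetic behind the hang: the reserved 8 GB map can never fit beside
// the 2 GB AM on an 8 GB node.
public class ReservationArithmetic {
  public static void main(String[] args) {
    int nodeMb = 8192, amMb = 2048, mapMb = 8192;
    int freeMb = nodeMb - amMb; // 6144
    System.out.println("free=" + freeMb + "MB, canAllocate=" + (freeMb >= mapMb)); // false
  }
}
{code}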
[jira] [Created] (YARN-859) "Applications Per User" is giving ambiguous values in scheduler UI
Nishan Shetty created YARN-859: -- Summary: "Applications Per User" is giving ambiguous values in scheduler UI Key: YARN-859 URL: https://issues.apache.org/jira/browse/YARN-859 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.1-alpha Reporter: Nishan Shetty Priority: Minor 1.Configure "yarn.scheduler.capacity.root.default.user-limit-factor" as 2 Observe that the Applications Per User values in the scheduler UI are ambiguous Max applications per user cannot be more than that of the cluster Max Applications: 1000 Max Applications Per User: 2000 Max Active Applications: 5 Max Active Applications Per User: 10 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
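The "ambiguous" numbers are consistent with the per-user limits being derived by scaling the queue-wide limits by the user-limit factor without clamping to the queue-wide value, so a factor of 2 turns 1000 into 2000 and 5 into 10. A quick reproduction of the arithmetic (illustrative; the scheduler's real formula also involves the configured minimum user-limit percent):
{code}
// Reproduces the UI numbers: per-user limits scale by user-limit-factor
// without being clamped to the queue-wide maximum. Illustrative arithmetic.
public class PerUserLimitArithmetic {
  public static void main(String[] args) {
    int maxApps = 1000, maxActiveApps = 5;
    double userLimitFactor = 2.0; // yarn.scheduler.capacity.root.default.user-limit-factor
    System.out.println("Max Applications Per User: "
        + (int) (maxApps * userLimitFactor));        // 2000
    System.out.println("Max Active Applications Per User: "
        + (int) (maxActiveApps * userLimitFactor));  // 10
  }
}
{code}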
[jira] [Moved] (YARN-817) If input path does not exist application/job id is getting assigned.
[ https://issues.apache.org/jira/browse/YARN-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishan Shetty moved MAPREDUCE-5322 to YARN-817: --- Component/s: (was: resourcemanager) resourcemanager Affects Version/s: (was: 2.0.2-alpha) (was: 2.0.1-alpha) 2.0.2-alpha 2.0.1-alpha Key: YARN-817 (was: MAPREDUCE-5322) Project: Hadoop YARN (was: Hadoop Map/Reduce) > If input path does not exist application/job id is getting assigned. > > > Key: YARN-817 > URL: https://issues.apache.org/jira/browse/YARN-817 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.1-alpha, 2.0.2-alpha >Reporter: Nishan Shetty >Priority: Minor > > 1.Run a job giving as input some path which does not exist > 2.Application/job id is getting assigned. > 2013-06-12 16:00:24,494 INFO > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new > applicationId: 12 > Suggestion > Before assigning the job/app id an input path check can be made. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-580) Delay scheduling in capacity scheduler is not ensuring 100% locality
Nishan Shetty created YARN-580: -- Summary: Delay scheduling in capacity scheduler is not ensuring 100% locality Key: YARN-580 URL: https://issues.apache.org/jira/browse/YARN-580 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.1-alpha, 2.0.2-alpha Reporter: Nishan Shetty Example Machine1: 3 blocks Machine2: 2 blocks Machine3: 1 block When we run a job on this data, node locality is not ensured 100% Tasks run like below even if slots are available on all nodes: -- Machine1: 4 tasks Machine2: 2 tasks Machine3: No task -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
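For context, delay scheduling deliberately trades locality for liveness: the scheduler skips only a bounded number of scheduling opportunities while waiting for a node-local slot, then falls back to rack-local or off-switch placement, so 100% node locality is never guaranteed. A compact sketch of that decision follows (parameter names are illustrative, not the CapacityScheduler implementation):
{code}
// Sketch of delay scheduling's core decision: wait for locality only for a
// bounded number of missed opportunities. Names are illustrative.
public class DelaySchedulingSketch {
  static String placeTask(boolean nodeLocalAvailable, int missedOpportunities,
                          int localityDelay) {
    if (nodeLocalAvailable) {
      return "NODE_LOCAL";
    }
    if (missedOpportunities < localityDelay) {
      return "SKIP"; // keep waiting for a local slot
    }
    return "RACK_LOCAL_OR_OFF_SWITCH"; // budget exhausted: relax locality
  }

  public static void main(String[] args) {
    System.out.println(placeTask(false, 0, 3)); // SKIP
    System.out.println(placeTask(false, 3, 3)); // relaxed placement
  }
}
{code}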
[jira] [Commented] (YARN-500) ResourceManager webapp is using next port if configured port is already in use
[ https://issues.apache.org/jira/browse/YARN-500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609927#comment-13609927 ] Nishan Shetty commented on YARN-500: The same applies to the Nodemanager webapp > ResourceManager webapp is using next port if configured port is already in use > -- > > Key: YARN-500 > URL: https://issues.apache.org/jira/browse/YARN-500 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.2-alpha, 2.0.1-alpha >Reporter: Nishan Shetty > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (YARN-500) ResourceManager webapp is using next port if configured port is already in use
[ https://issues.apache.org/jira/browse/YARN-500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishan Shetty moved MAPREDUCE-5091 to YARN-500: --- Component/s: (was: resourcemanager) resourcemanager Affects Version/s: (was: 2.0.2-alpha) (was: 2.0.1-alpha) 2.0.2-alpha 2.0.1-alpha Key: YARN-500 (was: MAPREDUCE-5091) Project: Hadoop YARN (was: Hadoop Map/Reduce) > ResourceManager webapp is using next port if configured port is already in use > -- > > Key: YARN-500 > URL: https://issues.apache.org/jira/browse/YARN-500 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.1-alpha, 2.0.2-alpha >Reporter: Nishan Shetty > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-323) Yarn CLI commands prints classpath
Nishan Shetty created YARN-323: -- Summary: Yarn CLI commands prints classpath Key: YARN-323 URL: https://issues.apache.org/jira/browse/YARN-323 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.1-alpha Reporter: Nishan Shetty Priority: Minor Execute ./yarn commands. They print the classpath to the console -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-16) NM throws "ArithmeticException: / by zero" when there is no available space on configured local dir
Nishan Shetty created YARN-16: - Summary: NM throws "ArithmeticException: / by zero" when there is no available space on configured local dir Key: YARN-16 URL: https://issues.apache.org/jira/browse/YARN-16 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-alpha Reporter: Nishan Shetty Priority: Minor {code:xml} 12/08/09 13:59:49 INFO mapreduce.Job: Task Id : attempt_1344492468506_0023_m_00_0, Status : FAILED java.lang.ArithmeticException: / by zero at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:371) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115) at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:257) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:849) {code} Instead of throwing the exception directly, we can log a warning saying there is no available space on the configured local dir -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
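The stack trace points at the weighted random directory selection: when every configured local dir reports zero available space, the running total used as a modulus is zero. Below is a minimal sketch of the suggested guard, not the committed fix; the class, method, and variable names are invented for illustration, while DiskErrorException is Hadoop's existing exception type for unusable disks.

{code:java}
import java.util.Random;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.util.DiskChecker.DiskErrorException;

public class LocalDirSelectionSketch {
  private static final Log LOG =
      LogFactory.getLog(LocalDirSelectionSketch.class);
  private final Random random = new Random();

  /** Pick a dir index weighted by free space, guarding the empty case. */
  int pickDir(long[] availablePerDir) throws DiskErrorException {
    long totalAvailable = 0;
    for (long a : availablePerDir) {
      totalAvailable += a;
    }
    if (totalAvailable <= 0) {
      // Log a warning instead of letting the modulo below fail with
      // "ArithmeticException: / by zero".
      LOG.warn("No available space on any configured local dir");
      throw new DiskErrorException(
          "No space available in any of the local directories");
    }
    // Nonnegative position in [0, totalAvailable); dirs with zero free
    // space are skipped naturally by the walk below.
    long position = (random.nextLong() & Long.MAX_VALUE) % totalAvailable;
    int dir = 0;
    while (position >= availablePerDir[dir]) {
      position -= availablePerDir[dir];
      dir++;
    }
    return dir;
  }
}
{code}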
[jira] [Updated] (YARN-16) NM throws "ArithmeticException: / by zero" when there is no available space on configured local dir
[ https://issues.apache.org/jira/browse/YARN-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishan Shetty updated YARN-16: -- Description: 12/08/09 13:59:49 INFO mapreduce.Job: Task Id : attempt_1344492468506_0023_m_00_0, Status : FAILED java.lang.ArithmeticException: / by zero at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:371) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115) at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:257) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:849) Instead of throwing the exception directly, we can log a warning saying there is no available space on the configured local dir was: {code:xml} 12/08/09 13:59:49 INFO mapreduce.Job: Task Id : attempt_1344492468506_0023_m_00_0, Status : FAILED java.lang.ArithmeticException: / by zero at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:371) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115) at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:257) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:849) {code} Instead of throwing the exception directly, we can log a warning saying there is no available space on the configured local dir > NM throws "ArithmeticException: / by zero" when there is no available space > on configured local dir > --- > > Key: YARN-16 > URL: https://issues.apache.org/jira/browse/YARN-16 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.0-alpha >Reporter: Nishan Shetty >Priority: Minor > > 12/08/09 13:59:49 INFO mapreduce.Job: Task Id : > attempt_1344492468506_0023_m_00_0, Status : FAILED > java.lang.ArithmeticException: / by zero > at > org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:371) > at > org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150) > at > org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131) > at > org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115) > at > org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:849) > Instead of throwing the exception directly, we can log a warning saying there is > no available space on the configured local dir -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira