[ 
https://issues.apache.org/jira/browse/YARN-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046804#comment-15046804
 ] 

Brahma Reddy Battula commented on YARN-4427:
--------------------------------------------

[~rohithsharma] thanks for taking a look at this issue. Yes, 
{{masterContainer}} is null. I also thought {{rmAppAttempt}} could be null, but it 
is not in this cluster.

{code}  
    RMAppAttempt rmAppAttempt = rmApp.getRMAppAttempt(appAttemptId);
    Container masterContainer = rmAppAttempt.getMasterContainer();
    if ((masterContainer.getId().equals(containerStatus.getContainerId()))
        && (containerStatus.getContainerState() == ContainerState.COMPLETE))
{code} 
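
For illustration only, here is a minimal sketch of the kind of null guard that would avoid the NPE at this check. It is not a proposed patch: {{Container}}, {{Attempt}}, {{Status}} and {{isFinishedMasterContainer}} are hypothetical stand-ins for the YARN types, reduced to the fields used above, and whether skipping the event is the right behaviour when the master container is missing is an assumption.

```java
// Sketch only: simplified stand-ins for the YARN classes, not the real API.
class MasterContainerGuard {

    enum ContainerState { NEW, RUNNING, COMPLETE }

    // Hypothetical stand-in for the master container record.
    static final class Container {
        private final String id;
        Container(String id) { this.id = id; }
        String getId() { return id; }
    }

    // Hypothetical stand-in for the app attempt; the master container may be
    // null if it was never persisted or was not recovered from the state store.
    static final class Attempt {
        private final Container masterContainer;
        Attempt(Container masterContainer) { this.masterContainer = masterContainer; }
        Container getMasterContainer() { return masterContainer; }
    }

    // Hypothetical stand-in for the container status reported by the NM.
    static final class Status {
        private final String containerId;
        private final ContainerState state;
        Status(String containerId, ContainerState state) {
            this.containerId = containerId;
            this.state = state;
        }
        String getContainerId() { return containerId; }
        ContainerState getContainerState() { return state; }
    }

    // Returns true only when the reported container is the attempt's master
    // container and it has completed. A missing attempt or a missing master
    // container yields false instead of a NullPointerException.
    static boolean isFinishedMasterContainer(Attempt attempt, Status status) {
        if (attempt == null) {
            return false;
        }
        Container master = attempt.getMasterContainer();
        if (master == null) {
            return false; // assumption: skip the "finished" event in this case
        }
        return master.getId().equals(status.getContainerId())
            && status.getContainerState() == ContainerState.COMPLETE;
    }

    public static void main(String[] args) {
        Status done = new Status("container_01", ContainerState.COMPLETE);
        // Master container missing after recovery: guarded, no NPE.
        System.out.println(isFinishedMasterContainer(new Attempt(null), done));
        // Master container present and completed.
        System.out.println(
            isFinishedMasterContainer(new Attempt(new Container("container_01")), done));
    }
}
```

In the real handler the guarded branch would presumably also log the missing master container, since silently skipping it could hide the underlying recovery problem.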

 *Cause:*  As I mentioned in the description, the ZK cluster was going up and down, 
which caused frequent leader elections. I suspect the RM wrote the znode through ZK1 
and, while recovering, read from ZK2, where the data was not yet synced (so the 
master container details were missing). 

 Please correct me if I am wrong.

> NPE on handleNMContainerStatus when NM is registering to RM
> -----------------------------------------------------------
>
>                 Key: YARN-4427
>                 URL: https://issues.apache.org/jira/browse/YARN-4427
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>            Priority: Critical
>
>  *Seen the following in one of our environments when the AM was allocated a 
> container but the update to ZK failed, while the cluster was having network 
> problems for some time (up and down).* 
> {noformat}
> 2015-12-07 16:39:38,489 | WARN  | IPC Server handler 49 on 26003 | IPC Server 
> handler 49 on 26003, call 
> org.apache.hadoop.yarn.server.api.ResourceTrackerPB.registerNodeManager from 
> 9.91.8.220:52169 Call#17 Retry#0 | Server.java:2107
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.handleNMContainerStatus(ResourceTrackerService.java:286)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.registerNodeManager(ResourceTrackerService.java:395)
>         at 
> org.apache.hadoop.yarn.server.api.impl.pb.service.ResourceTrackerPBServiceImpl.registerNodeManager(ResourceTrackerPBServiceImpl.java:54)
>         at 
> org.apache.hadoop.yarn.proto.ResourceTracker$ResourceTrackerService$2.callBlockingMethod(ResourceTracker.java:79)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:973)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2088)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2084)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2082)
> {noformat}
> Corresponding code; it might not match {{branch-2.7/Trunk}}, since we have 
> modified it internally.
> {code}
>  284  RMAppAttempt rmAppAttempt = rmApp.getRMAppAttempt(appAttemptId);
>  285  Container masterContainer = rmAppAttempt.getMasterContainer();
>  286  if (masterContainer.getId().equals(containerStatus.getContainerId())
>  287       && containerStatus.getContainerState() == ContainerState.COMPLETE) {
>  288     ContainerStatus status =
>  289         ContainerStatus.newInstance(containerStatus.getContainerId(),
>  290           containerStatus.getContainerState(), containerStatus.getDiagnostics(),
>  291           containerStatus.getContainerExitStatus());
>  292     // sending master container finished event.
>  293     RMAppAttemptContainerFinishedEvent evt =
>  294         new RMAppAttemptContainerFinishedEvent(appAttemptId, status,
>  295             nodeId);
>  296     rmContext.getDispatcher().getEventHandler().handle(evt);
>  297   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
