[jira] [Commented] (YARN-7371) NPE in ServiceMaster after RM is restarted and then the ServiceMaster is killed

Jian He (JIRA) Thu, 02 Nov 2017 12:59:48 -0700

    [ https://issues.apache.org/jira/browse/YARN-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16236488#comment-16236488 ]


Jian He commented on YARN-7371:
-------------------------------

Patch looks good to me overall. Some comments:
- This method can be removed, as it’s only called from within this class itself:
{code}
public Token createContainerToken(ContainerId containerId,
    int containerVersion, NodeId nodeId, String appSubmitter,
    Resource capability, Priority priority, long createTime,
    LogAggregationContext logAggregationContext, String nodeLabelExpression,
    ContainerType containerType) {
  return createContainerToken(containerId, containerVersion, nodeId,
      appSubmitter, capability, priority, createTime, null, null,
      ContainerType.TASK, ExecutionType.GUARANTEED, -1);
}
{code}
- For testRecoverComponentsAfterRMRestart, can you also check that the 
containers retrieved by serviceClient#getStatus are the old containers from the 
1st attempt, i.e. that no containers are relaunched because of the AM restart?
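
A minimal sketch of the kind of check meant in the second comment, assuming the test records the container IDs as strings before the AM is killed and again from the status it gets back once the second attempt is running. The class and method names here (ContainerRecoveryCheck, relaunched, lost) and the sample IDs are hypothetical and not part of the patch; how the IDs are pulled out of the serviceClient#getStatus result is left to the test.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the patch: compares the container IDs
// seen before the AM was killed with those reported after it restarted.
public class ContainerRecoveryCheck {

    // IDs reported after the restart that did not exist before it,
    // i.e. containers that were launched again by the new attempt.
    static Set<String> relaunched(Collection<String> before, Collection<String> after) {
        Set<String> extra = new TreeSet<>(after);
        extra.removeAll(before);
        return extra;
    }

    // IDs that existed before the restart but are no longer reported,
    // i.e. containers the new attempt failed to recover.
    static Set<String> lost(Collection<String> before, Collection<String> after) {
        Set<String> missing = new TreeSet<>(before);
        missing.removeAll(after);
        return missing;
    }

    // True only when the new attempt reports exactly the old containers.
    static boolean allRecovered(Collection<String> before, Collection<String> after) {
        return relaunched(before, after).isEmpty() && lost(before, after).isEmpty();
    }

    public static void main(String[] args) {
        // Made-up IDs for illustration only.
        List<String> before = Arrays.asList(
            "container_1509652000000_0001_01_000002",
            "container_1509652000000_0001_01_000003");
        List<String> after = Arrays.asList(
            "container_1509652000000_0001_01_000003",
            "container_1509652000000_0001_02_000002");

        System.out.println("relaunched: " + relaunched(before, after));
        System.out.println("lost: " + lost(before, after));
        System.out.println("all recovered: " + allRecovered(before, after));
    }
}
```

In the test itself, the equivalent assertion would be that both sets are empty, so a failure message can name the exact containers that were relaunched or dropped.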

> NPE in ServiceMaster after RM is restarted and then the ServiceMaster is 
> killed
> -------------------------------------------------------------------------------
>
>                 Key: YARN-7371
>                 URL: https://issues.apache.org/jira/browse/YARN-7371
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>            Priority: Major
>         Attachments: YARN-7371-yarn-native-services.001.patch, 
> YARN-7371-yarn-native-services.002.patch, 
> YARN-7371-yarn-native-services.003.patch, 
> YARN-7371-yarn-native-services.004.patch, 
> YARN-7371-yarn-native-services.005.patch
>
>
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.service.ServiceScheduler.recoverComponents(ServiceScheduler.java:313)
> at org.apache.hadoop.yarn.service.ServiceScheduler.serviceStart(ServiceScheduler.java:265)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:150)
> Steps:
> 1. Stopped RM and then started it
> 2. Application was still running
> 3. Killed the ServiceMaster to check if it recovers
> 4. Next attempt failed with the above exception



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
