[ https://issues.apache.org/jira/browse/YARN-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16236488#comment-16236488 ]
Jian He commented on YARN-7371:
-------------------------------

Patch looks good to me overall, some comments:

- This method can be removed as it’s only used by this class itself
{code}
  public Token createContainerToken(ContainerId containerId,
      int containerVersion, NodeId nodeId, String appSubmitter,
      Resource capability, Priority priority, long createTime,
      LogAggregationContext logAggregationContext,
      String nodeLabelExpression, ContainerType containerType) {
    return createContainerToken(containerId, containerVersion, nodeId,
        appSubmitter, capability, priority, createTime,
        null, null, ContainerType.TASK,
        ExecutionType.GUARANTEED, -1);
  }
{code}
- For testRecoverComponentsAfterRMRestart, can you also check that the containers retrieved by serviceClient#getStatus are old containers of the 1st attempt, i.e. no containers are getting relaunched because of AM restart.

> NPE in ServiceMaster after RM is restarted and then the ServiceMaster is killed
> -------------------------------------------------------------------------------
>
>                 Key: YARN-7371
>                 URL: https://issues.apache.org/jira/browse/YARN-7371
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>            Priority: Major
>         Attachments: YARN-7371-yarn-native-services.001.patch,
>                      YARN-7371-yarn-native-services.002.patch,
>                      YARN-7371-yarn-native-services.003.patch,
>                      YARN-7371-yarn-native-services.004.patch,
>                      YARN-7371-yarn-native-services.005.patch
>
>
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.service.ServiceScheduler.recoverComponents(ServiceScheduler.java:313)
>         at org.apache.hadoop.yarn.service.ServiceScheduler.serviceStart(ServiceScheduler.java:265)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>         at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>         at org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:150)
>
> Steps:
> 1. Stopped RM and then started it
> 2. Application was still running
> 3. Killed the ServiceMaster to check if it recovers
> 4. Next attempt failed with the above exception
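
The second review comment above (verifying in testRecoverComponentsAfterRMRestart that no containers were relaunched) might be checked roughly as sketched below. This is only an illustration in plain Java, not code from the patch: the class ContainerRecoveryCheck, its methods, and the container ID strings are all hypothetical, and it assumes the test has already collected the container IDs reported by serviceClient#getStatus before and after the AM restart into two sets.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of the YARN-7371 patch.
final class ContainerRecoveryCheck {

  // Returns the container IDs seen after the AM restart that were not
  // present before it, i.e. containers that were newly launched.
  static Set<String> relaunched(Set<String> before, Set<String> after) {
    Set<String> extra = new LinkedHashSet<>(after);
    extra.removeAll(before);
    return extra;
  }

  // True only when the second attempt reports exactly the containers
  // of the first attempt (nothing lost, nothing relaunched).
  static boolean sameContainers(Set<String> before, Set<String> after) {
    return before.equals(after);
  }

  public static void main(String[] args) {
    // Made-up container IDs standing in for what getStatus would report.
    Set<String> firstAttempt =
        new LinkedHashSet<>(Arrays.asList("container_A", "container_B"));
    Set<String> afterRestart =
        new LinkedHashSet<>(Arrays.asList("container_B", "container_A"));

    System.out.println("relaunched: " + relaunched(firstAttempt, afterRestart));
    System.out.println("same containers: "
        + sameContainers(firstAttempt, afterRestart));
  }
}
```

In the real test, the two sets would be filled from the service status fetched before killing the ServiceMaster and again after the second attempt comes up; an empty result from relaunched(...) would then indicate that the old containers were recovered rather than restarted.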