[ https://issues.apache.org/jira/browse/YARN-9197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745606#comment-16745606 ]
Wangda Tan commented on YARN-9197:
----------------------------------

Thanks [~kyungwan nam] for filing this and working on the patch. +[~billie.rinaldi], [~eyang], could you help review the patch?

I haven't dug into the details of the patch yet: when will the state of ComponentInstanceEvent be null and trigger the issue? Should we make the field name more specific, or add more comments, for easier maintenance?

> NPE in service AM when failed to launch container
> -------------------------------------------------
>
> Key: YARN-9197
> URL: https://issues.apache.org/jira/browse/YARN-9197
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn-native-services
> Reporter: kyungwan nam
> Assignee: kyungwan nam
> Priority: Major
> Attachments: YARN-9197.001.patch
>
>
> I hit an NPE in the service AM, as follows.
> {code}
> 2019-01-02 22:35:47,582 [Component dispatcher] INFO component.Component - [COMPONENT regionserver]: Assigned container_e15_1542704944343_0001_01_000001 to component instance regionserver-1 and launch on host test2.com:45454
> 2019-01-02 22:35:47,588 [pool-6-thread-5] WARN ipc.Client - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for yarn-ats: HDFS_DELEGATION_TOKEN owner=yarn-ats, renewer=yarn, realUser=rm/test1.nfra...@example.com, issueDate=1542704946397, maxDate=1543309746397, sequenceNumber=97, masterKeyId=90) can't be found in cache
> 2019-01-02 22:35:47,592 [pool-6-thread-5] ERROR containerlaunch.ContainerLaunchService - [COMPINSTANCE regionserver-1 : container_e15_1542704944343_0001_01_000001]: Failed to launch container.
> java.io.IOException: Package doesn't exist as a resource: /hdp/apps/3.0.0.0-1634/hbase/hbase.tar.gz
>     at org.apache.hadoop.yarn.service.provider.tarball.TarballProviderService.processArtifact(TarballProviderService.java:41)
>     at org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:144)
>     at org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:107)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> 2019-01-02 22:35:47,592 [Component dispatcher] INFO component.Component - [COMPONENT regionserver] Requesting for 1 container(s)
> 2019-01-02 22:35:47,592 [Component dispatcher] INFO component.Component - [COMPONENT regionserver] Submitting scheduling request: SchedulingRequestPBImpl{priority=1, allocationReqId=1, executionType={Execution Type: GUARANTEED, Enforce Execution Type: true}, allocationTags=[regionserver], resourceSizing=ResourceSizingPBImpl{numAllocations=1, resources=<memory:4096, vCores:1>}, placementConstraint=notin,node,regionserver}
> 2019-01-02 22:35:47,593 [Component dispatcher] INFO instance.ComponentInstance - [COMPINSTANCE regionserver-1 : container_e15_1542704944343_0001_01_000001]: container_e15_1542704944343_0001_01_000001 completed. Reinsert back to pending list and requested a new container.
> exitStatus=null, diagnostics=failed before launch
> 2019-01-02 22:35:47,593 [Component dispatcher] INFO instance.ComponentInstance - Publishing component instance status container_e15_1542704944343_0001_01_000001 FAILED
> 2019-01-02 22:35:47,593 [Component dispatcher] ERROR service.ServiceScheduler - [COMPINSTANCE regionserver-1 : container_e15_1542704944343_0001_01_000001]: Error in handling event type STOP
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.service.component.instance.ComponentInstance.handleComponentInstanceRelaunch(ComponentInstance.java:342)
>     at org.apache.hadoop.yarn.service.component.instance.ComponentInstance$ContainerStoppedTransition.transition(ComponentInstance.java:482)
>     at org.apache.hadoop.yarn.service.component.instance.ComponentInstance$ContainerStoppedTransition.transition(ComponentInstance.java:375)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>     at org.apache.hadoop.yarn.service.component.instance.ComponentInstance.handle(ComponentInstance.java:679)
>     at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentInstanceEventHandler.handle(ServiceScheduler.java:654)
>     at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentInstanceEventHandler.handle(ServiceScheduler.java:643)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
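For readers following the trace: the log reports exitStatus=null with "failed before launch" immediately before the NullPointerException in handleComponentInstanceRelaunch. The sketch below is only an illustration of that failure pattern, assuming the STOP event carries no container status when the launch fails before the container starts. It is not the Hadoop code and not the attached patch; RelaunchSketch, Status, StopEvent and both hasFailed* methods are hypothetical names.

```java
public class RelaunchSketch {

    /** Hypothetical stand-in for a container status; not the YARN class. */
    static final class Status {
        final int exitCode;

        Status(int exitCode) {
            this.exitCode = exitCode;
        }
    }

    /** Hypothetical stand-in for a STOP event. */
    static final class StopEvent {
        // Assumption: null when the container failed before it was launched.
        final Status status;

        StopEvent(Status status) {
            this.status = status;
        }
    }

    /** Dereferences the status directly; throws NPE when status is null. */
    static boolean hasFailedUnguarded(StopEvent event) {
        return event.status.exitCode != 0;
    }

    /** Treats a missing status as a failure before launch instead of dereferencing it. */
    static boolean hasFailedGuarded(StopEvent event) {
        if (event.status == null) {
            return true;
        }
        return event.status.exitCode != 0;
    }

    public static void main(String[] args) {
        StopEvent failedBeforeLaunch = new StopEvent(null);
        try {
            hasFailedUnguarded(failedBeforeLaunch);
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }
        System.out.println("guarded: failed=" + hasFailedGuarded(failedBeforeLaunch));
    }
}
```

Whether a null check like this, or an explicit "failed before launch" flag on the event, is the better fix is a design question for the patch review; the sketch only shows why an event without a status needs one or the other.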