[ https://issues.apache.org/jira/browse/YARN-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tao Yang updated YARN-6403: --------------------------- Attachment: YARN-6403.002.patch [~jlowe] Thanks for correcting me. The last server-side change is not proper and I corrected it as your mentioned. For the client-side change, IIUIC the generated protobuf code won't throws NPE for this case actually. Unit tests for both the client and server change is added. Attach a new patch for review, please correct me if I missed something. > Invalid local resource request can raise NPE and make NM exit > ------------------------------------------------------------- > > Key: YARN-6403 > URL: https://issues.apache.org/jira/browse/YARN-6403 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager > Affects Versions: 2.8.0 > Reporter: Tao Yang > Attachments: YARN-6403.001.patch, YARN-6403.002.patch > > > Recently we found this problem on our testing environment. The app that > caused this problem added a invalid local resource request(have no location) > into ContainerLaunchContext like this: > {code} > localResources.put("test", LocalResource.newInstance(location, > LocalResourceType.FILE, LocalResourceVisibility.PRIVATE, 100, > System.currentTimeMillis())); > ContainerLaunchContext amContainer = > ContainerLaunchContext.newInstance(localResources, environment, > vargsFinal, null, securityTokens, acls); > {code} > The actual value of location was null although app doesn't expect that. This > mistake cause several NMs exited with the NPE below and can't restart until > the nm recovery dirs were deleted. > {code} > FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourceRequest.<init>(LocalResourceRequest.java:46) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:711) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:660) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1320) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:88) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1293) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1286) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:745) > {code} > NPE occured when created LocalResourceRequest instance for invalid resource > request. > {code} > public LocalResourceRequest(LocalResource resource) > throws URISyntaxException { > this(resource.getResource().toPath(), //NPE occurred here > resource.getTimestamp(), > resource.getType(), > resource.getVisibility(), > resource.getPattern()); > } > {code} > We can't guarantee the validity of local resource request now, but we could > avoid damaging the cluster. Perhaps we can verify the resource both in > ContainerLaunchContext and LocalResourceRequest? Please feel free to give > your suggestions. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org