[ https://issues.apache.org/jira/browse/YARN-7786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16335640#comment-16335640 ]
lujie edited comment on YARN-7786 at 1/23/18 2:20 PM:
------------------------------------------------------

The way to reproduce this is to insert a sleep in AMLauncher.launch.

I have restudied this bug; sorry, my previous analysis was wrong. I used javassist to trace the bug dynamically, and this is the cause:

# Before the AM is launched, the kill command arrives, and RMAppImpl.FinalTransition calls setAMContainerSpec:
{code:java}
public void setAMContainerSpec(ContainerLaunchContext amContainer) {
  maybeInitBuilder();
  if (amContainer == null) {
    builder.clearAmContainerSpec();
  }
  this.amContainer = amContainer;
}
{code}
The parameter amContainer is null, so the method does two things: it calls clearAmContainerSpec and it sets the field this.amContainer to null.
# When the AM launch begins to run, it calls getAMContainerSpec:
{code:java}
ApplicationSubmissionContextProtoOrBuilder p = viaProto ? proto : builder;
if (this.amContainer != null) {
  return amContainer;
} // Else via proto
if (!p.hasAmContainerSpec()) {
  return null;
}
{code}
Because the field this.amContainer is null, the code goes on to check p.hasAmContainerSpec(). Because of the clearAmContainerSpec call in the first step, !p.hasAmContainerSpec() is true, so getAMContainerSpec returns null.

Although I now understand the cause, I still do not know how to write a unit test for it. Or is it OK to return null in this situation?

was (Author: xiaoheipangzi):
I have restudied this bug; sorry, my previous analysis was wrong.
I used javassist to trace the bug dynamically, and this is the cause:

# Before the AM is launched, the kill command arrives, and RMAppImpl.FinalTransition calls setAMContainerSpec:
{code:java}
public void setAMContainerSpec(ContainerLaunchContext amContainer) {
  maybeInitBuilder();
  if (amContainer == null) {
    builder.clearAmContainerSpec();
  }
  this.amContainer = amContainer;
}
{code}
The parameter amContainer is null, so the method does two things: it calls clearAmContainerSpec and it sets the field this.amContainer to null.
# When the AM launch begins to run, it calls getAMContainerSpec:
{code:java}
ApplicationSubmissionContextProtoOrBuilder p = viaProto ? proto : builder;
if (this.amContainer != null) {
  return amContainer;
} // Else via proto
if (!p.hasAmContainerSpec()) {
  return null;
}
{code}
Because the field this.amContainer is null, the code goes on to check p.hasAmContainerSpec(). Because of the clearAmContainerSpec call in the first step, !p.hasAmContainerSpec() is true, so getAMContainerSpec returns null.

Although I now understand the cause, I still do not know how to write a unit test for it. Or is it OK to return null in this situation?

> NullPointerException while launching ApplicationMaster
> ------------------------------------------------------
>
> Key: YARN-7786
> URL: https://issues.apache.org/jira/browse/YARN-7786
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.0.0-beta1
> Reporter: lujie
> Assignee: lujie
> Priority: Minor
> Attachments: YARN-7786.patch, YARN-7786_1.patch, resourcemanager.log
>
>
> Before launching the ApplicationMaster, send a kill command to the job; a NullPointerException then appears:
> {code}
> 2017-11-25 21:27:25,333 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1511616410268_0001_000001.
> Got exception: java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.setupTokens(AMLauncher.java:205)
>         at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.createAMContainerLaunchContext(AMLauncher.java:193)
>         at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:112)
>         at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:304)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
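
On the unit-test question in the comment: the null-return path can be exercised without any threads, because it depends only on the order of the two calls. The following is a minimal sketch under stated assumptions, not Hadoop code. FakeBuilder, FakeSubmissionContext and describeLaunch are hypothetical stand-ins that mirror only the logic quoted in the comment, and a String stands in for ContainerLaunchContext. The null check in describeLaunch is one possible way a caller could tolerate the null; it is not taken from the attached patches.

```java
/**
 * Simplified stand-in for the setter/getter pair quoted in the comment.
 * Every name in this file is hypothetical; none of it is the Hadoop API.
 */
final class AmSpecNullSketch {

    /** Mimics the proto builder: it only tracks whether a spec is present. */
    static final class FakeBuilder {
        private String spec; // stands in for the serialized AM container spec

        boolean hasAmContainerSpec() { return spec != null; }
        void setAmContainerSpec(String s) { spec = s; }
        void clearAmContainerSpec() { spec = null; }
        String getAmContainerSpec() { return spec; }
    }

    /** Mimics the submission context: a cached field in front of the builder. */
    static final class FakeSubmissionContext {
        private final FakeBuilder builder = new FakeBuilder();
        private String amContainer; // cached value, null until set explicitly

        /** Starts with the spec held by the builder only, as if read from a proto. */
        FakeSubmissionContext(String initialSpec) {
            builder.setAmContainerSpec(initialSpec);
        }

        /** Same shape as the quoted setter: null clears the builder and the field. */
        void setAMContainerSpec(String amContainer) {
            if (amContainer == null) {
                builder.clearAmContainerSpec();
            }
            this.amContainer = amContainer;
        }

        /** Same shape as the quoted getter: cached field first, then the builder. */
        String getAMContainerSpec() {
            if (this.amContainer != null) {
                return amContainer;
            }
            if (!builder.hasAmContainerSpec()) {
                return null;
            }
            return builder.getAmContainerSpec();
        }
    }

    /** Hypothetical caller-side guard: report a cleared spec instead of dereferencing it. */
    static String describeLaunch(FakeSubmissionContext ctx) {
        String spec = ctx.getAMContainerSpec();
        if (spec == null) {
            return "skipped: AM container spec was cleared before launch";
        }
        return "launching with " + spec;
    }

    public static void main(String[] args) {
        FakeSubmissionContext ctx = new FakeSubmissionContext("am-spec");
        System.out.println(describeLaunch(ctx));
        ctx.setAMContainerSpec(null); // what the kill path does, per the comment
        System.out.println(describeLaunch(ctx));
    }
}
```

A real test would presumably apply the same call order to the actual submission context object (set the spec to null, then call getAMContainerSpec) and assert on how the launcher handles the result. Whether that means tolerating null in the launcher or not clearing the spec in the first place is the open question from the comment.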