[ https://issues.apache.org/jira/browse/YARN-4747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172671#comment-15172671 ]
Jason Lowe commented on YARN-4747:
----------------------------------

I believe this was triggered by a missing container start event for a given container finish event. When an application runs for a long time there will be a correspondingly long window between the container start event and the container finish event for the AM container. The timelineserver performs retention based on entity timestamp, so there will be a long window where the container start event has been deleted but the container finish event is still present.

The application history code is not prepared to handle that, as only the container start event has the node hostname and port number information. It blindly assumes that if a container entity is present in the database then we know both the host and the port.

Minimally the application history server needs to be hardened to avoid the NPE, but we may want to add the host and port information to the finish event as well, to allow the history page to continue to provide logs as long as there is either a container start or container finish event in the database.

> AHS error 500 due to NPE when container start event is missing
> --------------------------------------------------------------
>
>                 Key: YARN-4747
>                 URL: https://issues.apache.org/jira/browse/YARN-4747
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: timelineserver
>    Affects Versions: 2.7.2
>            Reporter: Jason Lowe
>
> Saw an error 500 due to a NullPointerException caused by a missing host for
> an AM container. Stacktrace to follow.
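
For illustration only, the minimal hardening described in the comment could look roughly like the sketch below. This is not the actual application history server code: the class ContainerLogLink, the ContainerInfo holder, the logUrl method and the URL layout are all hypothetical names made up for this example. The only point it shows is that the host and port are treated as optional, so a container whose start event has already been purged yields "no log link" rather than an NPE.

```java
import java.util.Optional;

// Hypothetical sketch; none of these names come from the Hadoop code base.
public class ContainerLogLink {

    /** Stand-in for the container data rebuilt from timeline events. */
    public static final class ContainerInfo {
        final String containerId;
        final String nodeHost;   // null when the start event has been purged
        final Integer nodePort;  // null when the start event has been purged

        public ContainerInfo(String containerId, String nodeHost, Integer nodePort) {
            this.containerId = containerId;
            this.nodeHost = nodeHost;
            this.nodePort = nodePort;
        }
    }

    /**
     * Builds a log link only when both host and port are known.
     * The URL layout here is an assumption made for the example.
     */
    public static Optional<String> logUrl(ContainerInfo c, String user) {
        if (c == null || c.nodeHost == null || c.nodePort == null) {
            // Finish event present, start event missing: no link, no NPE.
            return Optional.empty();
        }
        return Optional.of("/logs/" + c.nodeHost + ":" + c.nodePort
                + "/" + c.containerId + "/" + user);
    }

    public static void main(String[] args) {
        ContainerInfo complete = new ContainerInfo("container_1", "host1.example.com", 45454);
        ContainerInfo finishOnly = new ContainerInfo("container_1", null, null);
        System.out.println(logUrl(complete, "alice").orElse("N/A"));
        System.out.println(logUrl(finishOnly, "alice").orElse("N/A"));
    }
}
```

A page renderer could then show a placeholder such as "N/A" for the empty case. If the host and port were also recorded in the finish event, as suggested above, the same check would simply succeed more often.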