[ 
https://issues.apache.org/jira/browse/YARN-4747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172671#comment-15172671
 ] 

Jason Lowe commented on YARN-4747:
----------------------------------

I believe this was triggered by a missing container start event for a given 
container finish event.  When an application runs for a long time, there is a 
correspondingly long window between the container start event and the container 
finish event for the AM container.  The timelineserver performs retention based 
on entity timestamp, so there will be a long window where the container start 
event has been deleted but the container finish event is still present.  The 
application history code is not prepared to handle that, as only the container 
start event has the node hostname and port number information.  It blindly 
assumes that if a container entity is present in the database then we know both 
the host and the port.

Minimally the application history server needs to be hardened to avoid the NPE, 
but we may want to add the host and port information to the finish event as 
well to allow the history page to continue to provide logs as long as there is 
either a container start or container finish event in the database.
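
As a rough illustration of the minimal hardening, here is a sketch of a 
null-safe lookup.  The class, field, and method names below are hypothetical 
stand-ins, not the actual application history server code; the sketch only 
assumes that host and port come back null once the start event has been purged.

```java
import java.util.Optional;

// Hypothetical sketch: none of these names come from the YARN code base.
final class ContainerLogLinkSketch {

    // Stand-in for what the history server reads back for one container.
    static final class ContainerInfo {
        final String containerId;
        final String nodeHost;   // assumed null when the start event is gone
        final Integer nodePort;  // assumed null when the start event is gone

        ContainerInfo(String containerId, String nodeHost, Integer nodePort) {
            this.containerId = containerId;
            this.nodeHost = nodeHost;
            this.nodePort = nodePort;
        }
    }

    // Returns "host:port" only when both pieces are known, instead of
    // assuming that a present container entity implies a known node.
    static Optional<String> nodeAddress(ContainerInfo c) {
        if (c == null || c.nodeHost == null || c.nodePort == null) {
            return Optional.empty();
        }
        return Optional.of(c.nodeHost + ":" + c.nodePort);
    }

    // Text the page could render: a log location when known, "N/A" otherwise.
    static String logLinkText(ContainerInfo c) {
        return nodeAddress(c).map(addr -> "logs on " + addr).orElse("N/A");
    }

    public static void main(String[] args) {
        ContainerInfo complete = new ContainerInfo("container_01", "host1", 8041);
        ContainerInfo finishOnly = new ContainerInfo("container_02", null, null);
        System.out.println(logLinkText(complete));
        System.out.println(logLinkText(finishOnly));
    }
}
```

With a guard like this the page would degrade to "N/A" for the affected 
container rather than failing the whole request.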

> AHS error 500 due to NPE when container start event is missing
> --------------------------------------------------------------
>
>                 Key: YARN-4747
>                 URL: https://issues.apache.org/jira/browse/YARN-4747
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: timelineserver
>    Affects Versions: 2.7.2
>            Reporter: Jason Lowe
>
> Saw an error 500 due to a NullPointerException caused by a missing host for 
> an AM container.  Stacktrace to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
