[ 
https://issues.apache.org/jira/browse/YARN-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203129#comment-16203129
 ] 

Rohith Sharma K S commented on YARN-7272:
-----------------------------------------

Update: I had an offline discussion with Vinod. His concern is that the scope 
of this JIRA is limited to auxiliary services that run on the NodeManager. 
Since app collectors could be launched as separate containers, which is a 
long-term goal but not supported yet, the fault tolerance design should 
consider those use cases as well. Otherwise we will end up redesigning the 
fault tolerance solution later.
Considering recovery of container-based app collectors, which also holds good 
for auxiliary service recovery, storing the WAL in HDFS seems more appropriate.

> Enable timeline collector fault tolerance
> -----------------------------------------
>
>                 Key: YARN-7272
>                 URL: https://issues.apache.org/jira/browse/YARN-7272
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineclient, timelinereader, timelineserver
>            Reporter: Vrushali C
>            Assignee: Rohith Sharma K S
>
> If a NM goes down and along with it the timeline collector aux service for a 
> running yarn app, we would like that yarn app to re-establish connection with 
> a new timeline collector. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
