[ 
https://issues.apache.org/jira/browse/YARN-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16737084#comment-16737084
 ] 

Zac Zhou commented on YARN-8489:
--------------------------------

[~leftnoteasy], [~suma.shivaprasad]

Thanks a lot for your comments.
{quote}3) Changes to TimelineServiceV2Publisher: is this a specific issue 
related to this change? If it is a corner case we need to take care of, I 
suggest filing a separate JIRA and adding a unit test.
{quote}
Yes, I think it is somewhat related to this patch. It would cause an NPE in 
the ResourceManager. The root cause seems to be that the application has 
already finished, but its container metrics still need to be updated through 
its TimelineServiceV2Publisher.

I'll create a separate JIRA to track it.
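
To illustrate the failure mode, here is a minimal sketch of the kind of guard that would avoid it. It assumes the NPE comes from looking up a per-application publisher that was already removed when the application finished. MetricsPublisher and PublisherRegistry are hypothetical names for illustration only, not the actual ResourceManager or TimelineServiceV2Publisher code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a per-application timeline publisher.
interface MetricsPublisher {
    void publishContainerMetrics(String containerId, long memoryMb);
}

// Hypothetical registry mapping application IDs to their publishers.
final class PublisherRegistry {
    private final Map<String, MetricsPublisher> publishers = new ConcurrentHashMap<>();

    void register(String appId, MetricsPublisher publisher) {
        publishers.put(appId, publisher);
    }

    // Called when the application finishes; its publisher is dropped.
    void unregister(String appId) {
        publishers.remove(appId);
    }

    // Returns true if the update was published, false if the application
    // was already finished and no publisher is registered for it.
    boolean updateContainerMetrics(String appId, String containerId, long memoryMb) {
        MetricsPublisher publisher = publishers.get(appId);
        if (publisher == null) {
            // A late container update arrived after the application finished.
            // Skip it instead of dereferencing null.
            return false;
        }
        publisher.publishContainerMetrics(containerId, memoryMb);
        return true;
    }
}
```

With this shape, a container update that arrives after unregister() is dropped rather than throwing. Whether dropping the update is acceptable, or whether it should be logged or buffered, is a separate question for the follow-up JIRA.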


> Need to support "dominant" component concept inside YARN service
> ----------------------------------------------------------------
>
>                 Key: YARN-8489
>                 URL: https://issues.apache.org/jira/browse/YARN-8489
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: yarn-native-services
>            Reporter: Wangda Tan
>            Assignee: Zac Zhou
>            Priority: Major
>         Attachments: YARN-8489.001.patch, YARN-8489.002.patch, 
> YARN-8489.003.patch
>
>
> The existing YARN service supports termination policies for different 
> restart policies. For example, ALWAYS means the service will not be 
> terminated, and NEVER means the service will be terminated once all 
> components have terminated.
> The name "dominant" might not be the most appropriate; we can figure out a 
> better name. Put simply, a dominant component is one whose final state 
> determines the job's final state, regardless of the other components.
> Use cases: 
> 1) A Tensorflow job has master/worker/services/tensorboard. Once the master 
> reaches a final state, whether it succeeded or failed, we should terminate 
> ps/tensorboard/workers and then mark the job as succeeded/failed. 
> 2) Not sure if this is a real-world use case: a service has multiple 
> components, some of which are not restartable. For such services, if a 
> component fails, we should mark the whole service as failed. 
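
The "dominant" rule in the description above can be sketched as follows. This is only an illustration of the idea under stated assumptions: ComponentState and DominantComponentPolicy are hypothetical names, not types from the YARN service code or from the attached patches, and a component missing from the map is assumed to be still running.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical component states; RUNNING is the only non-final state here.
enum ComponentState { RUNNING, SUCCEEDED, FAILED }

// Hypothetical policy: the dominant component's final state decides the
// service's final state, regardless of the other components.
final class DominantComponentPolicy {
    private final String dominantComponent;

    DominantComponentPolicy(String dominantComponent) {
        this.dominantComponent = Objects.requireNonNull(dominantComponent);
    }

    // Returns the dominant component's state, or RUNNING if it is not
    // present in the map (an assumption made for this sketch).
    ComponentState serviceState(Map<String, ComponentState> componentStates) {
        ComponentState state = componentStates.get(dominantComponent);
        return state == null ? ComponentState.RUNNING : state;
    }

    // Once the dominant component is final, the remaining components
    // (for example ps, workers, tensorboard) should be terminated.
    boolean shouldTerminateOthers(Map<String, ComponentState> componentStates) {
        return serviceState(componentStates) != ComponentState.RUNNING;
    }
}
```

For the Tensorflow use case, a policy created with "master" as the dominant component reports FAILED for the whole job as soon as the master fails, even while the workers and tensorboard are still running.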



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
