[ 
https://issues.apache.org/jira/browse/YARN-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450113#comment-16450113
 ] 

Billie Rinaldi commented on YARN-8122:
--------------------------------------

I think a better health check would count ready containers, rather than 
running containers, towards the health percentage. A running container could 
be failing in a restart loop and would still be considered healthy. Also, 
because the readiness check is configurable, this would give the user control 
over what constitutes a healthy container. If they wanted the current behavior 
of the patch, they could disable the default readiness check for the 
component.* Alternatively, they could configure the component so that a 
container isn't considered healthy until, for example, the process is up and 
listening on a port.

(* It is hard for me to imagine a use case for the current behavior of the 
patch. The feature worked in Slider because NMs would eventually get 
blacklisted, but since container restart is enabled in the service AM, the 
feature doesn't work the same way here. It seems like the only time a 
component would fall below the health threshold would be when the cluster 
doesn't have enough capacity to run the desired number of containers.)
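
To illustrate the difference between the two counting rules, here is a rough 
sketch. It is not taken from the patch: the class, the State enum, and the 
healthPercent method are all hypothetical names made up for this example, and 
the real service AM code may be structured quite differently.

```java
import java.util.Arrays;
import java.util.List;

class ReadyHealthSketch {
    // Hypothetical container states for illustration only; this is not the
    // YARN Service state enum.
    enum State { REQUESTED, RUNNING, READY }

    // Health percentage of a component relative to its desired container
    // count. With countReadyOnly, only READY containers are healthy;
    // otherwise RUNNING containers are counted as healthy too.
    static double healthPercent(List<State> containers, int desired,
                                boolean countReadyOnly) {
        if (desired <= 0) {
            return 100.0; // nothing requested, so nothing can be unhealthy
        }
        long healthy = containers.stream()
            .filter(s -> s == State.READY
                || (!countReadyOnly && s == State.RUNNING))
            .count();
        return 100.0 * healthy / desired;
    }

    public static void main(String[] args) {
        // Two containers pass the readiness check; two are running but not
        // ready (for example, stuck in a restart loop).
        List<State> containers = Arrays.asList(
            State.READY, State.READY, State.RUNNING, State.RUNNING);
        System.out.println(healthPercent(containers, 4, false)); // 100.0
        System.out.println(healthPercent(containers, 4, true));  // 50.0
    }
}
```

With running containers counted, the component looks fully healthy even 
though half of its containers never became ready. Counting only ready 
containers lets the same component fall below a threshold.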

> Component health threshold monitor
> ----------------------------------
>
>                 Key: YARN-8122
>                 URL: https://issues.apache.org/jira/browse/YARN-8122
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Gour Saha
>            Assignee: Gour Saha
>            Priority: Major
>         Attachments: YARN-8122.001.patch, YARN-8122.002.patch, 
> YARN-8122.003.patch, YARN-8122.004.patch, YARN-8122.005.patch, 
> YARN-8122.draft.patch
>
>
> Slider supported component health threshold monitoring with SLIDER-1246. It 
> would be good to have this feature for YARN Service too.



