[ 
https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971709#comment-15971709
 ] 

Chris Douglas commented on YARN-6451:
-------------------------------------

bq.  when invariants are violated the log line is harder to read if combined, 
but perf is much better. In the current example of invariants.txt I will leave 
this with one invariant per line, so slower but easier to understand---works?

The checker could evaluate the combined expression first and, only if it 
detects a violation, iterate over the individual expressions to print specific 
error messages. That said, shaving fractions of a millisecond off the 
validation check is probably not significant.
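
A minimal sketch of that two-phase idea, for illustration only. It is not taken 
from the patch: invariants are modeled here as named Java predicates over a map 
of values, and the class name, method name, and metric names are all 
hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch; not the YARN-6451 patch. Each invariant is a named
// predicate over a snapshot of metric values.
final class TwoPhaseInvariantCheck {

  /** Returns the names of the violated invariants; empty when all hold. */
  static List<String> violations(
      Map<String, Predicate<Map<String, Double>>> invariants,
      Map<String, Double> values) {
    // Fast path: AND every invariant into one predicate and evaluate it once.
    // (A real checker would build the combined form once, at load time.)
    Predicate<Map<String, Double>> combined = v -> true;
    for (Predicate<Map<String, Double>> p : invariants.values()) {
      combined = combined.and(p);
    }
    List<String> violated = new ArrayList<>();
    if (combined.test(values)) {
      return violated;
    }
    // Slow path: reached only after a failure; test each invariant on its
    // own so the log can name exactly which ones were violated.
    for (Map.Entry<String, Predicate<Map<String, Double>>> e
        : invariants.entrySet()) {
      if (!e.getValue().test(values)) {
        violated.add(e.getKey());
      }
    }
    return violated;
  }
}
```

With invariants "pending >= 0" and "allocated <= capacity" and a snapshot where 
allocated exceeds capacity, the fast path fails and the slow path reports only 
"allocated <= capacity". The common, violation-free case pays for a single 
evaluation.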

+1 overall. For future versions:
* The invariant checker might want to use bindings across contexts; this would 
be hard to express as subtypes of {{InvariantsChecker}}. For example, if one 
wanted to check some invariant using values from the scheduler and the metrics, 
there isn't a good way to compose the two with inheritance. That said, in the 
current RM it's hard to correlate values collected from multiple components 
without reasoning about their mutual consistency in a brittle, ad hoc way. How 
invariants are loaded and how errors are handled could also be abstracted, but 
(IMHO) that'd be premature. This is approachable as-is.
* The unit test is fairly light.
* The checker could print a warning at startup, since it's mostly for testing; 
if it's accidentally deployed in a production setting, that should show up in 
the log. Also, does the RM refuse to start if {{invariants.txt}} is missing?

> Create a monitor to check whether we maintain RM (scheduling) invariants
> ------------------------------------------------------------------------
>
>                 Key: YARN-6451
>                 URL: https://issues.apache.org/jira/browse/YARN-6451
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Carlo Curino
>            Assignee: Carlo Curino
>         Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch, 
> YARN-6451.v2.patch, YARN-6451.v3.patch
>
>
> For SLS runs, as well as for live test clusters (and maybe prod), it would be 
> useful to have a mechanism to continuously check whether core invariants of 
> the RM/Scheduler are respected (e.g., no priority inversions, fairness mostly 
> respected, certain latencies within expected range, etc.)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
