[ https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971709#comment-15971709 ]
Chris Douglas commented on YARN-6451:
-------------------------------------

bq. When invariants are violated, the log line is harder to read if combined, but perf is much better. In the current example of invariants.txt I will leave this with one invariant per line, so slower but easier to understand. Works?

This could evaluate the combined expression and, only if it detects a violation, iterate over the set of expressions to print specific error messages. Though shaving fractions of a millisecond off the validation check is probably not significant.

+1 overall. For future versions:
* The invariant checker might want to use bindings across contexts; this would be hard to express as subtypes of {{InvariantsChecker}}. For example, if one wanted to check an invariant using values from both the scheduler and the metrics, there isn't a good way to compose the two with inheritance. That said, in the current RM it's hard to correlate values collected from multiple components without reasoning about their mutual consistency in a brittle, ad hoc way. How invariants are loaded and how errors are handled could also be abstracted, but (IMHO) that'd be premature. This is approachable as-is.
* The unit test is kind of light.
* This could print a warning when it starts up, since it's mostly for testing. If it's accidentally deployed in a production setting, it should show up in the log. Does the RM refuse to start if {{invariants.txt}} is missing?
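The suggestion to evaluate the combined expression first, and only fall back to per-invariant checks when something is violated, can be sketched in Java as follows. This is a hypothetical illustration, not code from the attached patches: the class `CombinedInvariantCheck`, its `add`/`check` methods, and the use of `java.util.function.Predicate` in place of the expressions read from `invariants.txt` are all assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch, not the YARN-6451 InvariantsChecker: each invariant is a
// named Predicate standing in for one line of invariants.txt.
class CombinedInvariantCheck<T> {
  // Insertion-ordered so violations are reported in the order invariants were added.
  private final Map<String, Predicate<T>> invariants = new LinkedHashMap<>();
  // Conjunction of every registered invariant; starts as "always true".
  private Predicate<T> combined = state -> true;

  void add(String description, Predicate<T> invariant) {
    invariants.put(description, invariant);
    combined = combined.and(invariant);
  }

  /** Returns descriptions of violated invariants, or an empty list if all hold. */
  List<String> check(T state) {
    List<String> violated = new ArrayList<>();
    // Fast path: a single evaluation of the combined expression.
    if (combined.test(state)) {
      return violated;
    }
    // Slow path, only on a violation: re-test each invariant to name the culprits.
    for (Map.Entry<String, Predicate<T>> entry : invariants.entrySet()) {
      if (!entry.getValue().test(state)) {
        violated.add(entry.getKey());
      }
    }
    return violated;
  }
}
```

For example, with two hypothetical invariants over a `Map<String, Long>` of metric values, `allocatedMB >= 0` and `allocatedMB <= totalMB`, `check` returns an empty list when both hold and only `allocatedMB <= totalMB` when allocation exceeds capacity. This keeps one invariant per line for readable error messages while doing a single evaluation in the common case; as noted above, the time saved is probably not significant.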
> Create a monitor to check whether we maintain RM (scheduling) invariants
> ------------------------------------------------------------------------
>
>                 Key: YARN-6451
>                 URL: https://issues.apache.org/jira/browse/YARN-6451
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Carlo Curino
>            Assignee: Carlo Curino
>         Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch, YARN-6451.v2.patch, YARN-6451.v3.patch
>
>
> For SLS runs, as well as for live test clusters (and maybe prod), it would be
> useful to have a mechanism to continuously check whether core invariants of
> the RM/Scheduler are respected (e.g., no priority inversions, fairness mostly
> respected, certain latencies within expected range, etc.)

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)