[ https://issues.apache.org/jira/browse/YARN-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Kanter updated YARN-7341:
--------------------------------
    Attachment: YARN-7341.001.patch

It turns out that this is a real bug introduced by YARN-7095. {{RouterWebServiceUtil#mergeMetrics}} takes in two sets of metrics and merges them into the first one. However, for a number of the metrics, it actually just doubles the first metric. For example:
{code:java}
metrics.setTotalNodes(metrics.getTotalNodes() + metrics.getTotalNodes());
{code}
should be
{code:java}
metrics.setTotalNodes(metrics.getTotalNodes() + metricsResponse.getTotalNodes());
{code}

This should have failed every time, but the test also had a flaw that made it merely flaky. The test initializes two sets of metrics to random values using two different {{Random}} objects, each seeded with {{System.currentTimeMillis()}}. However, the code is fast enough that both objects are often created within the same millisecond, so they get the same seed. When this happens, the two sets of metrics have identical values, so doubling the first metric gives the same result as adding the second one, which masks the bug described above. If the code runs slower (e.g., a GC pause, swapping, or an added log statement for the seed), the seeds differ and the test (correctly) fails.

The 001 patch fixes the bug by using the correct metric in {{RouterWebServiceUtil#mergeMetrics}}. It also fixes the test by ensuring that the two seeds are always different, cleans up some formatting, and logs the seed for better debuggability.

> TestRouterWebServiceUtil#testMergeMetrics is flakey
> ---------------------------------------------------
>
>                 Key: YARN-7341
>                 URL: https://issues.apache.org/jira/browse/YARN-7341
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: federation
>    Affects Versions: 3.0.0-beta1
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: YARN-7341.001.patch
>
>
> {{TestRouterWebServiceUtil#testMergeMetrics}} is flaky.
> It sometimes fails with something like:
> {noformat}
> Running org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServiceUtil
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.252 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServiceUtil
> testMergeMetrics(org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServiceUtil)  Time elapsed: 0.005 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<1092> but was:<584>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.junit.Assert.assertEquals(Assert.java:542)
> 	at org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServiceUtil.testMergeMetrics(TestRouterWebServiceUtil.java:473)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
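The two problems described in the comment above (the merge that doubles the first metric, and the two {{Random}} objects that end up with the same seed) can be reproduced in a small, self-contained sketch. This is an illustration under stated assumptions, not the YARN code: {{MergeSketch}}, {{SimpleMetrics}}, {{mergeBuggy}}, {{mergeFixed}} and {{mergeIsCorrect}} are hypothetical names, and {{SimpleMetrics}} models only two fields, whereas the real {{RouterWebServiceUtil#mergeMetrics}} works on the full cluster metrics object. Deriving the second seed as {{seed1 + 1}} is likewise only one assumed way to guarantee distinct seeds; the comment does not say how the 001 patch does it.

{code:java}
import java.util.Random;

// Hypothetical sketch; SimpleMetrics is an illustrative two-field stand-in,
// not the Hadoop metrics class.
class MergeSketch {

    static final class SimpleMetrics {
        long totalNodes;
        long appsSubmitted;

        SimpleMetrics(long totalNodes, long appsSubmitted) {
            this.totalNodes = totalNodes;
            this.appsSubmitted = appsSubmitted;
        }

        // Fill both fields from a Random seeded with the given value, similar
        // to how the test initializes its two metrics objects.
        static SimpleMetrics random(long seed) {
            Random rand = new Random(seed);
            return new SimpleMetrics(rand.nextInt(1000), rand.nextInt(1000));
        }
    }

    // Buggy pattern: the first object's value is added to itself.
    static void mergeBuggy(SimpleMetrics metrics, SimpleMetrics metricsResponse) {
        metrics.totalNodes = metrics.totalNodes + metrics.totalNodes;
        metrics.appsSubmitted = metrics.appsSubmitted + metrics.appsSubmitted;
    }

    // Corrected pattern: the second object's value is added into the first.
    static void mergeFixed(SimpleMetrics metrics, SimpleMetrics metricsResponse) {
        metrics.totalNodes = metrics.totalNodes + metricsResponse.totalNodes;
        metrics.appsSubmitted = metrics.appsSubmitted + metricsResponse.appsSubmitted;
    }

    // Returns true if the chosen merge produces the field-by-field sums.
    static boolean mergeIsCorrect(SimpleMetrics metrics, SimpleMetrics response,
                                  boolean useFixed) {
        long expectedNodes = metrics.totalNodes + response.totalNodes;
        long expectedApps = metrics.appsSubmitted + response.appsSubmitted;
        if (useFixed) {
            mergeFixed(metrics, response);
        } else {
            mergeBuggy(metrics, response);
        }
        return metrics.totalNodes == expectedNodes
            && metrics.appsSubmitted == expectedApps;
    }

    public static void main(String[] args) {
        long seed1 = System.currentTimeMillis();
        // Derive the second seed instead of reading the clock again, so the
        // two seeds can never collide.
        long seed2 = seed1 + 1;
        // Log the seeds so that a failing run can be reproduced.
        System.out.println("seeds: " + seed1 + ", " + seed2);

        // Same seed: both objects hold identical values, so doubling the first
        // equals the true sum and the bug stays hidden.
        System.out.println("buggy, same seed: " + mergeIsCorrect(
            SimpleMetrics.random(seed1), SimpleMetrics.random(seed1), false)); // true

        // Different values: the buggy merge is caught...
        System.out.println("buggy, different values: " + mergeIsCorrect(
            new SimpleMetrics(3, 10), new SimpleMetrics(4, 20), false)); // false

        // ...and the corrected merge passes with distinct seeds.
        System.out.println("fixed, distinct seeds: " + mergeIsCorrect(
            SimpleMetrics.random(seed1), SimpleMetrics.random(seed2), true)); // true
    }
}
{code}

Because {{java.util.Random}} is fully deterministic for a given seed, two generators with equal seeds always produce equal values, which is exactly the condition under which the self-doubling merge looks correct. The buggy case uses fixed, hand-picked values rather than two seeds because distinct seeds make a collision very unlikely but do not strictly rule it out.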