[ 
https://issues.apache.org/jira/browse/YARN-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-7341:
--------------------------------
    Attachment: YARN-7341.001.patch

It turns out that this is a real bug introduced by YARN-7095.  
{{RouterWebServiceUtil#mergeMetrics}} takes in two sets of metrics and merges 
them into the first one.  However, for a number of the metrics, it adds the 
first metric to itself, which simply doubles it.  For example,
{code:java}
metrics.setTotalNodes(metrics.getTotalNodes() + metrics.getTotalNodes());
{code}
should be
{code:java}
metrics.setTotalNodes(
    metrics.getTotalNodes() + metricsResponse.getTotalNodes());
{code}
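
To make the effect concrete, here is a minimal, self-contained sketch.  The 
{{Metrics}} class and the {{buggyMerge}}/{{fixedMerge}} methods are 
hypothetical stand-ins for illustration; they are not the actual YARN classes 
or the contents of the patch.

```java
public class MergeSketch {
  /** Hypothetical stand-in for a metrics object; not the real YARN class. */
  static class Metrics {
    private int totalNodes;

    Metrics(int totalNodes) {
      this.totalNodes = totalNodes;
    }

    int getTotalNodes() {
      return totalNodes;
    }

    void setTotalNodes(int totalNodes) {
      this.totalNodes = totalNodes;
    }
  }

  /** Reproduces the bug: the first metric is added to itself. */
  static void buggyMerge(Metrics metrics, Metrics metricsResponse) {
    metrics.setTotalNodes(metrics.getTotalNodes() + metrics.getTotalNodes());
  }

  /** Corrected version: the second set is merged into the first. */
  static void fixedMerge(Metrics metrics, Metrics metricsResponse) {
    metrics.setTotalNodes(
        metrics.getTotalNodes() + metricsResponse.getTotalNodes());
  }

  public static void main(String[] args) {
    Metrics a = new Metrics(3);
    buggyMerge(a, new Metrics(5));
    System.out.println("buggy: " + a.getTotalNodes());  // 3 + 3 = 6

    Metrics b = new Metrics(3);
    fixedMerge(b, new Metrics(5));
    System.out.println("fixed: " + b.getTotalNodes());  // 3 + 5 = 8
  }
}
```

Note that the two versions agree whenever both inputs hold the same value, 
which is why the choice of test data matters.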

This should have failed every time, but the test also had a "flaw", which made 
it fail only intermittently.  The test initializes two sets of metrics to 
random values using two different {{Random}} objects, each seeded with 
{{System.currentTimeMillis()}}.  However, the code is fast enough that both 
objects are often created within the same millisecond, so they get the same 
seed.  When this happens, the two sets of metrics have the same values, which 
masks the bug I described earlier: doubling the first metric gives the same 
result as adding the two.  If the code is slower (e.g. GC pause, swapping, 
adding a log statement for the seed, etc.), then you'll get different seed 
values and the test will (correctly) fail.
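
The masking effect can be sketched in isolation.  This is an illustrative 
example only: {{firstValue}} and {{buggyMerge}} are hypothetical helpers, and 
the bound of 1000 is an arbitrary assumption, not taken from the test.

```java
import java.util.Random;

public class SameSeedSketch {
  /** Draws one hypothetical metric value from a Random with the given seed. */
  static int firstValue(long seed) {
    return new Random(seed).nextInt(1000);
  }

  /** The buggy merge for a single metric: the second value is ignored. */
  static int buggyMerge(int first, int second) {
    return first + first;
  }

  public static void main(String[] args) {
    long seed = System.currentTimeMillis();
    // Two Random objects created within the same millisecond share this seed,
    // so they produce identical sequences.
    int first = firstValue(seed);
    int second = firstValue(seed);

    int expected = first + second;
    int actual = buggyMerge(first, second);
    // Same seed => first == second => the buggy result matches by accident.
    System.out.println("masked: " + (expected == actual));  // prints "masked: true"
  }
}
```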

The 001 patch fixes the bug by using the correct metric in 
{{RouterWebServiceUtil#mergeMetrics}}.  It fixes the test by ensuring that 
the two seeds are different.  It also cleans up some formatting and logs 
the seed for better debuggability.
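
As a rough sketch of the test-side idea: one way to guarantee distinct seeds 
is to derive the second seed from the first and log both.  The 
{{distinctSeeds}} helper below is hypothetical and is an assumption for 
illustration; the patch may achieve this differently.

```java
import java.util.Random;

public class DistinctSeedsSketch {
  /** Hypothetical helper: returns two seeds that are guaranteed to differ. */
  static long[] distinctSeeds(long base) {
    // base + 1 differs from base even if the addition wraps around.
    return new long[] {base, base + 1};
  }

  public static void main(String[] args) {
    long[] seeds = distinctSeeds(System.currentTimeMillis());
    // Log the seeds so that a failing run can be reproduced later.
    System.out.println("seed1=" + seeds[0] + ", seed2=" + seeds[1]);

    Random rand1 = new Random(seeds[0]);
    Random rand2 = new Random(seeds[1]);
    System.out.println(rand1.nextInt(1000) + " " + rand2.nextInt(1000));
  }
}
```

Distinct seeds make identical metric values unlikely, though a single draw 
from each generator could still coincide by chance.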

> TestRouterWebServiceUtil#testMergeMetrics is flakey
> ---------------------------------------------------
>
>                 Key: YARN-7341
>                 URL: https://issues.apache.org/jira/browse/YARN-7341
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: federation
>    Affects Versions: 3.0.0-beta1
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: YARN-7341.001.patch
>
>
> {{TestRouterWebServiceUtil#testMergeMetrics}} is flakey.  It sometimes fails 
> with something like:
> {noformat}
> Running org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServiceUtil
> Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.252 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServiceUtil
> testMergeMetrics(org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServiceUtil)
>   Time elapsed: 0.005 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<1092> but was:<584>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:118)
>       at org.junit.Assert.assertEquals(Assert.java:555)
>       at org.junit.Assert.assertEquals(Assert.java:542)
>       at 
> org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServiceUtil.testMergeMetrics(TestRouterWebServiceUtil.java:473)
> {noformat}


