[ 
https://issues.apache.org/jira/browse/YARN-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295668#comment-17295668
 ] 

Hadoop QA commented on YARN-10672:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
27s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
41s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
50s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m  6s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
51s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs 
config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
50s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
50s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
46s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/724/artifact/out/whitespace-eol.txt{color}
 | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply 
--whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 59s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
21s{color} | {color:green}{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}100m  
9s{color} | {color:green}{color} | {color:green} 
hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green}{color} | {color:green} The patch does not generate 
ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}173m 26s{color} | 
{color:black}{color} | {color:black}{color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/724/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10672 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13021618/YARN-10672.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux ae00b750594d 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 
05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 6699198b54b |
| Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
|  Test Results | 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/724/testReport/ |
| Max. process+thread count | 840 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/724/console |
| versions | git=2.25.1 maven=3.6.3 findbugs=4.0.6 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> All testcases in TestReservations are flaky
> -------------------------------------------
>
>                 Key: YARN-10672
>                 URL: https://issues.apache.org/jira/browse/YARN-10672
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: Screenshot 2021-03-04 at 21.34.18.png, Screenshot 
> 2021-03-04 at 22.06.20.png, Screenshot-mockitostubbing1-2021-03-04 at 
> 22.34.01.png, Screenshot-mockitostubbing2-2021-03-04 at 22.34.12.png, 
> YARN-10672-debuglogs.patch, YARN-10672.001.patch
>
>
> All testcases in TestReservations are flaky
> Running a particular test in TestReservations 100 times never passes all the 
> time.
>  For example, let's run testReservationNoContinueLook 100 times. For me, it 
> produced 39 failed and 61 passed results.
>  Sometimes just 1 out of 100 runs is failed.
>  Screenshot is attached.
> Stacktrace:
> {code:java}
> java.lang.AssertionError: 
> Expected :2048
> Actual   :0
> <Click to see difference>
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotEquals(Assert.java:835)
> at org.junit.Assert.assertEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:633)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations.testReservationNoContinueLook(TestReservations.java:642)
> {code}
> The test fails here:
> {code:java}
>  // Start testing...
> // Only AM
> TestUtils.applyResourceCommitRequest(clusterResource,
>     a.assignContainers(clusterResource, node_0,
>         new ResourceLimits(clusterResource),
>         SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY), nodes, apps);
> assertEquals(2 * GB, a.getUsedResources().getMemorySize());
> {code}
> With some debugging (patch attached), I realized that sometimes there are no 
> registered nodes so the AM can't be allocated and test will fail:
> {code:java}
> 2021-03-04 21:58:25,434 DEBUG [main] allocator.RegularContainerAllocator 
> (RegularContainerAllocator.java:canAssign(312)) - ******Can't assign 
> container, no nodes... rmContext: 2a8dd942, scheduler: 2322e56f
> {code}
> In these cases, this is also printed from 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler#getNumClusterNodes:
> {code:java}
> 2021-03-04 21:58:25,379 DEBUG [main] capacity.CapacityScheduler 
> (CapacityScheduler.java:getNumClusterNodes(290)) - ***Called real 
> getNumClusterNodes
> {code}
> h2. Let's break this down:
>  1. The mocking happens in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations#setup(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration,
>  boolean):
> {code:java}
> cs.setRMContext(spyRMContext);
> cs.init(csConf);
> cs.start();
> when(cs.getNumClusterNodes()).thenReturn(3);
> {code}
> Under no circumstances this could be allowed to return any other value than 3.
>  However, as mentioned above, sometimes the real method of 
> 'getNumClusterNodes' is called on CapacityScheduler.
> 2. Sometimes, this gets printed to the console:
> {code:java}
> org.mockito.exceptions.misusing.WrongTypeOfReturnValue: 
> Integer cannot be returned by isMultiNodePlacementEnabled()
> isMultiNodePlacementEnabled() should return boolean
> ***
> If you're unsure why you're getting above error read on.
> Due to the nature of the syntax above problem might occur because:
> 1. This exception *might* occur in wrongly written multi-threaded tests.
>    Please refer to Mockito FAQ on limitations of concurrency testing.
> 2. A spy is stubbed using when(spy.foo()).then() syntax. It is safer to stub 
> spies - 
>    - with doReturn|Throw() family of methods. More in javadocs for 
> Mockito.spy() method.
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations.setup(TestReservations.java:166)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations.setup(TestReservations.java:114)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations.testReservationNoContinueLook(TestReservations.java:566)
>       at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>       at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>       at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>       at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>       at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>       at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>       at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>       at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>       at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>       at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>       at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
>       at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:40)
>       at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:220)
>       at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:53)
> {code}
> TestReservations.java:166 is exactly our suspicious stubbing call:
> {code:java}
> when(cs.getNumClusterNodes()).thenReturn(3);
> {code}
> But what the heck is the relation between this and the 
> isMultiNodePlacementEnabled method?
>  Let's dive into this more.
> 3. The only caller of isMultiNodePlacementEnabled is 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.activities.ActivitiesManager#dynamicallyUpdateAppActivitiesMaxQueueLengthIfNeeded,
>  that gets called from the anonymous Thread instance (assigned to field 
> 'cleanupThread') in class 
> [ActivitiesManager|https://github.com/apache/hadoop/blob/6699198b54bf6360c164a6ce7552c8b91a318c59/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/activities/ActivitiesManager.java#L301-L357]
>  The ActivitiesManager.serviceStart is invoked when the CapacityScheduler is 
> started.
> *Theory*: When CS is started 
> [here|https://github.com/apache/hadoop/blob/6699198b54bf6360c164a6ce7552c8b91a318c59/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java#L156-L160],
>  the ActivitiesManager starts running and eventually 
> cs.isMultiNodePlacementEnabled() will be called by the thread 
> [here|https://github.com/apache/hadoop/blob/6699198b54bf6360c164a6ce7552c8b91a318c59/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/activities/ActivitiesManager.java#L266].
>  As the main thread (JUnit) started to stub method 'getNumClusterNodes', 
> there's an ongoing stubbing so when isMultiNodePlacementEnabled is called, 
> the Answer will hold the value of 3, which is improper for a boolean method, 
> so the stubbing will fail.
>  This is a race-condition situation, it is presented with 2 screenshots.
>  As the stubbing failed, subsequent calls to getNumClusterNodes will return 0 
> as the original implementation returns 0 for the tests. This is the root 
> cause of the test failures.
> 4. Some references to this mockito "issue":
>  - [https://github.com/mockito/mockito/issues/1151]
>  - [https://github.com/mockito/mockito/issues/1151#issuecomment-336718714]
>  - [https://github.com/mockito/mockito/pull/1200#issuecomment-336717814]
>  The last link points to the documentation that describes the thread-safety 
> of mockito: 
> [https://github.com/mockito/mockito/wiki/FAQ#is-mockito-thread-safe]
> *CONCLUSION & FIX*
>  As all tests in this class are affected by the setup method and this 
> thread-safety issue, we need to fix it consistently.
>  Actually, the fix is quite simple: All the stub calls of the CS instance 
> should happen before the CS.init / CS.start calls, that triggers the 
> ActivitiesManager thread.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to