[ 
https://issues.apache.org/jira/browse/YARN-4615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15119268#comment-15119268
 ] 

Sunil G commented on YARN-4615:
-------------------------------

{noformat}
      // AM crashes, and a new app-attempt gets created
      node.nodeHeartbeat(applicationAttemptOneID, 1, ContainerState.COMPLETE);
      rm.waitForState(node, am1ContainerID, RMContainerState.COMPLETED);
      RMAppAttempt rmAppAttempt2 = MockRM.waitForAttemptScheduled(rmApp, rm);
{noformat}

The snippet above is from the test case named in the JIRA title. The call to 
{{MockRM.waitForAttemptScheduled}} is the one that reports the wrong state.

In the {{rm.waitForState}} line above, the test verifies that the AM container 
is COMPLETED. {{waitForAttemptScheduled}} then waits for the next attempt to 
reach SCHEDULED. However, the attempt goes on to ALLOCATED instead (an extra 
node heartbeat might have arrived and caused the container to be allocated).

Looking at {{rm.waitForState}}, it sends a nodeHeartbeat whenever the state is 
not yet the expected one (while waiting). That is not needed here, as we have 
already sent a heartbeat carrying the container-completed details. I suspect 
that {{RMContainerState.COMPLETED}} had not yet been reached for the AM 
container when the state was checked in {{rm.waitForState}}, so one extra 
heartbeat was sent from this method.
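
To make the suspected ordering concrete, here is a rough, self-contained model of it. Everything in it ({{FakeCluster}}, the {{Event}} enum, the simplified states) is hypothetical and only stands in for the RM; it is not the MockRM code. It assumes the completion event is processed asynchronously and that a heartbeat arriving after the new attempt is SCHEDULED allocates its container.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class HeartbeatRaceSketch {
    // Simplified, hypothetical states; the real ones are RMContainerState / RMAppAttemptState.
    enum ContainerState { RUNNING, COMPLETED }
    enum AttemptState { NONE, SCHEDULED, ALLOCATED }
    enum Event { AM_CONTAINER_FINISHED, NODE_HEARTBEAT }

    /** Hypothetical stand-in for the RM and its async event queue. */
    static class FakeCluster {
        ContainerState amContainer = ContainerState.RUNNING;
        AttemptState nextAttempt = AttemptState.NONE;
        final Deque<Event> queue = new ArrayDeque<>();

        // Processes every queued event in arrival order.
        void drain() {
            while (!queue.isEmpty()) {
                Event e = queue.poll();
                if (e == Event.AM_CONTAINER_FINISHED) {
                    amContainer = ContainerState.COMPLETED;
                    nextAttempt = AttemptState.SCHEDULED;
                } else if (nextAttempt == AttemptState.SCHEDULED) {
                    // Assumption: a heartbeat lets the scheduler allocate the new AM container.
                    nextAttempt = AttemptState.ALLOCATED;
                }
            }
        }
    }

    static AttemptState run(boolean dispatcherIsFast) {
        FakeCluster c = new FakeCluster();
        // The test's explicit heartbeat that reports the AM container as COMPLETE.
        c.queue.add(Event.AM_CONTAINER_FINISHED);
        if (dispatcherIsFast) {
            c.drain(); // event handled before the wait helper looks at the state
        }
        // Wait helper that heartbeats whenever the state is not yet the expected one.
        while (c.amContainer != ContainerState.COMPLETED) {
            c.queue.add(Event.NODE_HEARTBEAT);
            c.drain();
        }
        return c.nextAttempt;
    }

    public static void main(String[] args) {
        System.out.println("fast dispatcher: " + run(true));  // SCHEDULED
        System.out.println("slow dispatcher: " + run(false)); // ALLOCATED
    }
}
```

In this model the only difference between the passing and the failing run is whether the completion event was handled before the first state check, which would match the occasional nature of the failure.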

I will upload a patch with a new {{rm.waitForState}} that doesn't send a 
nodeHeartbeat; it will only wait, up to the timeout. [~rohithsharma], 
please share your thoughts.
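
As a rough idea of what a wait-only helper could look like, here is a minimal sketch. It is not the patch itself: the class name, the {{Supplier}}-based signature and the poll interval parameter are assumptions made for illustration and do not match the existing MockRM method signatures.

```java
import java.util.function.Supplier;

public class WaitOnlySketch {
    /**
     * Polls the supplied state until it equals the expected one or the timeout
     * elapses. It never sends a node heartbeat, so it cannot change the state.
     * Hypothetical signature, not the MockRM API.
     */
    static <S> boolean waitForState(Supplier<S> currentState, S expected,
                                    long timeoutMs, long pollIntervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (expected.equals(currentState.get())) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without reaching the expected state
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Hypothetical state source: reports COMPLETED once 50 ms have passed.
        Supplier<String> state = () ->
            System.currentTimeMillis() - start >= 50 ? "COMPLETED" : "RUNNING";
        System.out.println(waitForState(state, "COMPLETED", 2000, 10));
    }
}
```

A test would then fail on a false return with the last observed state, instead of nudging the scheduler with another heartbeat while it waits.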

> TestAbstractYarnScheduler#testResourceRequestRecoveryToTheRightAppAttempt 
> fails occasionally
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-4615
>                 URL: https://issues.apache.org/jira/browse/YARN-4615
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: test
>            Reporter: Jason Lowe
>
> Sometimes 
> TestAbstractYarnScheduler#testResourceRequestRecoveryToTheRightAppAttempt 
> will fail like this:
> {noformat}
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler
> testResourceRequestRecoveryToTheRightAppAttempt[1](org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler)
>   Time elapsed: 77.427 sec  <<< FAILURE!
> java.lang.AssertionError: Attempt state is not correct (timedout): expected: 
> SCHEDULED actual: ALLOCATED for the application attempt 
> appattempt_1453254869107_0001_000002
>       at org.junit.Assert.fail(Assert.java:88)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:197)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:172)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForAttemptScheduled(MockRM.java:831)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler.testResourceRequestRecoveryToTheRightAppAttempt(TestAbstractYarnScheduler.java:572)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
