Martin Serrano created TWILL-213:
------------------------------------

             Summary: Increase of instances while starting up may lead to ignored retries and instance increases
                 Key: TWILL-213
                 URL: https://issues.apache.org/jira/browse/TWILL-213
             Project: Apache Twill
          Issue Type: Bug
          Components: yarn
    Affects Versions: 0.9.0
            Reporter: Martin Serrano


As seen in the test development for TWILL-181, if the number of instances for a 
container is increased before the `ApplicationMasterService` has observed the 
original request as being satisfied, the instance increase and any subsequent 
retries will be blocked.  This is because in `launchRunnable`:

{code}
      if (expectedContainers.getExpected(runnableName) == runningContainers.count(runnableName) ||
        provisioning.peek().getType().equals(AllocationSpecification.Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME)) {
        provisioning.poll();
      }
{code}

we are comparing the expected container count to the running count to decide 
whether `provisioning.poll()` should be called.  If a new instance request has 
been made in the meantime, the expected count will already have been updated, 
but the running count will never catch up to it, so the head of the 
`provisioning` queue is never removed.  The 
`MaxRetriesTestRun.maxRetriesWithIncreasedInstances` method can be used to 
reproduce this case intermittently by replacing the `allRunning.await` check 
with a countdown latch that is counted down in `onRunning`, as 
`EchoServerTestRun` does.
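
To illustrate the stall, here is a minimal sketch of the comparison in isolation. It is not Twill code: the class `ProvisioningStallSketch`, the method `pendingAfterFirstLaunch`, and the use of plain integers and strings in place of `expectedContainers`, `runningContainers` and the allocation specifications are all assumptions made for the example, and the `ALLOCATE_ONE_INSTANCE_AT_A_TIME` branch is left out.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical, simplified model of the check in launchRunnable.
// None of these names come from Twill itself.
public class ProvisioningStallSketch {

  /**
   * Models one runnable that starts with a single requested instance and
   * returns how many requests remain queued after the first container launches.
   */
  public static int pendingAfterFirstLaunch(boolean increaseBeforeLaunchObserved) {
    Queue<String> provisioning = new ArrayDeque<>();
    int expected = 1;   // stands in for expectedContainers.getExpected(runnableName)
    int running = 0;    // stands in for runningContainers.count(runnableName)
    provisioning.add("initial request");

    if (increaseBeforeLaunchObserved) {
      // The instance count is raised before the first launch is observed.
      expected = 2;
      provisioning.add("increase request");
    }

    // The container for the initial request is launched.
    running++;

    // Same comparison as in the snippet above (assumed simplification:
    // the request type is not ALLOCATE_ONE_INSTANCE_AT_A_TIME).
    if (expected == running) {
      provisioning.poll();
    }
    return provisioning.size();
  }

  public static void main(String[] args) {
    // No concurrent increase: the initial request is polled off the queue.
    System.out.println(pendingAfterFirstLaunch(false)); // prints 0
    // Increase arrives first: 2 != 1, so nothing is polled and both requests stay queued.
    System.out.println(pendingAfterFirstLaunch(true));  // prints 2
  }
}
```

In this model the second case leaves the initial request at the head of the queue, which is the blocked state described above.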



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
