Martin Serrano created TWILL-213:
------------------------------------

             Summary: Increase of instances while starting up may lead to ignored retries and instance increases
                 Key: TWILL-213
                 URL: https://issues.apache.org/jira/browse/TWILL-213
             Project: Apache Twill
          Issue Type: Bug
          Components: yarn
    Affects Versions: 0.9.0
            Reporter: Martin Serrano
While developing the tests for TWILL-181, we saw that the instance increase and any subsequent retries are blocked if the number of instances of a runnable is increased before the `ApplicationMasterService` has observed the original request as satisfied. The cause is this check in `launchRunnable`:

{code}
if (expectedContainers.getExpected(runnableName) == runningContainers.count(runnableName)
    || provisioning.peek().getType().equals(AllocationSpecification.Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME)) {
  provisioning.poll();
}
{code}

Unless the request is of type `ALLOCATE_ONE_INSTANCE_AT_A_TIME`, the code compares the expected container count with the running count to decide whether `provisioning.poll()` should be called. If a new instance request has been made in the meantime, the expected count has already been raised to include it. So once the containers for the original request are running, the two counts still differ and the original request is never polled. Because that request stays at the head of the `provisioning` queue, the instance increase and any retries queued behind it are never processed.

The `MaxRetriesTestRun.maxRetriesWithIncreasedInstances` test reproduces this intermittently if the `allRunning.await` check is replaced with a countdown latch that is counted down in `onRunning`, as `EchoServerTestRun` does.
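The blocking can be illustrated with a small, self-contained model. This is a sketch under simplifying assumptions, not Twill code: `ProvisioningSketch`, `Request`, `requestInstances`, `launchOne` and `scenario` are hypothetical names. The model assumes containers are only launched for the request at the head of the queue and covers only the default (non `ALLOCATE_ONE_INSTANCE_AT_A_TIME`) case. The `perRequestCheck` variant, which polls once the head request itself is satisfied, is one illustrative alternative. It is not necessarily the fix adopted upstream.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ProvisioningSketch {

    /** Hypothetical stand-in for one provisioning request: how many containers it still needs. */
    static final class Request {
        int remaining;
        Request(int count) { this.remaining = count; }
    }

    private final Deque<Request> provisioning = new ArrayDeque<>();
    private final boolean perRequestCheck;
    private int expected;  // plays the role of expectedContainers.getExpected(runnableName)
    private int running;   // plays the role of runningContainers.count(runnableName)

    ProvisioningSketch(boolean perRequestCheck) { this.perRequestCheck = perRequestCheck; }

    /** Asks for more instances: raises the expected count and queues a new request at the tail. */
    void requestInstances(int count) {
        expected += count;
        provisioning.addLast(new Request(count));
    }

    /** Launches one container for the head request; returns false if the queue cannot make progress. */
    boolean launchOne() {
        Request head = provisioning.peekFirst();
        if (head == null || head.remaining == 0) {
            return false;  // nothing left to allocate for the head, so requests behind it are stuck
        }
        head.remaining--;
        running++;
        boolean done = perRequestCheck
                ? head.remaining == 0   // alternative: has this particular request been satisfied?
                : expected == running;  // check modelled on the launchRunnable snippet
        if (done) {
            provisioning.pollFirst();
        }
        return true;
    }

    int running() { return running; }
    int pending() { return provisioning.size(); }

    /** Start 2 instances, raise to 3 after the first container is up, then launch until stuck or done. */
    static ProvisioningSketch scenario(boolean perRequestCheck) {
        ProvisioningSketch s = new ProvisioningSketch(perRequestCheck);
        s.requestInstances(2);  // original request
        s.launchOne();          // first container starts
        s.requestInstances(1);  // increase arrives before the original request is satisfied
        while (s.launchOne()) {
            // keep launching for whatever request is at the head
        }
        return s;
    }

    public static void main(String[] args) {
        ProvisioningSketch buggy = scenario(false);
        ProvisioningSketch alt = scenario(true);
        System.out.println("expected==running check: running=" + buggy.running() + ", pending=" + buggy.pending());
        System.out.println("per-request check: running=" + alt.running() + ", pending=" + alt.pending());
    }
}
```

With the expected-versus-running check, the model stops with 2 containers running and both requests still queued, because the satisfied original request is never polled. With the per-request check, it reaches 3 running containers and an empty queue.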