[ https://issues.apache.org/jira/browse/YARN-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Li Lu updated YARN-2354:
------------------------
    Attachment: YARN-2354-072514.patch

The problem was in numRequestedContainers. In the previous version, it was initially set to numTotalContainers - previousAMRunningContainers.size(). Then, on container completion, the number of containers that need to be relaunched is calculated as numTotalContainers - numRequestedContainers, which normally equals previousAMRunningContainers.size(), so the AM asks for that many extra containers. If containers are not reused (no -keep_containers_across_application_attempts), there should be no previousAMRunningContainers, so this problem only occurs when -keep_containers_across_application_attempts is set.

I'm also fixing the testDSRestartWithPreviousRunningContainers UT associated with this issue.

> DistributedShell may allocate more containers than client specified after it restarts
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-2354
>                 URL: https://issues.apache.org/jira/browse/YARN-2354
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jian He
>            Assignee: Li Lu
>         Attachments: YARN-2354-072514.patch
>
>
> To reproduce, run distributed shell with -num_containers option.
> In ApplicationMaster.java, the following code has some issue.
> {code}
>   int numTotalContainersToRequest =
>       numTotalContainers - previousAMRunningContainers.size();
>   for (int i = 0; i < numTotalContainersToRequest; ++i) {
>     ContainerRequest containerAsk = setupContainerAskForRM();
>     amRMClient.addContainerRequest(containerAsk);
>   }
>   numRequestedContainers.set(numTotalContainersToRequest);
> {code}
> numRequestedContainers doesn't account for the previous AM's requested
> containers, so numRequestedContainers should be set to numTotalContainers.
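
For illustration only, here is a minimal sketch of the counter arithmetic described above. It is not the code in the patch: the class RequestedContainersSketch, the helper containersToReask, and the sample numbers (4 total containers, 3 still running from the previous attempt) are made up, and only the expression numTotalContainers - numRequestedContainers comes from the description.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RequestedContainersSketch {

    // Hypothetical helper: given the value numRequestedContainers was
    // initialized to, return how many containers the completion path
    // would ask for, using numTotalContainers - numRequestedContainers.
    static int containersToReask(int numTotalContainers, int initialRequested) {
        AtomicInteger numRequestedContainers = new AtomicInteger(initialRequested);
        return numTotalContainers - numRequestedContainers.get();
    }

    public static void main(String[] args) {
        int numTotalContainers = 4;   // assumed -num_containers value
        int previousRunning = 3;      // assumed previousAMRunningContainers.size()

        // Old initialization: counter excludes containers kept from the previous AM.
        int oldInit = numTotalContainers - previousRunning;
        // Initialization suggested in the issue: counter covers all containers.
        int newInit = numTotalContainers;

        System.out.println("re-ask with old init = "
            + containersToReask(numTotalContainers, oldInit));
        System.out.println("re-ask with new init = "
            + containersToReask(numTotalContainers, newInit));
    }
}
```

With these sample numbers the old initialization yields a re-ask count of 3, which matches previousAMRunningContainers.size(), while setting the counter to numTotalContainers yields 0.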