Sietse T. Au created YARN-1902:
----------------------------------

             Summary: Allocation of too many containers when a second request is done with the same resource capability
                 Key: YARN-1902
                 URL: https://issues.apache.org/jira/browse/YARN-1902
             Project: Hadoop YARN
          Issue Type: Bug
          Components: client
    Affects Versions: 2.3.0, 2.2.0
            Reporter: Sietse T. Au

Regarding AMRMClientImpl

Scenario 1:
Given a ContainerRequest x with Resource y, when addContainerRequest is called z times with x, allocate is called, and at least one of the z allocated containers is started, then if another addContainerRequest call is made, followed by an allocate call to the RM, (z+1) containers will be allocated where 1 container is expected.

Scenario 2:
This behavior does not occur when no containers are started between the allocate calls.

Analyzing the debug logs of AMRMClientImpl, I found that (z+1) containers are indeed requested in both scenarios, but the correct behavior is observed only in the second scenario.

Looking at the implementation, I found that this (z+1) request is caused by the structure of the remoteRequestsTable. The consequence of Map<Resource, ResourceRequestInfo> is that ResourceRequestInfo does not hold any information about whether a request has already been sent to the RM.

There are workarounds for this, such as releasing the excess containers received.

The solution implemented is to initialize a new ResourceRequest in ResourceRequestInfo when a request has been successfully sent to the RM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
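
The counting problem described in this issue can be illustrated with a minimal sketch in Java. It is a simplified model and not the actual AMRMClientImpl code: RequestTable, resetAfterSend, secondAsk, and the capability label are hypothetical names introduced here, and the RM side is not modeled at all. It assumes the client keeps one running container count per resource capability and sends that count on every allocate call.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified, hypothetical model of the client-side bookkeeping described above.
 * This is NOT the real AMRMClientImpl; all names are illustrative only.
 */
class RequestTableSketch {

    /** Tracks how many containers are asked for per resource capability. */
    static class RequestTable {
        // Capability label -> number of containers in the next ask.
        private final Map<String, Integer> pending = new HashMap<>();
        private final boolean resetAfterSend;

        RequestTable(boolean resetAfterSend) {
            this.resetAfterSend = resetAfterSend;
        }

        /** Stands in for addContainerRequest: one more container for this capability. */
        void addContainerRequest(String capability) {
            pending.merge(capability, 1, Integer::sum);
        }

        /** Stands in for allocate: returns the container count that would be sent. */
        int allocate(String capability) {
            int asked = pending.getOrDefault(capability, 0);
            if (resetAfterSend) {
                // Assumed fix: start from a fresh count once the ask has been sent.
                pending.remove(capability);
            }
            return asked;
        }
    }

    /** Adds z requests, allocates, adds one more, and returns the second ask. */
    static int secondAsk(boolean resetAfterSend, int z) {
        RequestTable table = new RequestTable(resetAfterSend);
        String capability = "1024MB/1core"; // hypothetical capability label
        for (int i = 0; i < z; i++) {
            table.addContainerRequest(capability);
        }
        table.allocate(capability);            // first ask: z containers
        table.addContainerRequest(capability); // one more request
        return table.allocate(capability);     // second ask
    }

    public static void main(String[] args) {
        int z = 3;
        System.out.println("without reset: " + secondAsk(false, z)); // z + 1
        System.out.println("with reset:    " + secondAsk(true, z));  // 1
    }
}
```

With z = 3, the second ask carries 4 containers when the count is never reset and 1 container when it is reset after a successful send. The sketch only shows why the count alone cannot distinguish already-sent requests from new ones; it does not reproduce the difference between Scenario 1 and Scenario 2, which depends on RM behavior.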