Sietse T. Au created YARN-1902:
----------------------------------

             Summary: Allocation of too many containers when a second request 
is done with the same resource capability
                 Key: YARN-1902
                 URL: https://issues.apache.org/jira/browse/YARN-1902
             Project: Hadoop YARN
          Issue Type: Bug
          Components: client
    Affects Versions: 2.3.0, 2.2.0
            Reporter: Sietse T. Au


Regarding AMRMClientImpl

Scenario 1:
Given a ContainerRequest x with Resource y: addContainerRequest is called 
z times with x, allocate is called, and at least one of the z allocated 
containers is started. If addContainerRequest is then called once more, 
followed by an allocate call to the RM, (z+1) containers are allocated 
where 1 container is expected.

Scenario 2:
This behavior does not occur when no containers are started between the 
allocate calls. 

Analyzing the debug logs of AMRMClientImpl, I found that (z+1) containers 
are indeed requested in both scenarios, but only in the second scenario is 
the correct behavior observed.

Looking at the implementation, I found that this (z+1) request is caused by 
the structure of the remoteRequestsTable. Because it is a Map<Resource, 
ResourceRequestInfo>, ResourceRequestInfo holds no information about whether 
a request has already been sent to the RM.
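
To illustrate the effect, here is a minimal sketch of the bookkeeping as 
described above. It is a simplified model, not the Hadoop source: the class 
CumulativeRequestTable, its methods, and the use of a String as the 
capability key are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the table described above.
// NOT the actual AMRMClientImpl code; all names are illustrative.
class CumulativeRequestTable {
    // capability (e.g. "1024MB,1core") -> containers asked for so far
    private final Map<String, Integer> table = new HashMap<>();

    void addContainerRequest(String capability) {
        table.merge(capability, 1, Integer::sum);
    }

    // Returns the count that would be sent to the RM for this capability.
    // Nothing records that earlier requests were already sent, so the
    // count keeps growing across allocate calls.
    int allocate(String capability) {
        return table.getOrDefault(capability, 0);
    }
}

class CumulativeRequestDemo {
    public static void main(String[] args) {
        CumulativeRequestTable t = new CumulativeRequestTable();
        int z = 3;
        for (int i = 0; i < z; i++) {
            t.addContainerRequest("1024MB,1core");
        }
        System.out.println("first allocate asks for " + t.allocate("1024MB,1core"));
        t.addContainerRequest("1024MB,1core");
        // The single new request is reported together with the z old ones.
        System.out.println("second allocate asks for " + t.allocate("1024MB,1core"));
    }
}
```

With z = 3, the second allocate in this model asks for 4 containers rather 
than 1.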

There are workarounds for this, such as releasing the excess containers 
received.
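
A rough sketch of that workaround, assuming the application knows how many 
containers it actually expected. The helper ExcessContainers.surplus is 
hypothetical; in a real application master each returned id would 
presumably be passed to AMRMClient's releaseAssignedContainer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: picks the containers to give back.
class ExcessContainers {
    // Returns every element beyond the first `expected` ones. The caller
    // keeps the first `expected` containers and releases the rest.
    static <T> List<T> surplus(List<T> allocated, int expected) {
        if (expected < 0) {
            throw new IllegalArgumentException("expected must be >= 0");
        }
        if (allocated.size() <= expected) {
            return new ArrayList<>();
        }
        return new ArrayList<>(allocated.subList(expected, allocated.size()));
    }
}
```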

The solution implemented is to initialize a new ResourceRequest in 
ResourceRequestInfo when a request has been successfully sent to the RM.
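
The patch itself is not included in this message, so the following is only 
a sketch of the idea under the same simplified model: once a request has 
been sent, the entry starts over from a fresh state. ResettingRequestTable 
and its methods are hypothetical names, not Hadoop classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the fix: the pending count is cleared once sent.
class ResettingRequestTable {
    // capability -> containers added since the last successful send
    private final Map<String, Integer> pending = new HashMap<>();

    void addContainerRequest(String capability) {
        pending.merge(capability, 1, Integer::sum);
    }

    // Returns the count sent to the RM and drops the entry, so the next
    // addContainerRequest begins again from zero for this capability.
    int allocate(String capability) {
        Integer sent = pending.remove(capability);
        return sent == null ? 0 : sent;
    }
}
```

In this model, z requests followed by an allocate send z, and one further 
request followed by an allocate sends 1.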





--
This message was sent by Atlassian JIRA
(v6.2#6252)
