[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695706#comment-13695706 ]
Omkar Vinit Joshi commented on YARN-744:
----------------------------------------

The problem here is that we retrieve the last response from the response map and then try to grab a lock on it. However, after grabbing the lock we don't check whether the last response in the map has itself been updated in the meantime. That results in a race condition, which I am trying to solve here. After grabbing the lock, an additional check has to be made to ensure that lastResponse was not changed in between, i.e. that no other AM request was processed.

> Race condition in ApplicationMasterService.allocate .. It might process same
> allocate request twice resulting in additional containers getting allocated.
> -----------------------------------------------------------------------------
>
>                 Key: YARN-744
>                 URL: https://issues.apache.org/jira/browse/YARN-744
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Omkar Vinit Joshi
>         Attachments: MAPREDUCE-3899-branch-0.23.patch
>
>
> Looks like the lock taken in this is broken. It takes a lock on lastResponse
> object and then puts a new lastResponse object into the map. At this point a
> new thread entering this function will get a new lastResponse object and will
> be able to take its lock and enter the critical section. Presumably we want
> to limit one response per app attempt. So the lock could be taken on the
> ApplicationAttemptId key of the response map object.
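
For illustration, the re-check described in the comment (look up lastResponse, lock it, then confirm the map still holds that same object before proceeding) might be sketched roughly as below. This is not the attached patch or the actual ApplicationMasterService code: the class AllocateRecheckSketch, the simplified Response type, the String attempt id, and the response-id handling are all hypothetical stand-ins chosen to keep the example self-contained.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; names and types are assumptions, not YARN's real API.
class AllocateRecheckSketch {

    /** Simplified stand-in for an allocate response; carries only a response id. */
    static final class Response {
        final int responseId;
        Response(int responseId) { this.responseId = responseId; }
    }

    // Stand-in for the response map, keyed here by a plain String attempt id.
    private final ConcurrentMap<String, Response> responseMap = new ConcurrentHashMap<>();
    // Counts how many times the "allocate containers" step actually ran.
    private final AtomicInteger allocations = new AtomicInteger();

    void register(String attemptId) {
        responseMap.put(attemptId, new Response(0));
    }

    int allocationCount() {
        return allocations.get();
    }

    Response allocate(String attemptId, int requestResponseId) {
        while (true) {
            Response lastResponse = responseMap.get(attemptId);
            if (lastResponse == null) {
                throw new IllegalStateException("Unknown attempt: " + attemptId);
            }
            synchronized (lastResponse) {
                // Re-check after taking the lock: if another thread replaced the
                // entry in the meantime, this lock no longer guards the current
                // state, so start over with the fresh object.
                if (responseMap.get(attemptId) != lastResponse) {
                    continue;
                }
                // Duplicate of the request that produced lastResponse: resend it
                // instead of allocating again.
                if (requestResponseId + 1 == lastResponse.responseId) {
                    return lastResponse;
                }
                if (requestResponseId != lastResponse.responseId) {
                    throw new IllegalArgumentException("Unexpected response id: " + requestResponseId);
                }
                allocations.incrementAndGet(); // placeholder for the real allocation work
                Response next = new Response(lastResponse.responseId + 1);
                responseMap.put(attemptId, next);
                return next;
            }
        }
    }
}
```

With the re-check in place, a thread that locked a stale lastResponse retries against the current entry, so a repeated request is answered with the previous response rather than triggering a second allocation. Locking on a per-attempt key, as the issue description suggests, would be an alternative way to get the same guarantee.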