[ https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636976#comment-13636976 ]
Hudson commented on YARN-547:
-----------------------------

Integrated in Hadoop-trunk-Commit #3641 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3641/])
YARN-547. Fixed race conditions in public and private resource localization which used to cause duplicate downloads. Contributed by Omkar Vinit Joshi. (Revision 1470076)

     Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470076
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalizedResource.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java


> Race condition in Public / Private Localizer may result into resource getting downloaded again
> ----------------------------------------------------------------------------------------------
>
>                 Key: YARN-547
>                 URL: https://issues.apache.org/jira/browse/YARN-547
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Omkar Vinit Joshi
>            Assignee: Omkar Vinit Joshi
>             Fix For: 2.0.5-beta
>
>         Attachments: yarn-547-20130411.1.patch, yarn-547-20130411.patch,
> yarn-547-20130412.patch, yarn-547-20130415.patch, yarn-547-20130416.1.patch,
> yarn-547-20130416.patch, yarn-547-20130418.patch
>
>
> Public Localizer :
> At present, when multiple containers request the same localized resource:
> * If the resource is not present, it is first created and resource
> localization starts (the LocalizedResource is in the DOWNLOADING state).
> * If multiple ResourceRequestEvents arrive while it is in this state,
> ResourceLocalizationEvents are sent for all of them.
> Most of the time this does not result in a duplicate resource download, but
> there is a race condition. Inside ResourceLocalization (for public
> downloads) all requests are added to a local attempts map. When a new
> request comes in, it is first checked against this map before a new download
> starts. While a download is in progress its request is in the map, so
> another request for the same resource is rejected (i.e. the resource is
> already being downloaded). However, when the current download completes, the
> request is removed from this local map. If a LocalizerRequestEvent arrives
> after this removal, it is not found in the map and the resource is
> downloaded again.
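
The public localizer race described above can be illustrated with a small single-threaded sketch. This is not the NodeManager code or the committed patch: the class name, method names, and the INIT state are assumptions made for illustration, and only DOWNLOADING and LOCALIZED come from the description. It contrasts a check against an in-flight map alone with a check against the resource's own state.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names do not correspond to real YARN classes.
final class PublicLocalizerSketch {
    // INIT is an assumed "not requested yet" state.
    enum State { INIT, DOWNLOADING, LOCALIZED }

    private final Set<String> inFlight = new HashSet<>();
    private final Map<String, State> states = new HashMap<>();

    // Racy variant: consults only the in-flight set, which is emptied on
    // completion, so a late request starts a second download.
    synchronized boolean requestRacy(String key) {
        if (inFlight.contains(key)) {
            return false; // already being downloaded
        }
        inFlight.add(key);
        return true; // caller starts a download
    }

    // State-aware variant: a download starts only from the INIT state.
    synchronized boolean requestStateAware(String key) {
        State current = states.getOrDefault(key, State.INIT);
        if (current != State.INIT) {
            return false; // DOWNLOADING or LOCALIZED: nothing to start
        }
        states.put(key, State.DOWNLOADING);
        return true; // caller starts a download
    }

    // Called when a download finishes.
    synchronized void complete(String key) {
        inFlight.remove(key);
        if (states.get(key) == State.DOWNLOADING) {
            states.put(key, State.LOCALIZED);
        }
    }
}
```

With the racy variant, a request that arrives after complete() is accepted and triggers a second download; with the state-aware variant the same late request is rejected because the resource is already LOCALIZED.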
> PrivateLocalizer :
> Here a different but similar race condition is present.
> * Inside the findNextResource method call, each LocalizerRunner tries to
> grab a lock on the LocalizerResource. If the lock is not acquired, it keeps
> trying until the resource state changes to LOCALIZED. The lock is released
> by the LocalizerRunner when the download completes.
> * If another ContainerLocalizer grabs the lock on a resource before the
> LocalizedResource state changes to LOCALIZED, the resource is downloaded
> again.
> In both places the root cause is that all the threads try to acquire the
> lock on the resource, but the current state of the LocalizedResource is not
> taken into consideration.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
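
For the PrivateLocalizer race described in the issue, one way to take the resource state into account is to re-check it after the lock is acquired and to give the lock back if no download is needed. The sketch below is a hedged illustration only, not the committed patch: the class and method names are hypothetical, the lock is modeled with a java.util.concurrent.Semaphore, and it assumes the state is set to LOCALIZED before the lock is released.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch; names do not correspond to real YARN classes.
final class PrivateLocalizerSketch {
    enum State { DOWNLOADING, LOCALIZED }

    static final class Resource {
        private final Semaphore lock = new Semaphore(1);
        private volatile State state = State.DOWNLOADING;

        boolean tryAcquire() { return lock.tryAcquire(); }
        void release() { lock.release(); }
        State state() { return state; }
        // Assumption: called by the downloader before release().
        void markLocalized() { state = State.LOCALIZED; }
    }

    // Returns true only if the caller now holds the lock AND the resource
    // still needs to be downloaded.
    static boolean claimForDownload(Resource r) {
        if (!r.tryAcquire()) {
            return false; // another localizer holds the lock
        }
        if (r.state() != State.DOWNLOADING) {
            r.release(); // already localized: release and skip
            return false;
        }
        return true;
    }
}
```

A localizer that calls claimForDownload after another one has finished gets the lock briefly, sees LOCALIZED, releases the lock, and does not download the resource a second time.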