[ https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632691#comment-13632691 ]
Omkar Vinit Joshi commented on YARN-547:
----------------------------------------

There was a problem on my end while creating the patch, so I am resubmitting it. (I had taken the diff against my local trunk, in which two of the files were already committed.)

> Race condition in Public / Private Localizer may result into resource getting downloaded again
> ----------------------------------------------------------------------------------------------
>
>                 Key: YARN-547
>                 URL: https://issues.apache.org/jira/browse/YARN-547
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Omkar Vinit Joshi
>            Assignee: Omkar Vinit Joshi
>         Attachments: yarn-547-20130411.1.patch, yarn-547-20130411.patch, yarn-547-20130412.patch, yarn-547-20130415.patch, yarn-547-20130416.patch
>
>
> Public Localizer:
> At present, when multiple containers request the same localized resource:
> * If the resource is not present, it is first created and resource localization starts (the LocalizedResource is in the DOWNLOADING state).
> * If multiple ResourceRequestEvents arrive while it is in this state, a ResourceLocalizationEvent is sent for each of them.
>
> Most of the time this does not result in a duplicate download, but a race condition is present. Inside ResourceLocalization (for public downloads), every request is added to a local attempts map, and each new request is checked against this map before a new download for it starts. While a download is in progress, its request is in the map, so a second request for the same resource is rejected (the resource is already being downloaded). However, once the current download completes, its request is removed from the map. If a LocalizerRequestEvent for the same resource arrives after this removal, it is no longer found in the map, and the resource is downloaded again.
>
> PrivateLocalizer:
> A different but similar race condition is present here.
> * Inside the findNextResource method, each LocalizerRunner tries to acquire a lock on the LocalizedResource. If it cannot acquire the lock, it keeps retrying until the resource state changes to LOCALIZED. The LocalizerRunner holding the lock releases it when its download completes.
> * If another ContainerLocalizer acquires the lock after it has been released but before the LocalizedResource state changes to LOCALIZED, the resource is downloaded again.
>
> In both places, the root cause is that threads try to acquire the lock on the resource without taking the current state of the LocalizedResource into consideration.
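The root cause described in the issue can be illustrated with a small, deterministic Java replay of the private-localizer interleaving. This is a hedged sketch, not the NodeManager code or the attached patch: the names `Yarn547Sketch`, `Resource`, `tryDownload`, `complete` and `replay` are hypothetical, the per-resource lock is modeled with a `java.util.concurrent.Semaphore`, and the "state-aware" variant is an assumed approach in which a runner re-checks the resource state after taking the lock, and completion publishes LOCALIZED before releasing the lock.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical single-threaded replay of the YARN-547 race; not NodeManager code. */
public class Yarn547Sketch {

    enum State { DOWNLOADING, LOCALIZED }

    /** Hypothetical stand-in for LocalizedResource: a state plus a download lock. */
    static final class Resource {
        State state = State.DOWNLOADING; // created in DOWNLOADING, as the issue describes
        final Semaphore lock = new Semaphore(1);
        int downloads = 0;               // number of downloads actually started
    }

    /** One runner's attempt; returns true if it started a download. */
    static boolean tryDownload(Resource r, boolean stateAware) {
        if (!r.lock.tryAcquire()) {
            return false;                // lock held elsewhere; the runner retries later
        }
        if (stateAware && r.state != State.DOWNLOADING) {
            r.lock.release();            // already localized: skip instead of re-downloading
            return false;
        }
        r.downloads++;                   // stands in for the actual fetch
        return true;
    }

    /** Download completion; the order of unlock and state change decides the race. */
    static void complete(Resource r, boolean stateAware) {
        if (stateAware) {
            r.state = State.LOCALIZED;   // assumed fix: publish the new state first...
            r.lock.release();            // ...then let other runners take the lock
        } else {
            r.lock.release();            // issue behaviour: lock released on completion,
                                         // state is updated later (see replay)
        }
    }

    /** Replays the problematic interleaving and returns how many downloads started. */
    static int replay(boolean stateAware) {
        Resource r = new Resource();
        tryDownload(r, stateAware);      // runner A takes the lock and downloads
        tryDownload(r, stateAware);      // runner B is refused: A holds the lock
        complete(r, stateAware);         // A finishes its download
        tryDownload(r, stateAware);      // B retries
        if (!stateAware) {
            r.state = State.LOCALIZED;   // the state change arrives only after B's retry
        }
        return r.downloads;
    }

    public static void main(String[] args) {
        System.out.println("state ignored: " + replay(false) + " download(s)");
        System.out.println("state checked: " + replay(true) + " download(s)");
    }
}
```

Running `main` prints `state ignored: 2 download(s)` followed by `state checked: 1 download(s)`. The same idea would carry over to the public localizer: consulting the resource state, rather than only the attempts map, closes the window that opens once a finished request has been removed from the map.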