[ https://issues.apache.org/jira/browse/YARN-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko updated YARN-10562: -------------------------------- Labels: resourcemanager (was: ) > Follow up changes for YARN-9833 > ------------------------------- > > Key: YARN-10562 > URL: https://issues.apache.org/jira/browse/YARN-10562 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn > Affects Versions: 3.4.0 > Reporter: Jim Brennan > Assignee: Jim Brennan > Priority: Major > Labels: resourcemanager > Attachments: YARN-10562.001.patch, YARN-10562.002.patch, > YARN-10562.003.patch, YARN-10562.004.patch > > > In YARN-9833, a race condition in DirectoryCollection. {{getGoodDirs()}} and > related methods were returning an unmodifiable view of the lists. These > accesses were protected by read/write locks, but because the lists are > CopyOnWriteArrayLists, subsequent changes to the list, even when done under > the writelock, were exposed when a caller started iterating the list view. > CopyOnWriteArrayLists cache the current underlying list in the iterator, so > it is safe to iterate them even while they are being changed - at least the > view will be consistent. > The problem was that checkDirs() was clearing the lists and rebuilding them > from scratch every time, so if a caller called getGoodDirs() just before > checkDirs cleared it, and then started iterating right after the clear, they > could get an empty list. > The fix in YARN-9833 was to change {{getGoodDirs()}} and related methods to > return a copy of the list, which definitely fixes the race condition. The > disadvantage is that now we create a new copy of these lists every time we > launch a container. The advantage using CopyOnWriteArrayList was that the > lists should rarely ever change, and we can avoid all the copying. > Unfortunately, the way checkDirs() was written, it guaranteed that it would > modify those lists multiple times every time. > So this Jira proposes an alternate solution for YARN-9833, which mainly just > rewrites checkDirs() to minimize the changes to the underlying lists. There > are still some small windows where a disk will have been added to one list, > but not yet removed from another if you hit it just right, but I think these > should be pretty rare and relatively harmless, and in the vast majority of > cases I suspect only one disk will be moving from one list to another at any > time. The question is whether this type of inconsistency (which was always > there before -YARN-9833- is worth reducing all the copying. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org