[ https://issues.apache.org/jira/browse/YARN-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14149194#comment-14149194 ]
Hudson commented on YARN-2608:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1883 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1883/])
YARN-2608. FairScheduler: Potential deadlocks in loading alloc files and clock access. (Wei Yan via kasha) (kasha: rev f4357240a6f81065d91d5f443ed8fc8cd2a14a8f)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/CHANGES.txt

> FairScheduler: Potential deadlocks in loading alloc files and clock access
> --------------------------------------------------------------------------
>
>                 Key: YARN-2608
>                 URL: https://issues.apache.org/jira/browse/YARN-2608
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wei Yan
>            Assignee: Wei Yan
>             Fix For: 2.6.0
>
>         Attachments: YARN-2608-1.patch, YARN-2608-2.patch, YARN-2608-3.patch
>
>
> Two potential deadlocks exist inside the FairScheduler.
> 1. AllocationFileLoaderService reloads the queue configuration, which calls
> FairScheduler.AllocationReloadListener.onReload(). That function acquires
> *FairScheduler's lock*:
> {code}
>   public void onReload(AllocationConfiguration queueInfo) {
>     synchronized (FairScheduler.this) {
>       ....
>     }
>   }
> {code}
> After that, it acquires the *QueueManager's queues lock*:
> {code}
>   private FSQueue getQueue(String name, boolean create, FSQueueType queueType) {
>     name = ensureRootPrefix(name);
>     synchronized (queues) {
>       ....
>     }
>   }
> {code}
> Another thread, in FairScheduler.assignToQueue, may also need to create a new
> queue when a new job is submitted. This thread holds the *QueueManager's
> queues lock* first, and then tries to take *FairScheduler's lock*, because it
> calls FairScheduler.getClock() when creating a new FSLeafQueue. The two
> threads take the same two locks in opposite order, so a deadlock may happen
> here.
> 2. The AllocationFileLoaderService holds *AllocationFileLoaderService's lock*
> first, and then waits for *FairScheduler's lock*. Another thread (like
> AdminService.refreshQueues) may call FairScheduler's reinitialize function,
> which holds *FairScheduler's lock* first, and then waits for
> *AllocationFileLoaderService's lock*. Deadlock may happen here.
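
Both cases in the description are lock-order inversions: two threads take the same pair of locks in opposite order. A minimal Java sketch of one way to break the first cycle follows. It assumes the clock can be read without the scheduler lock (here via a volatile field), so the queue-creation path never asks for the scheduler lock while holding the queues lock. Every name in it (LockOrderSketch, schedulerLock, getOrCreateQueue, runBothPaths) is a hypothetical stand-in, not FairScheduler code, and it is not meant to describe what the committed patch does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the scheduler; not the real FairScheduler. */
class LockOrderSketch {
    // Plays the role of "FairScheduler's lock" (assumption for illustration).
    private final Object schedulerLock = new Object();
    // Plays the role of "QueueManager's queues"; maps queue name to creation time.
    private final Map<String, Long> queues = new HashMap<>();
    // volatile: readers see the latest value without taking schedulerLock.
    private volatile long clock = 0L;

    /** Lock-free clock read, so callers holding the queues lock cannot block here. */
    long getClock() {
        return clock;
    }

    /** Reload path: scheduler lock first, then the queues lock. */
    void onReload(String queueName) {
        synchronized (schedulerLock) {
            clock++; // only ever written while holding schedulerLock
            getOrCreateQueue(queueName);
        }
    }

    /** Submission path: takes only the queues lock and reads the clock lock-free. */
    long getOrCreateQueue(String name) {
        synchronized (queues) {
            return queues.computeIfAbsent(name, n -> getClock());
        }
    }

    /** Runs both paths concurrently; returns true if both threads finish in time. */
    static boolean runBothPaths(int iterations) {
        LockOrderSketch s = new LockOrderSketch();
        CountDownLatch done = new CountDownLatch(2);
        Thread reloader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                s.onReload("root.q" + i);
            }
            done.countDown();
        });
        Thread submitter = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                s.getOrCreateQueue("root.q" + i);
            }
            done.countDown();
        });
        // Daemon threads so a stuck thread would not keep the JVM alive.
        reloader.setDaemon(true);
        submitter.setDaemon(true);
        reloader.start();
        submitter.start();
        try {
            return done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Calling LockOrderSketch.runBothPaths(1000) exercises the reload and submission paths concurrently. Because the only nesting left is scheduler lock then queues lock, no thread ever waits for the scheduler lock while holding the queues lock. The second deadlock has the same shape with a different lock pair, so the same idea applies there: pick one acquisition order for the two locks, or stop holding one of them while waiting for the other.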