[ https://issues.apache.org/jira/browse/HADOOP-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eli Collins moved HDFS-3876 to HADOOP-8770:
-------------------------------------------

          Component/s:     (was: name-node)
                       trash
     Target Version/s: 2.2.0-alpha  (was: 2.2.0-alpha)
    Affects Version/s:     (was: 2.2.0-alpha)
                       2.2.0-alpha
                  Key: HADOOP-8770  (was: HDFS-3876)
              Project: Hadoop Common  (was: Hadoop HDFS)

> NN should not RPC to self to find trash defaults (causes deadlock)
> ------------------------------------------------------------------
>
>                 Key: HADOOP-8770
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8770
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: trash
>    Affects Versions: 2.2.0-alpha
>            Reporter: Todd Lipcon
>            Assignee: Eli Collins
>            Priority: Blocker
>         Attachments: hdfs-3876.txt, hdfs-3876.txt, hdfs-3876.txt,
> hdfs-3876.txt
>
>
> When transitioning a SBN to active, I ran into the following situation:
> - the TrashPolicy first gets loaded by an IPC Server Handler thread. The
> {{initialize}} function then tries to make an RPC to the same node to find
> out the defaults.
> - This is happening inside the NN write lock (since it's part of the active
> initialization). Hence, all of the other handler threads are already blocked
> waiting to get the NN lock.
> - Since no handler threads are free, the RPC blocks forever and the NN never
> enters active state.
> We need to have a general policy that the NN should never make RPCs to itself
> for any reason, due to potential for deadlocks like this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
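
The failure mode described in the issue can be reproduced in miniature with a bounded thread pool. The sketch below is not Hadoop code: the class SelfRpcSketch, the method selfCallCompletes, and the constant LOCAL_TRASH_INTERVAL are hypothetical names, and a fixed-size ExecutorService stands in for the IPC handler pool. A pool of size 1 models the state where every other handler is already blocked on the NN lock. It only illustrates why a handler that waits on a request to its own server can starve, and why reading the value locally avoids the problem.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SelfRpcSketch {
    /** Hypothetical stand-in for a trash default that the server already knows. */
    static final long LOCAL_TRASH_INTERVAL = 60L;

    /**
     * Simulates a handler that, while serving one request, sends a second
     * request to its own handler pool and waits for the reply.
     * Returns true if the outer request finishes within timeoutMillis.
     */
    static boolean selfCallCompletes(int handlerCount, long timeoutMillis)
            throws InterruptedException {
        ExecutorService handlers = Executors.newFixedThreadPool(handlerCount);
        try {
            // The "RPC to self": it needs a free handler thread to be served.
            Callable<Long> lookup = () -> LOCAL_TRASH_INTERVAL;
            // The outer request holds a handler thread while it waits.
            Callable<Long> outerRequest = () -> handlers.submit(lookup).get();
            Future<Long> outer = handlers.submit(outerRequest);
            try {
                outer.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // no free handler ever picked up the inner call
            } catch (ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            // Interrupts the stuck handler so the sketch can exit cleanly.
            handlers.shutdownNow();
        }
    }

    /** The alternative: read the value in-process, with no handler needed. */
    static long localDefault() {
        return LOCAL_TRASH_INTERVAL;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("1 handler, self call completes: "
                + selfCallCompletes(1, 500));
        System.out.println("2 handlers, self call completes: "
                + selfCallCompletes(2, 500));
        System.out.println("local read: " + localDefault());
    }
}
```

With a single handler the inner call is queued behind the very thread that is waiting for it, so the outer request can only time out. With a spare handler it completes, which is why this kind of bug can stay hidden until the pool is saturated.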