[ 
https://issues.apache.org/jira/browse/HDFS-6306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130325#comment-14130325
 ] 

Daryn Sharp commented on HDFS-6306:
-----------------------------------

I like both approaches.  I think #1 might be easier.  Rather than re-locking 
for every edit, it would be preferable to batch edits within the lock: either 
a fixed number of edits or, better yet, as many as possible within a bounded 
amount of time.
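
A minimal sketch of the time-bounded batching idea, not taken from the HDFS 
code base.  BatchedEditApplier, applyAll, maxHoldNanos and maxBatchSize are 
hypothetical names, a plain ReentrantReadWriteLock stands in for FSDirectory's 
lock, and edits are modelled as strings.  It only illustrates releasing the 
write lock between batches so that waiting readers can get in.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the edit-loading loop; not the real FSEditLogLoader.
final class BatchedEditApplier {
    // Fair mode, so a reader queued during one batch is served before the
    // loader can re-acquire the write lock for the next batch.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    private final List<String> applied = new ArrayList<>();
    private int lockAcquisitions = 0;

    /**
     * Applies every edit, holding the write lock for at most maxBatchSize
     * edits or roughly maxHoldNanos per acquisition, whichever comes first.
     * At least one edit is applied per acquisition so the loop always progresses.
     */
    void applyAll(Iterator<String> edits, long maxHoldNanos, int maxBatchSize) {
        while (edits.hasNext()) {
            lock.writeLock().lock();
            lockAcquisitions++;
            try {
                long deadline = System.nanoTime() + maxHoldNanos;
                int inBatch = 0;
                do {
                    applied.add(edits.next()); // placeholder for applying one edit op
                    inBatch++;
                } while (edits.hasNext()
                        && inBatch < maxBatchSize
                        && System.nanoTime() - deadline < 0);
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

    List<String> getApplied() {
        return applied;
    }

    int getLockAcquisitions() {
        return lockAcquisitions;
    }
}
```

The time bound is checked only between edits, so a single slow edit can still 
overrun it; the size cap is kept as a second limit for that reason.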

> Standby NN can hold FSDirectory's writeLock for a long time under heavy load
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-6306
>                 URL: https://issues.apache.org/jira/browse/HDFS-6306
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ming Ma
>
> Standby NN uses FSEditLogLoader to update its namespace.  It can hold 
> FSDirectory's writeLock for a long time when active NN generates lots of 
> edits.
> {noformat}
> loadEditRecords
>     fsNamesys.writeLock();
>     fsDir.writeLock();
>     ...
>     try {
>       while (true) {
>         try {
>           FSEditLogOp op;
>           try {
>             op = in.readOp();
>         ...
>           }
>        }
>     } finally {
>       ...
>       fsDir.writeUnlock();
>       fsNamesys.writeUnlock();
>     }
> {noformat}
> With the fix in https://issues.apache.org/jira/browse/HDFS-5693, JMX response 
> time is good for the active NN as it no longer requires FSNamesystem's lock, 
> even though it still needs to acquire FSDirectory's readLock during 
> FSDirectory's totalInodes. That isn't an issue for the active NN, as each 
> client RPC request might acquire the FSDirectory lock for only a short period 
> of time. But the Standby NN could hold the lock for a longer period of time.
> There are two ways to fix this:
> 1. Fix standby NN to acquire FSDirectory's writeLock for each edit record.
> 2. Fix FSDirectory's totalInodes to not take readLock so JMX can still go 
> through.
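
For reference, option 2 in the quoted description could look roughly like the 
sketch below.  This is an assumption about one possible shape, not the actual 
FSDirectory code: InodeCounter, addInode and removeInode are hypothetical 
names, and the count is kept in an AtomicLong so a JMX-style reader never 
touches the directory lock.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of a directory whose inode count is readable without the lock.
final class InodeCounter {
    private final ReentrantReadWriteLock dirLock = new ReentrantReadWriteLock();
    private final AtomicLong totalInodes = new AtomicLong();

    /** Namespace mutation: still performed under the write lock. */
    void addInode() {
        dirLock.writeLock().lock();
        try {
            totalInodes.incrementAndGet();
        } finally {
            dirLock.writeLock().unlock();
        }
    }

    /** Namespace mutation: still performed under the write lock. */
    void removeInode() {
        dirLock.writeLock().lock();
        try {
            totalInodes.decrementAndGet();
        } finally {
            dirLock.writeLock().unlock();
        }
    }

    /** Metrics read: takes no lock, so it cannot block behind a long write-lock hold. */
    long totalInodes() {
        return totalInodes.get();
    }
}
```

The trade-off is that a reader may see a count that reflects an edit still in 
progress, which is usually acceptable for a monitoring metric.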



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
