[ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641116#comment-17641116
 ] 

dingshun edited comment on HDFS-16855 at 11/30/22 7:39 AM:
-----------------------------------------------------------

[~hexiaoqiao] Thanks for your reply.

We found that when the DataNode starts,
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool(String bpid, Configuration conf)
is called, and it acquires a write lock at the BLOCK_POOl level (the method is 
quoted in the issue description below).

Inside
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList#addBlockPool(final String bpid, final Configuration conf),
one thread per volume is started to initialize the BlockPoolSlice, and 
initializing a BlockPoolSlice requires computing its dfsUsage value.

Because our fs.getspaceused.classname is configured to 
ReplicaCachingGetSpaceUsed, computing dfsUsage calls
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#deepCopyReplica(String bpid),
which in turn calls
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap#replicas(String bpid, Consumer<Iterator<ReplicaInfo>> consumer),
and #replicas acquires a read lock at the BLOCK_POOl level.

 

Since the write lock and the read lock are requested by different threads on 
the same ReentrantReadWriteLock instance, the write lock cannot be downgraded 
to a read lock (downgrading only works inside the thread that owns the write 
lock), so the BlockPoolSlice-initializing threads block while addBlockPool 
waits for them to finish, and the DataNode cannot complete startup.
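
The cross-thread behaviour can be reproduced with a plain 
ReentrantReadWriteLock, independent of HDFS. The sketch below is a minimal 
stand-alone model (it uses java.util.concurrent directly rather than the 
DataNode's lock manager, and the class name is made up for illustration): the 
main thread stands in for FsDatasetImpl#addBlockPool holding the write lock, 
and a worker thread stands in for ReplicaMap#replicas trying to take the read 
lock.

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CrossThreadDowngradeDemo {
  public static void main(String[] args) throws InterruptedException {
    final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

    // Main thread: stands in for FsDatasetImpl#addBlockPool, which holds the
    // BLOCK_POOl-level write lock while the block-pool-adding threads run.
    rwLock.writeLock().lock();
    try {
      Thread worker = new Thread(() -> {
        // Worker thread: stands in for ReplicaMap#replicas, which needs the
        // read lock. tryLock() fails because a *different* thread owns the
        // write lock; a blocking readLock().lock() here would hang forever,
        // which is the startup deadlock described above.
        boolean acquired = rwLock.readLock().tryLock();
        System.out.println("worker got read lock? " + acquired); // prints false
      });
      worker.start();
      worker.join();

      // Downgrading write -> read only works inside the thread that owns the
      // write lock:
      boolean ownerCanRead = rwLock.readLock().tryLock();
      System.out.println("owner got read lock? " + ownerCanRead); // prints true
      rwLock.readLock().unlock();
    } finally {
      rwLock.writeLock().unlock();
    }
  }
}
{code}

In the real startup path the worker blocks on readLock().lock() instead of 
failing fast, while FsVolumeList#addBlockPool joins the per-volume threads 
under the still-held write lock, which is why the DataNode hangs.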



> Remove the redundant write lock in addBlockPool
> -----------------------------------------------
>
>                 Key: HDFS-16855
>                 URL: https://issues.apache.org/jira/browse/HDFS-16855
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: dingshun
>            Priority: Major
>              Labels: pull-request-available
>
> While applying the DataNode fine-grained locking patch, we found that the 
> DataNode could not start; addBlockPool appears to deadlock, so the redundant 
> write lock can be removed.
> {code:xml}
> <!-- getspaceused classname -->
>   <property>
>     <name>fs.getspaceused.classname</name>
>     <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed</value>
>   </property> {code}
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool
> // get writeLock
> @Override
> public void addBlockPool(String bpid, Configuration conf)
>     throws IOException {
>   LOG.info("Adding block pool " + bpid);
>   AddBlockPoolException volumeExceptions = new AddBlockPoolException();
>   try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl,
>       bpid)) {
>     try {
>       volumes.addBlockPool(bpid, conf);
>     } catch (AddBlockPoolException e) {
>       volumeExceptions.mergeException(e);
>     }
>     volumeMap.initBlockPool(bpid);
>     Set<String> vols = storageMap.keySet();
>     for (String v : vols) {
>       lockManager.addLock(LockLevel.VOLUME, bpid, v);
>     }
>   }
> } {code}
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap#replicas,
> // called from FsDatasetImpl#deepCopyReplica
> // needs the readLock
> void replicas(String bpid, Consumer<Iterator<ReplicaInfo>> consumer) {
>   LightWeightResizableGSet<Block, ReplicaInfo> m = null;
>   try (AutoCloseDataSetLock l = lockManager.readLock(LockLevel.BLOCK_POOl,
>       bpid)) {
>     m = map.get(bpid);
>     if (m !=null) {
>       m.getIterator(consumer);
>     }
>   }
> } {code}
>  
> Because the read lock is requested from a different thread than the one 
> holding the write lock, the write lock cannot be downgraded to a read lock, 
> and the per-volume threads block.
> {code:java}
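> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList#addBlockPool
> // runs while FsDatasetImpl#addBlockPool holds the BLOCK_POOl write lock and
> // joins the per-volume threads that end up needing the BLOCK_POOl read lock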
> void addBlockPool(final String bpid, final Configuration conf)
>     throws IOException {
>   long totalStartTime = Time.monotonicNow();
>   final Map<FsVolumeSpi, IOException> unhealthyDataDirs =
>       new ConcurrentHashMap<FsVolumeSpi, IOException>();
>   List<Thread> blockPoolAddingThreads = new ArrayList<Thread>();
>   for (final FsVolumeImpl v : volumes) {
>     Thread t = new Thread() {
>       public void run() {
>         try (FsVolumeReference ref = v.obtainReference()) {
>           FsDatasetImpl.LOG.info("Scanning block pool " + bpid +
>               " on volume " + v + "...");
>           long startTime = Time.monotonicNow();
>           v.addBlockPool(bpid, conf);
>           long timeTaken = Time.monotonicNow() - startTime;
>           FsDatasetImpl.LOG.info("Time taken to scan block pool " + bpid +
>               " on " + v + ": " + timeTaken + "ms");
>         } catch (IOException ioe) {
>           FsDatasetImpl.LOG.info("Caught exception while scanning " + v +
>               ". Will throw later.", ioe);
>           unhealthyDataDirs.put(v, ioe);
>         }
>       }
>     };
>     blockPoolAddingThreads.add(t);
>     t.start();
>   }
>   for (Thread t : blockPoolAddingThreads) {
>     try {
>       t.join();
>     } catch (InterruptedException ie) {
>       throw new IOException(ie);
>     }
>   }
> } {code}
>  


