JiangHua Zhu created HDFS-16387:
-----------------------------------

             Summary: [FGL]Access to Create File is more secure
                 Key: HDFS-16387
                 URL: https://issues.apache.org/jira/browse/HDFS-16387
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: namenode
    Affects Versions: Fine-Grained Locking
            Reporter: JiangHua Zhu
After applying this patch, I used NNThroughputBenchmark to verify the create operation, for example:

    ./bin/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs hdfs://xxxx -op create -threads 50 -files 2000000

When the benchmark is run several times, it occasionally fails. I found that deadlocks sometimes occur, for example:

Found one Java-level deadlock:
=============================
"CacheReplicationMonitor(72357231)":
  waiting for ownable synchronizer 0x00007f6a74c1aa50, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
  which is held by "IPC Server handler 49 on 8020"
"IPC Server handler 49 on 8020":
  waiting for ownable synchronizer 0x00007f6a74d14ec8, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
  which is held by "IPC Server handler 24 on 8020"
"IPC Server handler 24 on 8020":
  waiting for ownable synchronizer 0x00007f69348ba648, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
  which is held by "IPC Server handler 49 on 8020"

Java stack information for the threads listed above:
===================================================
"CacheReplicationMonitor(72357231)":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00007f6a74c1aa50> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.doLock(FSNamesystemLock.java:386)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeLock(FSNamesystemLock.java:248)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeLock(FSNamesystem.java:1587)
        at org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor.rescan(CacheReplicationMonitor.java:288)
        at org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor.run(CacheReplicationMonitor.java:189)
"IPC Server handler 49 on 8020":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00007f6a74d14ec8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
        at org.apache.hadoop.hdfs.server.namenode.INodeMap$INodeMapLock.writeChildLock(INodeMap.java:164)
        at org.apache.hadoop.util.PartitionedGSet.latchWriteLock(PartitionedGSet.java:343)
        at org.apache.hadoop.hdfs.server.namenode.INodeMap.latchWriteLock(INodeMap.java:331)
        at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.createMissingDirs(FSDirMkdirOp.java:92)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:372)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2346)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2266)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:733)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:413)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:501)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:926)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:865)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2687)
"IPC Server handler 24 on 8020":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00007f69348ba648> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
        at org.apache.hadoop.hdfs.server.namenode.INodeMap$INodeMapLock.writeChildLock(INodeMap.java:164)
        at org.apache.hadoop.util.PartitionedGSet.latchWriteLock(PartitionedGSet.java:343)
        at org.apache.hadoop.hdfs.server.namenode.INodeMap.latchWriteLock(INodeMap.java:331)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.addFile(FSDirWriteFileOp.java:498)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:375)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2346)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2266)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:733)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:413)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:501)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:926)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:865)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2687)

Found 1 deadlock.

I found that FSDirWriteFileOp#startFile() calls INodeMap#latchWriteLock() twice: once through FSDirMkdirOp#createMissingDirs() and once through FSDirWriteFileOp#addFile(). This can deadlock. In the dump above, handlers 49 and 24 each hold an INodeMap child lock that the other is waiting for, and CacheReplicationMonitor is blocked behind handler 49.
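The dump shows a lock-ordering cycle. Handler 49, inside FSDirMkdirOp#createMissingDirs(), waits for a lock held by handler 24. Handler 24, inside FSDirWriteFileOp#addFile(), waits for a lock held by handler 49. A common general way to rule out this kind of cycle is to make every thread acquire the partition locks it needs in a single global order, for example ascending partition index. The Java sketch below illustrates only that general technique, under stated assumptions. The class PartitionLockOrdering and its methods are hypothetical and are not the INodeMap or PartitionedGSet API. This sketch is also not necessarily the fix adopted for HDFS-16387.

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical sketch (NOT the HDFS INodeMap/PartitionedGSet API): per-partition
 * read/write locks whose write locks are always taken in ascending partition
 * order, so two callers can never wait on each other in a cycle.
 */
public class PartitionLockOrdering {
    private final ReentrantReadWriteLock[] partitionLocks;

    public PartitionLockOrdering(int partitions) {
        partitionLocks = new ReentrantReadWriteLock[partitions];
        for (int i = 0; i < partitions; i++) {
            partitionLocks[i] = new ReentrantReadWriteLock();
        }
    }

    /** Write-locks each requested partition once, lowest index first; returns the order used. */
    public int[] lockAll(int... partitionIds) {
        int[] ordered = Arrays.stream(partitionIds).distinct().sorted().toArray();
        for (int id : ordered) {
            partitionLocks[id].writeLock().lock();
        }
        return ordered;
    }

    /** Releases the locks returned by lockAll(), in reverse acquisition order. */
    public void unlockAll(int[] ordered) {
        for (int i = ordered.length - 1; i >= 0; i--) {
            partitionLocks[ordered[i]].writeLock().unlock();
        }
    }

    public boolean isWriteLockedByCurrentThread(int partitionId) {
        return partitionLocks[partitionId].isWriteLockedByCurrentThread();
    }

    /**
     * Two threads repeatedly request partitions {1, 2} and {2, 1}. With unordered
     * acquisition this interleaving can deadlock; lockAll() makes both threads take
     * partition 1 first. Returns true if both threads finish within the timeout.
     */
    public static boolean runContended(int iterations) throws InterruptedException {
        PartitionLockOrdering locks = new PartitionLockOrdering(4);
        Thread a = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                locks.unlockAll(locks.lockAll(1, 2));
            }
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                locks.unlockAll(locks.lockAll(2, 1));
            }
        });
        // Daemon threads so a hang (which ordering should prevent) cannot block JVM exit.
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();
        a.join(TimeUnit.SECONDS.toMillis(10));
        b.join(TimeUnit.SECONDS.toMillis(10));
        return !a.isAlive() && !b.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("finished without deadlock: " + runContended(100_000));
    }
}
```

Sorting the requested partitions trades a little flexibility for a simple invariant. No thread ever holds a higher-numbered partition lock while waiting for a lower-numbered one, so no wait-for cycle of the kind shown in the dump can form among these locks.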