[ https://issues.apache.org/jira/browse/HDFS-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiaoqiao He moved HADOOP-17237 to HDFS-15553:
---------------------------------------------

    Component/s: namenode  (was: rpc-server)
            Key: HDFS-15553  (was: HADOOP-17237)
        Project: Hadoop HDFS  (was: Hadoop Common)

> Improve NameNode RPC throughput with ReadWriteRpcCallQueue
> -----------------------------------------------------------
>
>                 Key: HDFS-15553
>                 URL: https://issues.apache.org/jira/browse/HDFS-15553
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Wang, Xinglong
>            Priority: Major
>
> *Current*
> In our production cluster, the typical read-to-write ratio is 10:1, and it
> sometimes reaches 30:1.
> NameNode uses ReentrantReadWriteLock under the hood of FSNamesystemLock.
> The read lock is shared, while the write lock is exclusive.
> Read and write RPCs arrive at the NameNode in random order, so they are
> interleaved, and only a small fraction of reads can actually share the
> read lock.
> Currently we have the default call queue and FairCallQueue, and we can
> refreshCallQueue on the fly. This leaves room to design a new call queue.
> *Idea*
> If we reorder the RPC calls in the call queue so that read RPCs are grouped
> together and write RPCs are grouped together, we gain some control over
> letting a batch of read RPCs reach the handlers together and possibly share
> the same read lock. This reduces the fragmentation of read locks.
> This only improves the chance that a batch of read RPCs shares the read
> lock, because some NameNode-internal write lock acquisitions do not go
> through the call queue.
> Under ReentrantReadWriteLock, there is a queue that manages the threads
> waiting for the lock. As an example:
> R: stands for a read RPC
> W: stands for a write RPC
> e.g.
> RRRRWRRRRWRRRRWRRRRWRRRRWRRRRWRRRRWRRRRW
> In this case, we need 16 lock timeslices (8 shared read slices plus 8
> write slices).
> Optimized:
> RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRWWWWWWWW
> In this case, we need only 9 lock timeslices (1 shared read slice plus 8
> write slices).
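The timeslice counts in the example above can be checked with a small sketch. It assumes an idealized model that is not stated in the issue beyond the example itself: every maximal run of consecutive read RPCs shares one read-lock timeslice, and every write RPC needs a timeslice of its own. The class and method names (LockTimeslices, count, repeat) are hypothetical and are not part of Hadoop.

```java
// Hypothetical helper, not Hadoop code: counts lock timeslices for a
// sequence of 'R' (read RPC) and 'W' (write RPC) under an idealized model.
class LockTimeslices {

    /**
     * Each maximal run of consecutive 'R' counts as one shared timeslice;
     * each 'W' counts as one exclusive timeslice.
     */
    static int count(String calls) {
        int slices = 0;
        char prev = 0;
        for (char c : calls.toCharArray()) {
            if (c == 'W') {
                slices++;                 // writes never share the lock
            } else if (c == 'R') {
                if (prev != 'R') {
                    slices++;             // first read of a new run
                }
            } else {
                throw new IllegalArgumentException("unexpected call type: " + c);
            }
            prev = c;
        }
        return slices;
    }

    /** Builds a string made of 'times' copies of 'unit'. */
    static String repeat(String unit, int times) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < times; i++) {
            sb.append(unit);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String mixed = repeat("RRRRW", 8);                 // interleaved order
        String grouped = repeat("R", 32) + repeat("W", 8); // reads grouped first
        System.out.println(count(mixed));   // prints 16
        System.out.println(count(grouped)); // prints 9
    }
}
```

In a real NameNode the number of reads that actually overlap also depends on handler count and on write locks taken outside the call queue, so these numbers are only an upper bound on the benefit.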
> *Correctness*
> The execution order of any two concurrent or queued RPCs in the NameNode is
> not guaranteed, so we can reorder the RPCs in the call queue into a read
> group and a write group, and then dequeue from these two queues according
> to a designed strategy: for example, dequeue 100 read RPCs, then 5 write
> RPCs, then reads again, then writes again.
> FairCallQueue also reorders RPC calls in the call queue, so I think both
> rely on the same reasoning to guarantee correct RPC results.
> *Performance*
> In a test environment, we see a 15% - 20% NameNode RPC throughput
> improvement compared with the default call queue.
> Test traffic is 30 read : 3 write : 1 list, generated with
> NNLoadGeneratorMR.
> This result is not a surprise: some write RPCs are not managed in the
> call queue, so we cannot reorder them by reordering calls in the call
> queue.
> Still, we could achieve a full read/write reordering by redesigning
> ReentrantReadWriteLock. That will be a further step after this one.
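The dequeue strategy described under Correctness (a batch of reads, then a batch of writes, and so on) could look roughly like the sketch below. This is not the ReadWriteRpcCallQueue from the issue and does not implement the interface Hadoop expects of a call queue; the class name ReadWriteBatchQueue, its methods, and the batch-size parameters are all hypothetical and serve only to illustrate the alternation.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch, not Hadoop code: two internal queues with
// alternating batch quotas (for example 100 reads, then 5 writes).
class ReadWriteBatchQueue<E> {
    private final Queue<E> reads = new ArrayDeque<>();
    private final Queue<E> writes = new ArrayDeque<>();
    private final int readBatch;
    private final int writeBatch;
    private boolean servingReads = true; // which group the current batch serves
    private int servedInBatch = 0;       // calls handed out in the current batch

    ReadWriteBatchQueue(int readBatch, int writeBatch) {
        if (readBatch <= 0 || writeBatch <= 0) {
            throw new IllegalArgumentException("batch sizes must be positive");
        }
        this.readBatch = readBatch;
        this.writeBatch = writeBatch;
    }

    /** Enqueues a call into the read group or the write group. */
    synchronized void put(E call, boolean isRead) {
        (isRead ? reads : writes).add(call);
    }

    /** Returns the next call, or null if both groups are empty. */
    synchronized E poll() {
        if (reads.isEmpty() && writes.isEmpty()) {
            return null;
        }
        Queue<E> current = servingReads ? reads : writes;
        int quota = servingReads ? readBatch : writeBatch;
        if (current.isEmpty() || servedInBatch >= quota) {
            // The batch is finished: switch groups if the other one has work,
            // otherwise keep serving the current group with a fresh quota.
            Queue<E> other = servingReads ? writes : reads;
            if (!other.isEmpty()) {
                servingReads = !servingReads;
                current = other;
            }
            servedInBatch = 0;
        }
        servedInBatch++;
        return current.poll();
    }

    public static void main(String[] args) {
        ReadWriteBatchQueue<String> q = new ReadWriteBatchQueue<>(2, 1);
        String[] arrivals = {"R1", "W1", "R2", "W2", "R3", "W3", "R4", "R5"};
        for (String call : arrivals) {
            q.put(call, call.startsWith("R"));
        }
        StringBuilder order = new StringBuilder();
        for (String call = q.poll(); call != null; call = q.poll()) {
            order.append(call).append(' ');
        }
        System.out.println(order.toString().trim());
        // prints R1 R2 W1 R3 R4 W2 R5 W3
    }
}
```

Falling back to the other group when the current one is empty keeps handlers busy; the batch sizes control how long writes can be delayed behind a run of reads.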