Pranay Singh created HADOOP-15933:
-------------------------------------

             Summary: Need for more stats in DFSClient
                 Key: HADOOP-15933
                 URL: https://issues.apache.org/jira/browse/HADOOP-15933
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Pranay Singh


HDFS usage has shifted: once mainly a filesystem for MapReduce, it is now 
becoming more of a general-purpose filesystem. In most cases the issues are 
with the NameNode, so we have metrics that show the workload or stress on the 
NameNode.

However, more statistics need to be collected for the different 
operations/RPCs in DFSClient, to show which RPC operations take longer and 
how frequently each operation is called. These statistics can be exposed to 
users of DFSClient, who can log them periodically or apply some form of flow 
control if responses are slow. This will also help isolate HDFS issues in a 
mixed environment where HBase and Impala run together on a node. We can 
compare the throughput of different operations across clients and isolate 
problems caused by a noisy neighbor, network congestion, or a shared JVM.
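
To make the idea concrete, here is a minimal sketch of the kind of 
per-operation bookkeeping described above: a call count and cumulative 
latency per RPC name. The class ClientOpStats and its methods are 
hypothetical, written only for illustration; they are not part of DFSClient 
or any existing Hadoop API, and the operation names are just string labels.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper, NOT an existing Hadoop class: tracks how often each
// client operation is called and how long the calls take in total.
final class ClientOpStats {
    // Counters kept for a single operation name.
    private static final class Counters {
        final LongAdder calls = new LongAdder();
        final LongAdder totalNanos = new LongAdder();
    }

    private final Map<String, Counters> byOp = new ConcurrentHashMap<>();

    // Record one completed call of the named operation.
    void record(String op, long elapsedNanos) {
        Counters c = byOp.computeIfAbsent(op, k -> new Counters());
        c.calls.increment();
        c.totalNanos.add(elapsedNanos);
    }

    // Run the call, timing it and recording the result even if it throws.
    <T> T time(String op, Callable<T> call) throws Exception {
        long start = System.nanoTime();
        try {
            return call.call();
        } finally {
            record(op, System.nanoTime() - start);
        }
    }

    // Number of recorded calls for the operation (0 if never seen).
    long count(String op) {
        Counters c = byOp.get(op);
        return c == null ? 0L : c.calls.sum();
    }

    // Mean latency in milliseconds (0.0 if never seen).
    double meanMillis(String op) {
        Counters c = byOp.get(op);
        long n = (c == null) ? 0L : c.calls.sum();
        return n == 0 ? 0.0 : c.totalNanos.sum() / 1e6 / n;
    }
}
```

A caller could wrap each RPC in time(...), then periodically log count(...) 
and meanMillis(...) per operation, or throttle its own requests when the 
mean latency crosses a threshold it chooses.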

We have dealt with several problems from the field for which there was no 
conclusive evidence of the cause. If we had metrics or stats in DFSClient, 
we would be better equipped to solve such complex problems.

List of JIRAs for reference:
----------------------------
 HADOOP-15538, HADOOP-15530 (client-side deadlock)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

