Data node process consumes 180% cpu
------------------------------------

                 Key: HADOOP-2144
                 URL: https://issues.apache.org/jira/browse/HADOOP-2144
             Project: Hadoop
          Issue Type: Improvement
            Reporter: Runping Qi
I did a test on DFS read throughput and found that the data node process consumes up to 180% cpu when it is under heavy load. Here are the details:

The cluster has 380+ machines, each with 3GB mem, 4 cpus, and 4 disks. I copied a 10GB file to dfs from one machine with a data node running there. Based on the dfs block placement policy, that machine has one replica for each block of the file. Then I ran 4 of the following commands in parallel:

hadoop dfs -cat thefile > /dev/null &

Since all the blocks have a local replica, all the read requests went to the local data node. I observed that:

- The data node process's cpu usage was around 180% for most of the time.
- The clients' cpu usage was moderate (as it should be).
- All four disks were working concurrently with comparable read throughput.
- The total read throughput maxed out at 90MB/Sec, about 56% of the expected total aggregated max read throughput of the 4 disks (160MB/Sec).

The data node's cpu usage seems unreasonably high.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
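For reference, the parallel-reader step above can be sketched as a small script. This is only an illustration of the fan-out, not the reporter's actual harness; the DFS path in THEFILE is a hypothetical placeholder, and READER is overridable so the loop can be exercised without a Hadoop cluster.

```shell
#!/bin/sh
# Sketch of the read test: launch 4 concurrent readers against one DFS file.
# THEFILE is a hypothetical path; the real test used a 10GB file whose blocks
# all had a local replica, so every read was served by the local data node.
THEFILE=${THEFILE:-/user/test/10gb-file}
READER=${READER:-"hadoop dfs -cat"}   # override (e.g. READER=cat) for a dry run

for i in 1 2 3 4; do
  $READER "$THEFILE" > /dev/null &    # discard the data; we only measure throughput
done
wait                                  # block until all 4 readers finish
echo "all readers done"
```

While this runs, the data node's cpu can be watched with `top` on the same machine to reproduce the ~180% figure.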