Hi, I've noticed recently that the datanodes on a test HDFS cluster I'm running sit permanently at 100% CPU usage after a few hours of minimal use, and over time climb to 200% CPU even when there is almost nothing happening on HDFS. Here is the top and ps output for two of the affected nodes:
Node 1:

    PID   USER   PR NI VIRT  RES  SHR  S %CPU  %MEM TIME+     COMMAND
    15607 hadoop 20 0  1746m 480m 5380 S 102.2 6.1  190:24.45 java

    ps -ef | grep 15607
    hadoop 15607 1 14 May13 ? 03:12:33 /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java -Dproc_datanode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/usr/local/hadoop-2.3.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/local/hadoop-2.3.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/local/hadoop-2.3.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/usr/local/hadoop-2.3.0/logs -Dhadoop.log.file=hadoop-hadoop-datanode-sw-hmaster-001.log -Dhadoop.home.dir=/usr/local/hadoop-2.3.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/usr/local/hadoop-2.3.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode

Node 6:

    PID   USER   PR NI VIRT  RES  SHR S %CPU  %MEM TIME+     COMMAND
    10809 hadoop 20 0  1750m 497m 17m S 200.0 6.3  277:07.34 java

    ps -ef | grep 10809
    hadoop 10809 1 99 09:56 ? 04:38:04 /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java -Dproc_datanode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/usr/local/hadoop-2.3.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/local/hadoop-2.3.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/local/hadoop-2.3.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/usr/local/hadoop-2.3.0/logs -Dhadoop.log.file=hadoop-hadoop-datanode-sw-hadoop-006.log -Dhadoop.home.dir=/usr/local/hadoop-2.3.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/usr/local/hadoop-2.3.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dcom.sun.management.jmxremote.port=1235 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode

I thought initially it might be GC, but monitoring the datanode process via JMX shows almost no GC activity (< 1%). If I restart the datanode process, CPU usage drops back down to 1-5% for a few hours, then jumps back up to 100% again.

The version of Hadoop I'm using is hadoop-2.3.0, and the main use is for HBase, which is also not handling much traffic on the test platform. There are 8 nodes in the cluster, each with 8GB RAM; 4GB is assigned to HBase, and top shows 1.9GB of RAM free. The Hadoop heap size is set to 1GB, with usage varying between 100 and 200MB.

VisualVM shows most of the CPU time to be in:

    org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService() - 33.5%
    org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run() - 23.3%
    org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select() - 16.9%
    org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reportReceivedDeletedBlocks() - 12.9%

Any ideas why this would be happening? Has anyone else seen this behaviour?

-Ian Brooks
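
P.S. In case it helps anyone reproduce the numbers, below is a rough sketch of how the hot threads can be mapped to Java stack frames, using node 6 (PID 10809) as the example. Note the assumptions: jstack and jstat ship with the JDK rather than the JRE the datanode is running on (jre-1.7.0-openjdk), so they may need installing separately, and the TID 10805 is a made-up placeholder; substitute the hottest thread id from your own top -H output.

    # Per-thread CPU usage inside the datanode JVM (PID 10809 from above).
    top -H -p 10809 -b -n 1 | head -20

    # jstack reports thread ids as hex "nid" values, so convert the hot TID.
    printf 'nid=0x%x\n' 10805        # 10805 is a hypothetical TID from top -H

    # Take a thread dump and pull out the stack for that thread (10805 = 0x2a35).
    jstack 10809 > /tmp/datanode.jstack
    grep -A 20 'nid=0x2a35' /tmp/datanode.jstack

    # Cross-check the GC figure from JMX: sample GC/heap activity once per second.
    jstat -gcutil 10809 1000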