1. The jstack result is attached.

2. These are the two processes:

25050 shayan 20 0 1643M 87872 2512 S 100. 10.1 1434h /usr/lib/jvm/java-1.6.0//bin/java -Dproc_datanode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/shayan/hadoop/logs -Dhadoo
25094 shayan 20 0 1643M 87872 2512 R 99.0 10.1 1422h /usr/lib/jvm/java-1.6.0//bin/java -Dproc_datanode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/shayan/hadoop/logs -Dhadoo
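Since both DataNode PIDs are pinned near 100% CPU, one way to narrow a jstack dump down to the hot thread (a sketch, assuming Linux and the procps `top`; the LWP number 25072 below is a made-up example, not taken from this thread) is to get per-thread CPU from `top -H` and convert the busy thread's decimal LWP to the hex `nid` that jstack prints:

```shell
# 1) Per-thread CPU for the hot DataNode process (PID 25050 from above):
#      top -b -n 1 -H -p 25050
# 2) Suppose top shows thread (LWP) 25072 burning the CPU -- 25072 is a
#    hypothetical example. jstack labels each thread with its native ID
#    in hex as nid=0x...; convert the decimal LWP to that form:
printf 'nid=0x%x\n' 25072        # prints: nid=0x61f0
# 3) Then search the attached jstack output for that nid to find the
#    stack of the spinning thread:
#      grep -A 20 'nid=0x61f0' jstack-output.txt
```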
3. There is not much log activity on that node. Here is the tail:

2014-06-02 18:14:18,552 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: sent block report, processed command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@f81c5a2
2014-06-02 23:36:44,764 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-436147530-10.101.5.171-1394551811526 Total blocks: 13, missing metadata files:0, missing block files:0, missing blocks in memory:0, mismatched blocks:0
2014-06-03 00:14:18,544 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 13 blocks took 0 msec to generate and 1 msecs for RPC and NN processing
2014-06-03 00:14:18,544 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: sent block report, processed command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@41f149ee
2014-06-03 05:36:44,764 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-436147530-10.101.5.171-1394551811526 Total blocks: 13, missing metadata files:0, missing block files:0, missing blocks in memory:0, mismatched blocks:0
2014-06-03 06:14:18,548 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 13 blocks took 0 msec to generate and 5 msecs for RPC and NN processing
2014-06-03 06:14:18,548 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: sent block report, processed command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@9ebe5fa
2014-06-03 11:36:44,764 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-436147530-10.101.5.171-1394551811526 Total blocks: 13, missing metadata files:0, missing block files:0, missing blocks in memory:0, mismatched blocks:0
2014-06-03 12:14:18,545 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 13 blocks took 0 msec to generate and 2 msecs for RPC and NN processing
2014-06-03 12:14:18,545 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: sent block report, processed command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@419662d7

On Tue, Jun 3, 2014 at 12:44 PM, Ian Brooks <i.bro...@sensewhere.com> wrote:
> Hi Shayan,
>
> If you restart one of the datanodes, does that node go back to normal CPU
> usage? If so, that looks like the same issue I'm seeing on my nodes, though
> mine will go to 200% over time on a 4-CPU host.
>
> I haven't been able to track the cause down yet. Heavy use of HDFS will
> cause the node to jump to 100% sooner, and it stays there even when
> doing very little.
>
> Ted, logs from one of my nodes to go with Shayan's:
>
> 2014-06-03 17:01:28,755 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-2121456822-10.143.38.149-1396953188241:blk_1074075941_335194, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2014-06-03 17:06:41,860 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1074074152_333405 file /home/hadoop/hdfs/datanode/current/BP-2121456822-10.143.38.149-1396953188241/current/finalized/subdir12/blk_1074074152 for deletion
> 2014-06-03 17:06:41,871 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-2121456822-10.143.38.149-1396953188241 blk_1074074152_333405 file /home/hadoop/hdfs/datanode/current/BP-2121456822-10.143.38.149-1396953188241/current/finalized/subdir12/blk_1074074152
> 2014-06-03 17:08:32,843 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Got a command from standby NN - ignoring command:2
> 2014-06-03 17:13:44,320 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2121456822-10.143.38.149-1396953188241:blk_1074075943_335196 src: /10.143.38.100:26618 dest: /10.143.38.100:50010
> 2014-06-03 17:13:44,351 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.143.38.100:25817, dest: /10.143.38.100:50010, bytes: 863, op: HDFS_WRITE, cliID: DFSClient_hb_rs_sw-hmaster-002,16020,1401717607010_805952715_28, offset: 0, srvID: 62f7174f-a44d-4cc1-b62c-095782a86164, blockid: BP-2121456822-10.143.38.149-1396953188241:blk_1074075489_334742, duration: 3600148963947
> 2014-06-03 17:13:44,352 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-2121456822-10.143.38.149-1396953188241:blk_1074075489_334742, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2014-06-03 17:13:54,029 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.143.38.104:21368, dest: /10.143.38.100:50010, bytes: 227791, op: HDFS_WRITE, cliID: DFSClient_hb_rs_sw-hadoop-004,16020,1401718647248_-466626109_28, offset: 0, srvID: 62f7174f-a44d-4cc1-b62c-095782a86164, blockid: BP-2121456822-10.143.38.149-1396953188241:blk_1074075491_334744, duration: 3600133166499
> 2014-06-03 17:13:54,029 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-2121456822-10.143.38.149-1396953188241:blk_1074075491_334744, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2014-06-03 17:36:17,405 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2121456822-10.143.38.149-1396953188241:blk_1074075950_335203 src: /10.143.38.149:48959 dest: /10.143.38.100:50010
>
> top shows:
>
> top - 17:41:55 up 18 days, 23:59, 2 users, load average: 1.06, 1.04, 0.93
> Tasks: 139 total, 1 running, 137 sleeping, 1 stopped, 0 zombie
> Cpu(s): 16.4%us, 25.9%sy, 0.0%ni, 56.2%id, 0.0%wa, 0.0%hi, 0.0%si, 1.4%st
> Mem: 8059432k total, 5870572k used, 2188860k free, 181076k buffers
> Swap: 835576k total, 0k used, 835576k free, 2493828k cached
>
>   PID USER   PR NI  VIRT  RES SHR S %CPU %MEM    TIME+ COMMAND
> 25450 hadoop 20  0 1728m 332m 17m S 95.8  4.2 49:24.47 java
> 25147 hbase  20  0 4998m 284m 15m S 13.4  3.6  7:28.30 java
>  7212 flume  20  0 4558m 399m 20m S  1.9  5.1  0:25.61 java
>
> jstack fails; I'm probably missing something, just not sure what:
>
> sudo jstack -J-d64 -m 25450
> Attaching to process ID 25450, please wait...
> Debugger attached successfully.
> Server compiler detected.
> JVM version is 24.51-b03
> Exception in thread "main" java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at sun.tools.jstack.JStack.runJStackTool(JStack.java:136)
>     at sun.tools.jstack.JStack.main(JStack.java:102)
> Caused by: java.lang.RuntimeException: Unable to deduce type of thread from address 0x00007f0104010800 (expected type JavaThread, CompilerThread, ServiceThread, JvmtiAgentThread, or SurrogateLockerThread)
>     at sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:162)
>     at sun.jvm.hotspot.runtime.Threads.first(Threads.java:150)
>     at sun.jvm.hotspot.tools.PStack.initJFrameCache(PStack.java:216)
>     at sun.jvm.hotspot.tools.PStack.run(PStack.java:67)
>     at sun.jvm.hotspot.tools.PStack.run(PStack.java:54)
>     at sun.jvm.hotspot.tools.PStack.run(PStack.java:49)
>     at sun.jvm.hotspot.tools.JStack.run(JStack.java:60)
>     at sun.jvm.hotspot.tools.Tool.start(Tool.java:221)
>     at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
>     ... 6 more
> Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x00007f0104010800
>     at sun.jvm.hotspot.runtime.InstanceConstructor.newWrongTypeException(InstanceConstructor.java:62)
>     at sun.jvm.hotspot.runtime.VirtualConstructor.instantiateWrapperFor(VirtualConstructor.java:80)
>     at sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:158)
>     ... 14 more
>
> -Ian Brooks
>
> On Tuesday 03 Jun 2014 12:34:36 Shayan Pooya wrote:
> > I have a three node HDFS cluster with a name-node. There is absolutely no
> > IO going on this cluster or any jobs running on it, and I just use it for
> > testing the Disco HDFS integration.
> > I noticed that two of the three data-nodes are using 100% CPU. They have
> > been running for a long time (2 months) but with no jobs running on them
> > and almost no usage:
> >
> > $ hadoop version
> > Hadoop 2.3.0
> > Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1567123
> > Compiled by jenkins on 2014-02-11T13:40Z
> > Compiled with protoc 2.5.0
> > From source with checksum dfe46336fbc6a044bc124392ec06b85
> >
> > Is this a known bug?
> --
> -Ian Brooks
> Senior server administrator - Sensewhere
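The `jstack -m` failure above comes from the HotSpot Serviceability Agent (the `sun.jvm.hotspot.*` frames), not from the target JVM itself. Two commonly suggested workarounds, as a hedged sketch (assuming a HotSpot JVM on Linux; PID 25450 and the log path are taken or guessed from the messages above):

```shell
# 1) Run jstack as the owner of the target JVM, without -m. The plain
#    (non-mixed) mode uses the in-process attach mechanism instead of
#    the Serviceability Agent, which is what threw the exception above:
#      sudo -u hadoop jstack 25450 > jstack.txt

# 2) Ask the JVM itself for a thread dump with SIGQUIT (signal 3);
#    HotSpot writes the dump to the process's stdout, which for Hadoop
#    daemons usually lands in the .out file under the log directory:
kill -l 3                         # prints: QUIT (confirming 3 = SIGQUIT)
#      kill -3 25450
#      tail -n 200 /home/hadoop/hadoop/logs/*.out   # path is a guess
```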
[Attachment: jstack (binary data)]