Hi Roland
Here is my configuration:
SLES11 SP1
hadoop 1.0.4
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
It seems our setups have nothing in common except the Hadoop version 8)
On 25.05.2013 19:44, Roland von Herget wrote:
Hi Alexey,
I don't know the solution to this problem, but I can second this: I'm
seeing nearly the same thing.
My TaskTrackers are flooding the JobTracker with heartbeats. This
starts after the first mapred job and can be repaired by restarting
the TaskTracker.
The TT nodes show high system CPU usage; the JT is not suffering
from this.
My environment:
debian 6.0.7
hadoop 1.0.4
java version "1.7.0_15"
Java(TM) SE Runtime Environment (build 1.7.0_15-b03)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
What's your environment?
--Roland
On Fri, May 24, 2013 at 3:10 PM, Eremikhin Alexey
<[email protected]> wrote:
Hi all,
I have a 29-server Hadoop cluster in an almost default configuration.
After installing Hadoop 1.0.4 I noticed that the JT and some TTs
waste CPU.
I started stracing their behaviour and found that some TTs send
heartbeats without any throttling: hundreds per second.
Restarting the daemon solves the issue, but even the simplest Hive MR
job brings it back.
Here is the filtered strace of the heartbeating process:
hadoop9.mlan:~$ sudo strace -tt -f -s 10000 -p 6032 2>&1 | grep 6065 | grep write
[pid 6065] 13:07:34.801106 write(70,
"\0\0\1\30\0:\316N\0\theartbeat\0\0\0\5\0*org.apache.hadoop.mapred.TaskTrackerStatus\0*org.apache.hadoop.mapred.TaskTrackerStatus.tracker_hadoop9.mlan:localhost/127.0.0.1:52355\fhadoop9.mlan\0\0\303\214\0\0\0\0\0\0\0\2\0\0\0\2\213\1\367\373\200\0\214\367\223\220\0\213\1\341p\220\0\214\341\351\200\0\377\377\213\6\243\253\200\0\214q\r\33\300\215$\205\266\4B\16\333n\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\7boolean\0\0\7boolean\0\0\7boolean\1\0\5short\316\30",
284) = 284
[pid 6065] 13:07:34.807968 write(70,
"\0\0\1\30\0:\316O\0\theartbeat\0\0\0\5\0*org.apache.hadoop.mapred.TaskTrackerStatus\0*org.apache.hadoop.mapred.TaskTrackerStatus.tracker_hadoop9.mlan:localhost/127.0.0.1:52355\fhadoop9.mlan\0\0\303\214\0\0\0\0\0\0\0\2\0\0\0\2\213\1\367\373\200\0\214\367\223\220\0\213\1\341p\220\0\214\341\351\200\0\377\377\213\6\243\253\200\0\214q\r\33\312\215$\205\266\4B\16\333n\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\7boolean\0\0\7boolean\0\0\7boolean\1\0\5short\316\31",
284 <unfinished ...>
[pid 6065] 13:07:34.808080 <... write resumed> ) = 284
[pid 6065] 13:07:34.814473 write(70,
"\0\0\1\30\0:\316P\0\theartbeat\0\0\0\5\0*org.apache.hadoop.mapred.TaskTrackerStatus\0*org.apache.hadoop.mapred.TaskTrackerStatus.tracker_hadoop9.mlan:localhost/127.0.0.1:52355\fhadoop9.mlan\0\0\303\214\0\0\0\0\0\0\0\2\0\0\0\2\213\1\367\373\200\0\214\367\223\220\0\213\1\341p\220\0\214\341\351\200\0\377\377\213\6\243\253\200\0\214q\r\33\336\215$\205\266\4B\16\333n\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\7boolean\0\0\7boolean\0\0\7boolean\1\0\5short\316\32",
284 <unfinished ...>
[pid 6065] 13:07:34.814595 <... write resumed> ) = 284
[pid 6065] 13:07:34.820960 write(70,
"\0\0\1\30\0:\316Q\0\theartbeat\0\0\0\5\0*org.apache.hadoop.mapred.TaskTrackerStatus\0*org.apache.hadoop.mapred.TaskTrackerStatus.tracker_hadoop9.mlan:localhost/127.0.0.1:52355\fhadoop9.mlan\0\0\303\214\0\0\0\0\0\0\0\2\0\0\0\2\213\1\367\373\200\0\214\367\223\220\0\213\1\341p\220\0\214\341\351\200\0\377\377\213\6\243\253\200\0\214q\r\33\336\215$\205\266\4B\16\333n\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\7boolean\0\0\7boolean\0\0\7boolean\1\0\5short\316\33",
284 <unfinished ...>
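To put a number on the rate, the gaps between consecutive heartbeat writes can be computed from the strace timestamps. This is only a rough sketch: the timestamps are copied from the trace above, while the helper name heartbeat_gaps and the regular expression are my own illustration, not part of Hadoop or strace.

```python
import re
from datetime import datetime

# Timestamp lines copied from the strace output above (strace -tt format).
# Payloads are omitted; only the lines that start a write() matter here.
TRACE = """\
[pid 6065] 13:07:34.801106 write(70,
[pid 6065] 13:07:34.807968 write(70,
[pid 6065] 13:07:34.808080 <... write resumed> ) = 284
[pid 6065] 13:07:34.814473 write(70,
[pid 6065] 13:07:34.814595 <... write resumed> ) = 284
[pid 6065] 13:07:34.820960 write(70,
"""

# Matches only lines where a write() call begins, not "<... write resumed>".
WRITE_START = re.compile(r"^\[pid\s+\d+\]\s+(\d{2}:\d{2}:\d{2}\.\d{6})\s+write\(")


def heartbeat_gaps(trace):
    """Return the gaps in seconds between consecutive write() starts."""
    stamps = []
    for line in trace.splitlines():
        match = WRITE_START.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%H:%M:%S.%f"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


gaps = heartbeat_gaps(TRACE)
mean_gap = sum(gaps) / len(gaps)
print("gaps (s):", gaps)
print("mean gap: %.6f s, about %.0f heartbeats per second" % (mean_gap, 1 / mean_gap))
```

For these four writes the gaps are each a little under 7 ms, which works out to roughly 150 heartbeats per second from this one TaskTracker.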
Please help me stop this heartbeat storm 8(