Hi Roland

Here is my configuration:
SLES11 SP1
hadoop 1.0.4
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

It seems nothing matches between our setups except the Hadoop version 8)

On 25.05.2013 19:44, Roland von Herget wrote:
Hi Alexey,

I don't know the solution to this problem, but I can second it: I'm seeing nearly the same thing. My TaskTrackers are flooding the JobTracker with heartbeats. This starts after the first mapred job and can be fixed by restarting the TaskTracker. The TT nodes show high system CPU usage; the JT is not suffering from this.

my environment:
debian 6.0.7
hadoop 1.0.4
java version "1.7.0_15"
Java(TM) SE Runtime Environment (build 1.7.0_15-b03)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)

What's your environment?

--Roland


On Fri, May 24, 2013 at 3:10 PM, Eremikhin Alexey <[email protected]> wrote:

    Hi all,
    I have a 29-server Hadoop cluster in an almost default configuration.
    After installing Hadoop 1.0.4 I noticed that the JT and some TTs
    waste CPU.
    I started stracing their behaviour and found that some TTs send
    heartbeats without any rate limit.
    That means hundreds per second.

    A daemon restart solves the issue, but even the simplest Hive MR job
    brings it back.

    Here is the filtered strace of the heartbeating process:

    hadoop9.mlan:~$ sudo strace -tt -f -s 10000 -p 6032 2>&1 | grep 6065 | grep write


    [pid  6065] 13:07:34.801106 write(70, "\0\0\1\30\0:\316N\0\theartbeat\0\0\0\5\0*org.apache.hadoop.mapred.TaskTrackerStatus\0*org.apache.hadoop.mapred.TaskTrackerStatus.tracker_hadoop9.mlan:localhost/127.0.0.1:52355\fhadoop9.mlan\0\0\303\214\0\0\0\0\0\0\0\2\0\0\0\2\213\1\367\373\200\0\214\367\223\220\0\213\1\341p\220\0\214\341\351\200\0\377\377\213\6\243\253\200\0\214q\r\33\300\215$\205\266\4B\16\333n\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\7boolean\0\0\7boolean\0\0\7boolean\1\0\5short\316\30", 284) = 284
    [pid  6065] 13:07:34.807968 write(70, "\0\0\1\30\0:\316O\0\theartbeat\0\0\0\5\0*org.apache.hadoop.mapred.TaskTrackerStatus\0*org.apache.hadoop.mapred.TaskTrackerStatus.tracker_hadoop9.mlan:localhost/127.0.0.1:52355\fhadoop9.mlan\0\0\303\214\0\0\0\0\0\0\0\2\0\0\0\2\213\1\367\373\200\0\214\367\223\220\0\213\1\341p\220\0\214\341\351\200\0\377\377\213\6\243\253\200\0\214q\r\33\312\215$\205\266\4B\16\333n\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\7boolean\0\0\7boolean\0\0\7boolean\1\0\5short\316\31", 284 <unfinished ...>
    [pid  6065] 13:07:34.808080 <... write resumed> ) = 284
    [pid  6065] 13:07:34.814473 write(70, "\0\0\1\30\0:\316P\0\theartbeat\0\0\0\5\0*org.apache.hadoop.mapred.TaskTrackerStatus\0*org.apache.hadoop.mapred.TaskTrackerStatus.tracker_hadoop9.mlan:localhost/127.0.0.1:52355\fhadoop9.mlan\0\0\303\214\0\0\0\0\0\0\0\2\0\0\0\2\213\1\367\373\200\0\214\367\223\220\0\213\1\341p\220\0\214\341\351\200\0\377\377\213\6\243\253\200\0\214q\r\33\336\215$\205\266\4B\16\333n\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\7boolean\0\0\7boolean\0\0\7boolean\1\0\5short\316\32", 284 <unfinished ...>
    [pid  6065] 13:07:34.814595 <... write resumed> ) = 284
    [pid  6065] 13:07:34.820960 write(70, "\0\0\1\30\0:\316Q\0\theartbeat\0\0\0\5\0*org.apache.hadoop.mapred.TaskTrackerStatus\0*org.apache.hadoop.mapred.TaskTrackerStatus.tracker_hadoop9.mlan:localhost/127.0.0.1:52355\fhadoop9.mlan\0\0\303\214\0\0\0\0\0\0\0\2\0\0\0\2\213\1\367\373\200\0\214\367\223\220\0\213\1\341p\220\0\214\341\351\200\0\377\377\213\6\243\253\200\0\214q\r\33\336\215$\205\266\4B\16\333n\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\7boolean\0\0\7boolean\0\0\7boolean\1\0\5short\316\33", 284 <unfinished ...>
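
For readers who want to see what distinguishes one write() from the next, here is a minimal Python sketch that decodes the first bytes of two consecutive payloads from the trace. The frame layout (a 4-byte big-endian length, a 4-byte call id, then a 2-byte length-prefixed method name) is an assumption inferred from the bytes themselves, not something confirmed in this thread; the name parse_header is purely illustrative.

```python
import struct

# Leading bytes of two consecutive write() payloads from the trace,
# transcribed as Python bytes literals (same octal escapes as strace).
FRAMES = [
    b"\0\0\1\30\0:\316N\0\theartbeat",
    b"\0\0\1\30\0:\316O\0\theartbeat",
]

def parse_header(frame):
    """Decode (payload_length, call_id, method_name).

    Assumed layout, inferred from the trace rather than from Hadoop docs:
    int32 length, int32 call id, uint16 name length, ASCII method name.
    """
    length, call_id, name_len = struct.unpack(">iiH", frame[:10])
    method = frame[10:10 + name_len].decode("ascii")
    return length, call_id, method

headers = [parse_header(f) for f in FRAMES]
for length, call_id, method in headers:
    print(f"length={length} call_id={call_id} method={method}")
```

Under that assumed layout the length field reads 280, which is the 284 bytes reported by write() minus the 4-byte length prefix, and the call id goes up by one from one write to the next, so each write looks like a separate heartbeat call rather than a retransmission of the same one.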


    Please help me to stop this storming 8(
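
As a rough check of the "hundreds per second" figure, the four write() timestamps from the trace above can be turned into a heartbeat rate. This is only a sketch over four samples from one thread (pid 6065); the helper name mean_interval_seconds is made up for illustration.

```python
from datetime import datetime

# Timestamps of the four heartbeat write() calls shown in the strace excerpt.
TIMESTAMPS = [
    "13:07:34.801106",
    "13:07:34.807968",
    "13:07:34.814473",
    "13:07:34.820960",
]

def mean_interval_seconds(stamps):
    """Average gap in seconds between consecutive strace -tt timestamps."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps)

interval = mean_interval_seconds(TIMESTAMPS)
rate = 1.0 / interval
print(f"mean interval: {interval * 1000:.2f} ms, about {rate:.0f} heartbeats/s")
```

The gaps come out at roughly 6.6 ms each, which corresponds to about 150 heartbeats per second from this one TaskTracker.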


