Just adding more information. The following is the summary output of 'strace -p <hdfs-pid> -f -C', which ran for 10 seconds. For some reason, futex accounts for 65% of the time.
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 65.06   11.097387         103    108084     53662 futex
 12.00    2.047692      170641        12         3 restart_syscall
  8.73    1.488824       23263        64           accept
  6.99    1.192192        5624       212           poll
  6.60    1.125829       22517        50           epoll_wait
  0.26    0.045039         506        89           close
  0.19    0.031703         170       187           sendto
  0.04    0.007508         110        68           setsockopt
  0.03    0.005558          27       209           recvfrom
  0.02    0.003000         375         8           sched_yield
  0.02    0.002999         107        28         1 epoll_ctl
  0.01    0.002000         125        16           open
  0.01    0.001999         167        12           getsockname
  0.01    0.001156          36        32           write
  0.01    0.001000         100        10           fstat
  0.01    0.001000          30        33           fcntl
  0.01    0.000999          15        67           dup2
  0.00    0.000488          98         5           rt_sigreturn
  0.00    0.000350           8        46        10 read
  0.00    0.000222           4        51           mprotect
  0.00    0.000167          42         4           openat
  0.00    0.000092           2        52           stat
  0.00    0.000084           2        45           statfs
  0.00    0.000074           4        21           mmap
  0.00    0.000000           0         9           munmap
  0.00    0.000000           0        26           rt_sigprocmask
  0.00    0.000000           0         3           ioctl
  0.00    0.000000           0         1           pipe
  0.00    0.000000           0         5           madvise
  0.00    0.000000           0         6           socket
  0.00    0.000000           0         6         4 connect
  0.00    0.000000           0         1           shutdown
  0.00    0.000000           0         3           getsockopt
  0.00    0.000000           0         7           clone
  0.00    0.000000           0         8           getdents
  0.00    0.000000           0         3           getrlimit
  0.00    0.000000           0         6           sysinfo
  0.00    0.000000           0         7           gettid
  0.00    0.000000           0        14           sched_getaffinity
  0.00    0.000000           0         1           epoll_create
  0.00    0.000000           0         7           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00   17.057362                109518     53680 total

On Jul 12, 2012, at 18:09 PM, Asaf Mesika wrote:

> Hi,
>
> I have a cluster of 3 DN/RS and another computer hosting NN/Master.
>
> For some reason, two of the DataNode nodes are showing a high load average
> (~17).
> When using "top" I can see that the HDFS and HBASE processes are the ones
> using most of the CPU (95% in top).
>
> When inspecting both HDFS and HBASE through JVisualVM on the problematic
> nodes, I can clearly see that the CPU usage is high.
>
> Any ideas why it's happening on those two nodes (and why the 3rd is resting
> happily)?
>
> All three computers have roughly the same hardware.
> The cluster (both HBASE and HDFS) is not being used currently (during my
> inspection).
>
> Neither the HDFS nor the HBASE logs show any particular activity.
>
> Any leads on where I should look next would be appreciated.
>
> Thanks!
>
> Asaf
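To pull the headline numbers out of a strace summary like the one above (share of syscall time, calls per second over the 10-second trace, and the fraction of calls that returned an error), here is a minimal Python sketch. It assumes the six-column layout strace printed here, with a blank errors column for syscalls that never failed and no usecs/call value on the total row; `parse_strace_summary`, `summarize` and `SAMPLE` are hypothetical names for illustration, not part of strace.

```python
"""Minimal sketch: parse the summary table printed by `strace -c` / `strace -C`.

Assumes (hypothetically) the layout shown in this thread:
% time, seconds, usecs/call, calls, [errors], syscall.
"""
from dataclasses import dataclass


@dataclass
class SyscallRow:
    name: str
    pct_time: float
    seconds: float
    calls: int
    errors: int


def parse_strace_summary(text: str) -> list[SyscallRow]:
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the "% time" header, dashed separators and the total row.
        if not fields or fields[0] in ("%", "------") or fields[-1] == "total":
            continue
        pct, secs, _usecs_per_call, calls = fields[:4]
        # Six fields means the errors column is filled in; five means no errors.
        errors = int(fields[4]) if len(fields) == 6 else 0
        rows.append(SyscallRow(fields[-1], float(pct), float(secs), int(calls), errors))
    return rows


def summarize(rows: list[SyscallRow], trace_seconds: float, top: int = 3) -> list[str]:
    # Rank syscalls by accumulated seconds, highest first.
    ranked = sorted(rows, key=lambda r: r.seconds, reverse=True)[:top]
    out = []
    for r in ranked:
        err_ratio = r.errors / r.calls if r.calls else 0.0
        out.append(
            f"{r.name}: {r.pct_time:.2f}% of syscall time, "
            f"{r.calls / trace_seconds:.0f} calls/s, {err_ratio:.1%} errors"
        )
    return out


# An excerpt of the table from this thread (first five rows plus the total).
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 65.06   11.097387         103    108084     53662 futex
 12.00    2.047692      170641        12         3 restart_syscall
  8.73    1.488824       23263        64           accept
  6.99    1.192192        5624       212           poll
  6.60    1.125829       22517        50           epoll_wait
------ ----------- ----------- --------- --------- ----------------
100.00   17.057362                109518     53680 total
"""

if __name__ == "__main__":
    for line in summarize(parse_strace_summary(SAMPLE), trace_seconds=10):
        print(line)
```

Applied to these numbers, futex works out to roughly 10,800 calls per second over the 10-second trace, with about half of those calls returning an error. The summary does not say which errno values were returned.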