Just adding more information.
The following is the summary histogram from 'strace -p <hdfs-pid> -f -C', which ran 
for 10 seconds (the exact invocation is sketched below the table). For some reason 
futex takes 65% of the time. 

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 65.06   11.097387         103    108084     53662 futex
 12.00    2.047692      170641        12         3 restart_syscall
  8.73    1.488824       23263        64           accept
  6.99    1.192192        5624       212           poll
  6.60    1.125829       22517        50           epoll_wait
  0.26    0.045039         506        89           close
  0.19    0.031703         170       187           sendto
  0.04    0.007508         110        68           setsockopt
  0.03    0.005558          27       209           recvfrom
  0.02    0.003000         375         8           sched_yield
  0.02    0.002999         107        28         1 epoll_ctl
  0.01    0.002000         125        16           open
  0.01    0.001999         167        12           getsockname
  0.01    0.001156          36        32           write
  0.01    0.001000         100        10           fstat
  0.01    0.001000          30        33           fcntl
  0.01    0.000999          15        67           dup2
  0.00    0.000488          98         5           rt_sigreturn
  0.00    0.000350           8        46        10 read
  0.00    0.000222           4        51           mprotect
  0.00    0.000167          42         4           openat
  0.00    0.000092           2        52           stat
  0.00    0.000084           2        45           statfs
  0.00    0.000074           4        21           mmap
  0.00    0.000000           0         9           munmap
  0.00    0.000000           0        26           rt_sigprocmask
  0.00    0.000000           0         3           ioctl
  0.00    0.000000           0         1           pipe
  0.00    0.000000           0         5           madvise
  0.00    0.000000           0         6           socket
  0.00    0.000000           0         6         4 connect
  0.00    0.000000           0         1           shutdown
  0.00    0.000000           0         3           getsockopt
  0.00    0.000000           0         7           clone
  0.00    0.000000           0         8           getdents
  0.00    0.000000           0         3           getrlimit
  0.00    0.000000           0         6           sysinfo
  0.00    0.000000           0         7           gettid
  0.00    0.000000           0        14           sched_getaffinity
  0.00    0.000000           0         1           epoll_create
  0.00    0.000000           0         7           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00   17.057362                109518     53680 total
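
For reference, a minimal way to reproduce this 10-second sample (a sketch; it 
assumes strace and coreutils' timeout are installed, and <hdfs-pid> is the 
DataNode's pid, e.g. from jps or pgrep):

    # Attach to the process and all of its threads (-f), print the live
    # trace plus the summary table on detach (-C), and stop after 10
    # seconds by sending SIGINT, which makes strace detach cleanly and
    # emit the counts table:
    timeout -s INT 10 strace -f -C -p <hdfs-pid>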

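Since futex dominates, the next step I'd try is mapping the hot threads back to 
Java thread names. On a JVM, heavy futex traffic with many errored calls is often 
just timed waits (ETIMEDOUT from parked threads), so the per-thread CPU column is 
what matters. A sketch, assuming top and the JDK's jstack are on the box, with 
<hot-tid> / <hex-tid> standing in for the thread id you pick out of top's output:

    # Per-thread CPU for the process (-H shows threads, -b -n 1 is one
    # batch iteration):
    top -H -b -n 1 -p <hdfs-pid> | head -20
    # Convert the busiest thread id to hex; jstack reports it as nid=0x...
    printf '%x\n' <hot-tid>
    # Pull that thread's Java stack:
    jstack <hdfs-pid> | grep -A 20 'nid=0x<hex-tid>'
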
On Jul 12, 2012, at 18:09, Asaf Mesika wrote:

> Hi,
> 
> I have a cluster of 3 DN/RS and another computer hosting NN/Master.
> 
> For some reason, two of the DataNode machines are showing a high load average 
> (~17).
> When using "top" I can see the HDFS and HBase processes are the ones using 
> most of the CPU (95% in top).
> 
> When inspecting both HDFS and HBase through JVisualVM on the problematic 
> nodes, I can clearly see that the CPU usage is high.
> 
> Any ideas why it's happening on those two nodes (and why the third is resting 
> happily)?
> 
> All three computers have roughly the same hardware.
> The cluster (both HBase and HDFS) is not currently in use (during my 
> inspection).
> 
> Both the HDFS and HBase logs show no particular activity.
> 
> 
> Any leads on where I should look next would be appreciated.
> 
> 
> Thanks!
> 
> Asaf
> 
