Maybe there is a slow query. I ran into the same problem: I found that when I queried 100 thousand columns of a single row, HBase stopped responding and stopped working.

2012/7/13 Esteban Gutierrez <este...@cloudera.com>

> Hi Asaf,
>
> By any chance, has this issue been going on in your boxes for the last
> few days? I wouldn't be surprised by so many calls to futex by the JVM
> itself, but since you are describing the same symptoms as the leap second
> issue, it would be good to know what OS you are using, whether NTP is/was
> running, and whether the boxes have been restarted after Jul 1. If the
> leap second issue is the cause of this, then just running
> date -s "`date`" as root will lower the CPU usage.
>
> regards,
> esteban.
>
> --
> Cloudera, Inc.
>
> On Thu, Jul 12, 2012 at 10:12 AM, Asaf Mesika <asaf.mes...@gmail.com>
> wrote:
>
> > Just adding more information.
> > The following is a histogram output of 'strace -p <hdfs-pid> -f -C',
> > which ran for 10 seconds. For some reason futex takes 65% of the time.
> >
> > % time     seconds  usecs/call     calls    errors syscall
> > ------ ----------- ----------- --------- --------- ----------------
> >  65.06   11.097387         103    108084     53662 futex
> >  12.00    2.047692      170641        12         3 restart_syscall
> >   8.73    1.488824       23263        64           accept
> >   6.99    1.192192        5624       212           poll
> >   6.60    1.125829       22517        50           epoll_wait
> >   0.26    0.045039         506        89           close
> >   0.19    0.031703         170       187           sendto
> >   0.04    0.007508         110        68           setsockopt
> >   0.03    0.005558          27       209           recvfrom
> >   0.02    0.003000         375         8           sched_yield
> >   0.02    0.002999         107        28         1 epoll_ctl
> >   0.01    0.002000         125        16           open
> >   0.01    0.001999         167        12           getsockname
> >   0.01    0.001156          36        32           write
> >   0.01    0.001000         100        10           fstat
> >   0.01    0.001000          30        33           fcntl
> >   0.01    0.000999          15        67           dup2
> >   0.00    0.000488          98         5           rt_sigreturn
> >   0.00    0.000350           8        46        10 read
> >   0.00    0.000222           4        51           mprotect
> >   0.00    0.000167          42         4           openat
> >   0.00    0.000092           2        52           stat
> >   0.00    0.000084           2        45           statfs
> >   0.00    0.000074           4        21           mmap
> >   0.00    0.000000           0         9           munmap
> >   0.00    0.000000           0        26           rt_sigprocmask
> >   0.00    0.000000           0         3           ioctl
> >   0.00    0.000000           0         1           pipe
> >   0.00    0.000000           0         5           madvise
> >   0.00    0.000000           0         6           socket
> >   0.00    0.000000           0         6         4 connect
> >   0.00    0.000000           0         1           shutdown
> >   0.00    0.000000           0         3           getsockopt
> >   0.00    0.000000           0         7           clone
> >   0.00    0.000000           0         8           getdents
> >   0.00    0.000000           0         3           getrlimit
> >   0.00    0.000000           0         6           sysinfo
> >   0.00    0.000000           0         7           gettid
> >   0.00    0.000000           0        14           sched_getaffinity
> >   0.00    0.000000           0         1           epoll_create
> >   0.00    0.000000           0         7           set_robust_list
> > ------ ----------- ----------- --------- --------- ----------------
> > 100.00   17.057362                109518     53680 total
> >
> > On Jul 12, 2012, at 18:09 PM, Asaf Mesika wrote:
> >
> > > Hi,
> > >
> > > I have a cluster of 3 DN/RS and another computer hosting NN/Master.
> > >
> > > For some reason, two of the DataNode nodes are showing a high load
> > > average (~17).
> > > When using "top" I can see that the HDFS and HBASE processes are the
> > > ones using most of the CPU (95% in top).
> > >
> > > When inspecting both HDFS and HBASE through JVisualVM on the
> > > problematic nodes, I can clearly see that the CPU usage is high.
> > >
> > > Any ideas why it's happening on those two nodes (and why the 3rd is
> > > resting happily)?
> > >
> > > All three computers have roughly the same hardware.
> > > The cluster (both HBASE and HDFS) is not currently in use (during my
> > > inspection).
> > >
> > > Neither the HDFS nor the HBASE logs show any particular activity.
> > >
> > > Any leads on where I should look for more would be appreciated.
> > >
> > > Thanks!
> > >
> > > Asaf
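
For anyone comparing several nodes, the strace summary quoted in this thread can be ranked mechanically instead of by eye. The following is only a minimal Python sketch: the names `SyscallStat`, `parse_strace_summary` and `SAMPLE` are made up for illustration, the sample holds just a few rows copied from the histogram above, and the parser assumes the column layout shown there (% time, seconds, usecs/call, calls, optional errors, syscall).

```python
from typing import List, NamedTuple


class SyscallStat(NamedTuple):
    name: str
    pct_time: float
    seconds: float
    calls: int
    errors: int


def parse_strace_summary(text: str) -> List[SyscallStat]:
    """Parse the per-syscall rows of an strace summary table.

    Mail quote prefixes ("> > ") are stripped first. The header, the
    separator lines and the "total" row are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.lstrip("> ").split()
        # Data rows have 5 fields, or 6 when the errors column is filled in.
        if len(fields) not in (5, 6) or fields[-1] == "total":
            continue
        try:
            pct_time = float(fields[0])
            seconds = float(fields[1])
            calls = int(fields[3])
            errors = int(fields[4]) if len(fields) == 6 else 0
        except ValueError:
            continue  # separator line, not a data row
        rows.append(SyscallStat(fields[-1], pct_time, seconds, calls, errors))
    return rows


# A few rows copied from the summary quoted in this thread.
SAMPLE = """\
> > % time     seconds  usecs/call     calls    errors syscall
> > ------ ----------- ----------- --------- --------- ----------------
> >  65.06   11.097387         103    108084     53662 futex
> >  12.00    2.047692      170641        12         3 restart_syscall
> >   8.73    1.488824       23263        64           accept
> >   6.99    1.192192        5624       212           poll
> >   6.60    1.125829       22517        50           epoll_wait
> >   0.00    0.000350           8        46        10 read
> >   0.00    0.000000           0         6         4 connect
> > ------ ----------- ----------- --------- --------- ----------------
> > 100.00   17.057362                109518     53680 total
"""

rows = parse_strace_summary(SAMPLE)
top = max(rows, key=lambda r: r.seconds)
print(f"{top.name}: {top.pct_time}% of syscall time, "
      f"{top.errors / top.calls:.1%} of calls returned an error")
```

On the sample rows this reports futex as the top entry, with roughly half of its calls returning an error. The summary alone does not say which errno values those were, so the raw strace output would still have to be checked for that.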