Maybe there is some slow query. I ran into the same problem: it turned out
I was querying 100 thousand columns of a single row, and HBase became
unresponsive and stopped working.
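
In case this is what's hitting you too: one way to avoid pulling the whole
row in a single Get is to page through its columns. Rough, untested sketch
against the old HTable client API (the table name, row key and page size
below are just placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.filter.ColumnPaginationFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class WideRowPager {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");   // placeholder table name
    byte[] row = Bytes.toBytes("myrow");          // placeholder row key
    int pageSize = 1000;                          // columns per round trip
    int offset = 0;
    try {
      while (true) {
        Get get = new Get(row);
        // fetch at most pageSize columns, starting at 'offset'
        get.setFilter(new ColumnPaginationFilter(pageSize, offset));
        Result result = table.get(get);
        if (result.isEmpty()) {
          break;                                  // no more columns left
        }
        // ... process result.raw() here ...
        offset += result.size();
      }
    } finally {
      table.close();
    }
  }
}

Fetching the row in pages like this keeps each RPC (and the region server's
work for that request) bounded, instead of materializing 100 thousand cells
at once.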

2012/7/13 Esteban Gutierrez <este...@cloudera.com>

> Hi Asaf,
>
> By any chance, has this issue been going on in your boxes for the last
> few days? I wouldn't be surprised by so many calls to futex from the JVM
> itself, but since you are describing the same symptoms as the leap second
> issue, it would be good to know what OS you are using, whether NTP is/was
> running, and whether the boxes have been restarted after Jul 1. If the
> leap second issue is the cause, then just running date -s "`date`" as root
> will lower the CPU usage.
>
> regards,
> esteban.
>
>
> --
> Cloudera, Inc.
>
>
>
>
> On Thu, Jul 12, 2012 at 10:12 AM, Asaf Mesika <asaf.mes...@gmail.com>
> wrote:
>
> > Just adding more information.
> > The following is the histogram output of 'strace -p <hdfs-pid> -f -C',
> > which ran for 10 seconds. For some reason, futex accounts for 65% of the
> > time.
> >
> > % time     seconds  usecs/call     calls    errors syscall
> > ------ ----------- ----------- --------- --------- ----------------
> >  65.06   11.097387         103    108084     53662 futex
> >  12.00    2.047692      170641        12         3 restart_syscall
> >   8.73    1.488824       23263        64           accept
> >   6.99    1.192192        5624       212           poll
> >   6.60    1.125829       22517        50           epoll_wait
> >   0.26    0.045039         506        89           close
> >   0.19    0.031703         170       187           sendto
> >   0.04    0.007508         110        68           setsockopt
> >   0.03    0.005558          27       209           recvfrom
> >   0.02    0.003000         375         8           sched_yield
> >   0.02    0.002999         107        28         1 epoll_ctl
> >   0.01    0.002000         125        16           open
> >   0.01    0.001999         167        12           getsockname
> >   0.01    0.001156          36        32           write
> >   0.01    0.001000         100        10           fstat
> >   0.01    0.001000          30        33           fcntl
> >   0.01    0.000999          15        67           dup2
> >   0.00    0.000488          98         5           rt_sigreturn
> >   0.00    0.000350           8        46        10 read
> >   0.00    0.000222           4        51           mprotect
> >   0.00    0.000167          42         4           openat
> >   0.00    0.000092           2        52           stat
> >   0.00    0.000084           2        45           statfs
> >   0.00    0.000074           4        21           mmap
> >   0.00    0.000000           0         9           munmap
> >   0.00    0.000000           0        26           rt_sigprocmask
> >   0.00    0.000000           0         3           ioctl
> >   0.00    0.000000           0         1           pipe
> >   0.00    0.000000           0         5           madvise
> >   0.00    0.000000           0         6           socket
> >   0.00    0.000000           0         6         4 connect
> >   0.00    0.000000           0         1           shutdown
> >   0.00    0.000000           0         3           getsockopt
> >   0.00    0.000000           0         7           clone
> >   0.00    0.000000           0         8           getdents
> >   0.00    0.000000           0         3           getrlimit
> >   0.00    0.000000           0         6           sysinfo
> >   0.00    0.000000           0         7           gettid
> >   0.00    0.000000           0        14           sched_getaffinity
> >   0.00    0.000000           0         1           epoll_create
> >   0.00    0.000000           0         7           set_robust_list
> > ------ ----------- ----------- --------- --------- ----------------
> > 100.00   17.057362                109518     53680 total
> >
> > On Jul 12, 2012, at 18:09 PM, Asaf Mesika wrote:
> >
> > > Hi,
> > >
> > > I have a cluster of 3 DN/RS and another computer hosting NN/Master.
> > >
> > > For some reason, two of the DataNode machines are showing a high load
> > > average (~17).
> > > When using "top" I can see that the HDFS and HBase processes are the
> > > ones using most of the CPU (95% in top).
> > >
> > > When inspecting both HDFS and HBase through JVisualVM on the
> > > problematic nodes, I can clearly see that the CPU usage is high.
> > >
> > > Any ideas why it's happening on those two nodes (and why the 3rd is
> > > resting happily)?
> > >
> > > All three computers have roughly the same hardware.
> > > The cluster (both HBase and HDFS) is not being used at the moment
> > > (during my inspection).
> > >
> > > Neither the HDFS nor the HBase logs show any particular activity.
> > >
> > >
> > > Any leads on where I should look for more information would be
> > > appreciated.
> > >
> > >
> > > Thanks!
> > >
> > > Asaf
> > >
> >
> >
>
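
Asaf, regarding where else to look: if it turns out not to be the leap
second issue, one way to see which threads inside the HDFS/HBase JVMs are
actually burning the CPU is to ask the Threading MXBean over JMX and then
match the hot thread names against a jstack dump. Rough, untested sketch;
it assumes you have remote JMX enabled on those JVMs (e.g. via
-Dcom.sun.management.jmxremote.port=...) and takes host:port as its first
argument:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ThreadCpuTop {
  public static void main(String[] args) throws Exception {
    // args[0] = host:port of the JMX agent on the DataNode/RegionServer JVM
    String url = "service:jmx:rmi:///jndi/rmi://" + args[0] + "/jmxrmi";
    JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(url));
    try {
      MBeanServerConnection mbsc = connector.getMBeanServerConnection();
      ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
          mbsc, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
      // print cumulative CPU time per thread; sort the output to spot the hot ones
      for (long id : threads.getAllThreadIds()) {
        long cpuNanos = threads.getThreadCpuTime(id);   // -1 if not supported
        ThreadInfo info = threads.getThreadInfo(id);
        if (info != null && cpuNanos > 0) {
          System.out.printf("%-60s %10d ms%n",
              info.getThreadName(), cpuNanos / 1000000L);
        }
      }
    } finally {
      connector.close();
    }
  }
}

You can get a similar picture without JMX by running top -H -p <pid> and
matching the busy thread ids (converted to hex) against the nid=0x... values
in a jstack dump.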
