Hi Esteban,

Thanks for the quick response.

We had a load job running at the time. I believe the job was heavy on both
CPU and I/O and may have caused some hotspotting. It wasn't my cluster, so I
don't have first-hand information, but from past experience we may have had
one hot RegionServer hosting quite a few small regions.

"how many RPC handlers have you configured for the master..."
Is that hbase.regionserver.handler.count? It was set to 30.
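
To double-check which of these settings are actually present in a given
hbase-site.xml, something like the rough Java sketch below could be used. It
only relies on the JDK's XML parser. The sample XML in main() is made up for
illustration, and the hbase.master.handler.count key is my assumption about
the master-side setting; it should be verified against the 0.96 docs. The
class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class HandlerCountCheck {

    // Collect the <property><name>/<value> pairs of a Hadoop-style site file.
    static Map<String, String> parseSite(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new HashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
            props.put(name, value);
        }
        return props;
    }

    // Report a key's value, or note that the built-in default is in effect.
    static String describe(Map<String, String> props, String key) {
        return key + " = " + props.getOrDefault(key, "(not set, default applies)");
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical site file: only the values mentioned in this thread.
        String xml =
              "<configuration>"
            + "<property><name>hbase.regionserver.handler.count</name><value>30</value></property>"
            + "<property><name>hbase.rpc.timeout</name><value>120000</value></property>"
            + "</configuration>";
        Map<String, String> props = parseSite(xml);
        System.out.println(describe(props, "hbase.regionserver.handler.count"));
        // Assumed name of the master-side handler setting; verify for 0.96.
        System.out.println(describe(props, "hbase.master.handler.count"));
        System.out.println(describe(props, "hbase.rpc.timeout"));
    }
}
```

In practice the XML string would be read from the cluster's real
hbase-site.xml instead of being hard-coded.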

I will get a jstack the next time it happens.

Demai

On Mon, May 26, 2014 at 10:08 AM, Esteban Gutierrez <este...@cloudera.com> wrote:

> Hello Demai,
>
> Have you seen anything else going on on the master, e.g. high CPU load? Can
> you try to get a jstack of the HBase master next time you experience this
> issue? Also, how many regions do you have in this cluster and how many RPC
> handlers have you configured for the master?
>
> regards,
> esteban.
>
>
> --
> Cloudera, Inc.
>
>
>
> On Mon, May 26, 2014 at 9:41 AM, Demai Ni <nid...@gmail.com> wrote:
>
> > hi,
> >
> > HBase version is 0.96.
> > Occasionally, on a large cluster, accessing the UI or the HBase shell
> > times out (stack trace at the end of the email). The HBase Master seems
> > to be working fine, as a load job was running and eventually completed
> > successfully.
> >
> > I did some googling and found a suggestion to increase hbase.rpc.timeout.
> > So my question is: besides changing hbase.rpc.timeout to 120 sec or
> > longer, is there any other possible cause of this issue, or any other
> > configuration to tune?
> >
> > BTW, I found a good blog post here:
> > http://hadoop-hbase.blogspot.com/2012/09/hbase-client-timeouts.html
> > Any detailed suggestions on best practices around these configurations?
> > Thanks.
> >
> > Demai
> >
> >
> > Timeout stack trace on pastebin: http://pastebin.com/edtATYrw
> >
> > HTTP ERROR 500
> > Problem accessing /master-status. Reason:
> >     Call to hdperf001.svl.ibm.com/9.30.75.10:60000 failed because
> > java.net.SocketTimeoutException: 600000 millis timeout while waiting for
> > channel to be ready for read. ch :
> > java.nio.channels.SocketChannel[connected local=/9.30.75.10:1047 remote=
> > hdperf001.svl.ibm.com/9.30.75.10:60000]
> > Caused by:
> > java.net.SocketTimeoutException: Call to
> > hdperf001.svl.ibm.com/9.30.75.10:60000 failed because
> > java.net.SocketTimeoutException: 600000 millis timeout while waiting for
> > channel to be ready for read. ch :
> > java.nio.channels.SocketChannel[connected local=/9.30.75.10:1047 remote=
> > hdperf001.svl.ibm.com/9.30.75.10:60000]
> > at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1478)
> > at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1453)
> > at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653)
> > at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711)
> >
>
