X3 slow down after moving from HBase 0.90.3 to HBase 0.92.1

Vincent Barat Tue, 20 Nov 2012 10:31:39 -0800

Hi,

We have changed some parameters on our 16(!) region servers: 1 GB more -Xmx, more RPC handlers (from 10 to 30), longer timeouts, but nothing seems to improve the response time:


- Scans with HBase 0.92 are 3x SLOWER than with HBase 0.90.3

- A lot of simultaneous gets leads to a huge slowdown of batch put and random read response times


... despite the fact that our RS CPU load is really low (10%)

Note: we have not (yet) activated MSLAB, nor direct reads on HDFS.

Any ideas, please? I'm really stuck on this issue.

Best regards,

On 16/11/12 20:55, Vincent Barat wrote:

Hi,
Right now (and previously with 0.90.3) we are using the default value (10).
We are now trying to increase it to 30 to see if it is better.

Thanks for your concern

On 16/11/12 18:13, Ted Yu wrote:
Vincent:
What's the value for hbase.regionserver.handler.count ?

I assume you keep the same value as that from 0.90.3

Thanks
On Fri, Nov 16, 2012 at 8:14 AM, Vincent Barat <vincent.ba...@gmail.com> wrote:
On 16/11/12 01:56, Stack wrote:
On Thu, Nov 15, 2012 at 5:21 AM, Guillaume Perrot <gper...@ubikod.com> wrote:
It happens when several tables are being compacted and/or when there are
several scanners running.
It happens for a particular region? Anything you can tell about the server looking in your cluster monitoring? Is it running hot? What do the hbase regionserver stats in UI say? Anything interesting about
compaction queues or requests?
Hi, thanks for your answer, Stack. I will take the lead on this thread
from now on.
It does not happen on any particular region. Actually, things are getting better now that compactions have been performed on all tables and have
stopped.
Nevertheless, we face a dramatic decrease in performance (especially on
random gets) across the whole cluster:
Despite the fact that we doubled our number of region servers (from 8 to 16), and despite the fact that the CPU load on these region servers is just about 10% to 30%, performance is really bad: very often a slight increase in requests leads to clients locked on requests and very long response times. It looks like
contention or a deadlock somewhere in the HBase client and C code.
If you look at the thread dump, are all handlers occupied serving
requests?  These timed-out requests couldn't get into the server?
We will investigate that and report back to you.
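One way to answer the handler question is to take a jstack dump of the region server and tally the state of each RPC handler thread. The sketch below is only an illustration: the thread name pattern is taken from the "IPC Server handler 3 on 60020" warning in this thread, the header and `java.lang.Thread.State:` line layout is the usual HotSpot jstack format (an assumption here), and `handler_states` / `summarize` are hypothetical helper names.

```python
import re
from collections import Counter

# Assumed jstack thread header, e.g.
#   "IPC Server handler 3 on 60020" daemon prio=10 tid=... nid=... runnable [...]
HANDLER_RE = re.compile(r'^"IPC Server handler (\d+) on (\d+)"')
# Assumed HotSpot state line, e.g. "   java.lang.Thread.State: WAITING (parking)"
STATE_RE = re.compile(r'^\s*java\.lang\.Thread\.State: (\w+)')


def handler_states(dump_text):
    """Return {handler_id: thread_state} for each IPC handler in a jstack dump."""
    states = {}
    current = None  # handler id whose state line we are still looking for
    for line in dump_text.splitlines():
        header = HANDLER_RE.match(line)
        if header:
            current = int(header.group(1))
            continue
        if line.startswith('"'):
            current = None  # header of some other thread
            continue
        state = STATE_RE.match(line)
        if state and current is not None:
            states[current] = state.group(1)
            current = None
    return states


def summarize(dump_text):
    """Count handlers per thread state (RUNNABLE, WAITING, BLOCKED, ...)."""
    return Counter(handler_states(dump_text).values())
```

Calling `summarize(open("rs.jstack").read())` on a saved dump (the file name is made up) gives a quick count per state. If nearly every handler is RUNNABLE or BLOCKED while the CPU sits at 10%, that would point toward handlers stuck on I/O or locks rather than an undersized handler pool; the stack frames under each handler still need to be read to confirm.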
Before the timeouts, we observe an increasing CPU load on a single region
server, and if we add region servers and wait for rebalancing, we always
have the same region server causing problems like these:
2012-11-14 20:47:08,443 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call multi(org.apache.hadoop.hbase.client.MultiAction@2c3da1aa), rpc version=1, client version=29, methodsFingerPrint=54742778 from <ip>:45334: output error
2012-11-14 20:47:08,443 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020 caught: java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1653)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:924)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:1003)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:409)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1346)
With the same access patterns, we did not have this issue in HBase
0.90.3.
The above is the other side of the timeout -- the client is gone.
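To check whether these client-gone warnings cluster around compactions or around particular handlers, the region server log can be bucketed by minute. This is a rough sketch that assumes each warning sits on a single unwrapped line in the format quoted above; `closed_channel_counts` is a hypothetical helper, not part of HBase.

```python
import re
from collections import Counter

# Assumed one-line form of the warning quoted in this thread:
#   2012-11-14 20:47:08,443 WARN org.apache.hadoop.ipc.HBaseServer:
#   IPC Server handler 3 on 60020 caught: java.nio.channels.ClosedChannelException
CAUGHT_RE = re.compile(
    r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3} WARN\s+\S*HBaseServer: '
    r'IPC Server handler (\d+) on \d+ caught: '
    r'java\.nio\.channels\.ClosedChannelException'
)


def closed_channel_counts(lines):
    """Return (per_minute, per_handler) counters of ClosedChannelException warnings."""
    per_minute = Counter()
    per_handler = Counter()
    for line in lines:
        m = CAUGHT_RE.match(line)
        if m:
            per_minute[m.group(1)] += 1        # key like "2012-11-14 20:47"
            per_handler[int(m.group(2))] += 1  # handler number
    return per_minute, per_handler
```

Passing an open log file to `closed_channel_counts` returns both counters; comparing the busiest minutes with the compaction start and end times in the same log would show whether the two are related.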

Can you explain the rising CPU?
No, there is no explanation (no high access to a given region, for example).
But this specific problem went away when the compactions finished.


Is it iowait on this box because of
compactions?  Bad disk?  Always the same regionserver or does the issue move
around?

Sorry for all the questions.  0.92 should be better than 0.90
Our experience is currently the exact opposite: for us, 0.92 seems to be
times slower than 0.90.3.

  generally (0.94 even better still -- can you go there?).
We can go to 0.94, but unfortunately we CANNOT GO BACK (the same way we cannot go back to 0.90.3, since there is apparently a modification of the
format of the ROOT table).
The upgrade works, but the downgrade does not. And we are afraid of having even more "new" problems with 0.94 and being forced to roll back to 0.90.3 (with
some days of data loss).

Thanks for your reply. We will continue to investigate.



Interesting
that these issues show up post upgrade. I can't think of a reason why
the different versions would bring this on...

St.Ack
