On 21/11/12 18:39, Stack wrote:
So Vincent, the servers are quiet?   Which would match your low CPU
observation.  Clients are unable to send them load for some reason?
How many disks.  What is your block cache hit number (see regionserver
log -- it gets printed every so often .... or in the below I see 99%
so your numbers should be good coming out of the regionserver).
It does not seem to be a load issue: as you say, CPU is low and RPC handlers are underused. We have plenty of disk space, and our block cache hit ratio is 99% on all region servers...

Today we tried removing some region servers (yes: we had only 8 before moving to 0.92, and we added 8 more because we thought it was a performance issue). We now have 12 of them, and performance is actually similar (just more CPU load, of course, but similar response times).
600 regions is a lot per server.  You should put it on your TODO list
to have less per server -- bigger regions which you can do now you are
on 0.92.
This is definitely on our TODO list. Nevertheless, before the move our 8 RS (0.90.3) had more than 1100 regions each, without any issue! We increased our region size by 4x (we now use the default 1GB setting), and we plan to merge some tables.

If you major compact -- do it when site is less heavily loaded -- does
your performance go up?

Are all query types slow or just certain types?
Actually, things are OK for a while (say 2 to 4 ms response time), then we get "scanner lease" exceptions. We cannot figure out what triggers this exception (we thought it was contention somewhere, or a server slowdown, but our latest investigation seems to point to a bug between server and clients).

Here is a typical set of exceptions we get from time to time:

client (a PIG script using HBaseStorage):
----------------------------------

2012-11-21 14:47:29,925 | ERROR | main | Launcher | Backend error message org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '4537659031468873643' does not exist
        at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2117)
        at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:84)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:39)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1325)
        at org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1293)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:133)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:142)
        at org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat$HBaseTableRecordReader.nextKeyValue(HBaseTableInputFormat.java:162)
        at org.apache.pig.backend.hadoop.hbase.HBaseStorage.getNext(HBaseStorage.java:452)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:194)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

Region server:
-----------

2012-11-21 14:45:55,199 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.hbase.regionserver.LeaseException: lease '4537659031468873643' does not exist
        at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2117)
        at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
--
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)

2012-11-21 14:45:57,320 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":63895,"call":"next(4537659031468873643, 512), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.124.45.132:19289","starttimems":1353509093424,"queuetimems":0,"class":"HRegionServer","responsesize":6,"method":"next"}
2012-11-21 14:45:57,320 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call next(4537659031468873643, 512), rpc version=1, client version=29, methodsFingerPrint=54742778 from 10.124.45.132:19289: output error
2012-11-21 14:45:57,323 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 14 on 60020 caught: java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
        at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1653)


I don't understand this strange responseTooSlow / ClosedChannelException combination. If you can explain what is happening here, that would help.
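
To make the responseTooSlow entry easier to read, here is a minimal Python sketch that extracts the JSON payload from that log line and compares processingtimems with a scanner lease period. The 60000 ms value is an assumption (I believe it is the default for hbase.regionserver.lease.period, but it should be checked against the cluster's hbase-site.xml), and the helper names are made up for illustration. It only compares two numbers and does not by itself explain the exception.

```python
import json

# Assumption: 60000 ms lease period. Check hbase.regionserver.lease.period
# in hbase-site.xml for the value actually used on the cluster.
ASSUMED_LEASE_PERIOD_MS = 60_000

MARKER = "(responseTooSlow):"


def parse_response_too_slow(line):
    """Return the JSON payload of a (responseTooSlow) log line, or None."""
    idx = line.find(MARKER)
    if idx == -1:
        return None
    return json.loads(line[idx + len(MARKER):].strip())


def exceeds_lease(payload, lease_period_ms=ASSUMED_LEASE_PERIOD_MS):
    """True if the call was processed for longer than the lease period."""
    return payload["processingtimems"] > lease_period_ms


# The region server log line quoted in this message, with the wrapped
# starttimems value joined back together.
sample = (
    '2012-11-21 14:45:57,320 WARN org.apache.hadoop.ipc.HBaseServer: '
    '(responseTooSlow): {"processingtimems":63895,'
    '"call":"next(4537659031468873643, 512), rpc version=1, '
    'client version=29, methodsFingerPrint=54742778",'
    '"client":"10.124.45.132:19289","starttimems":1353509093424,'
    '"queuetimems":0,"class":"HRegionServer","responsesize":6,'
    '"method":"next"}'
)

payload = parse_response_too_slow(sample)
print(payload["method"], payload["processingtimems"], exceeds_lease(payload))
# prints: next 63895 True
```

With these numbers, the next() call was processed for about 63.9 s, which is longer than the assumed 60 s lease period, so a client giving up and closing the connection in the meantime would at least be consistent with the ClosedChannelException.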

Best regards, and thank you for your concern.
