bq. "Eventually we ran into ipc queue size full messages being returned to clients trying large batch puts, as it approaches a gigabyte."
Do you mean CallQueueTooBigException? It looks like the limit is not on the queue size but on the size of the data that the client sends, configured by "hbase.ipc.server.max.callqueue.size". I guess that when your client got the exception, it closed the connection, causing other RPCs that shared the connection to fail.

2014-08-06 22:27:57,253 WARN [RpcServer.reader=9,port=60020] ipc.RpcServer (RpcServer.java:doRead(794)) - RpcServer.listener,port=60020: count of bytes read: 0
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2229)
    at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1415)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:790)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:581)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:556)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
2014-08-06 22:27:57,257 WARN [RpcServer.handler=18,port=60020] ipc.RpcServer (RpcServer.java:processResponse(1041)) - RpcServer.respondercallId: 84968 service: ClientService methodName: Multi size: 17.7 K connection: 10.248.130.152:49780: output error
2014-08-06 22:27:57,258 WARN [RpcServer.handler=18,port=60020] ipc.RpcServer (CallRunner.java:run(135)) - RpcServer.handler=18,port=60020: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
2014-08-06 22:27:57,260 WARN [RpcServer.handler=61,port=60020] ipc.RpcServer (RpcServer.java:processResponse(1041)) - RpcServer.respondercallId: 83907 service: ClientService methodName: Multi size: 17.1 K connection: 10.248.1.56:53615: output error
2014-08-06 22:27:57,263 WARN [RpcServer.handler=61,port=60020] ipc.RpcServer (CallRunner.java:run(135)) - RpcServer.handler=61,port=60020: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null

On Fri, Aug 8, 2014 at 2:57 AM, Walter King <[email protected]> wrote:

> https://gist.github.com/walterking/4c5c6f5e5e4a4946a656#file-gistfile1-txt
>
> http://adroll-test-sandbox.s3.amazonaws.com/regionserver.stdout.log.gz
>
> These are logs from that particular server, and the debug dump from now (no
> restart in between). The times in the graph are Pacific, so it should be
> around 2014-08-06 22:25:00. I do see some exceptions around there.
>
