PS: there are so many "Connection reset by peer" errors. Why would your client reset the connection? :-)
On Fri, Aug 8, 2014 at 4:56 PM, Qiang Tian <[email protected]> wrote:

> Good point, that is a big suspect.
>
> I checked your log. The ClosedChannelException should be triggered by
> call.sendResponseIfReady() (it is the only request in the queue, so the
> handler sends the response directly), but at that point callQueueSize has
> already been decremented.
>
> 2014-08-05 00:50:06,727 WARN [RpcServer.handler=57,port=60020]
> ipc.RpcServer (RpcServer.java:processResponse(1041)) -
> RpcServer.respondercallId: 118504 service: ClientService methodName: Multi
> size: 141.9 K connection: 10.248.134.67:55347: output error
> 2014-08-05 00:50:06,727 WARN [RpcServer.handler=57,port=60020]
> ipc.RpcServer (CallRunner.java:run(135)) - RpcServer.handler=57,port=60020:
> caught a ClosedChannelException, this means that the server was processing
> a request but the client went away. The error message was: null
>
> It looks like you have got the fix; would you file a jira?
> Thanks.
>
>
> On Fri, Aug 8, 2014 at 2:41 PM, Walter King <[email protected]> wrote:
>
>> I've only looked at the code a little, and likely missed something, but
>> does this if block fail to decrement the call queue size when the client
>> has already closed the connection?
>>
>> https://github.com/apache/hbase/blob/07a771866f18e8ec532c14f624fa908815bd88c7/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java#L74
>>
>>
>> On Thu, Aug 7, 2014 at 11:32 PM, Walter King <[email protected]> wrote:
>>
>>> Yes, sorry, CallQueueTooBigException. But that value never returns to
>>> zero, even when the number of requests goes to zero. The "call queue too
>>> big" error happens once any regionserver has been up for a long enough
>>> period of time, so I have to periodically restart them. Also, at that
>>> 15:30 time I wasn't seeing that exception, but it seems like that was one
>>> time when a call didn't properly decrement the callQueueSize: it was at
>>> zero before and has never hit zero again. Today the minimum is even
>>> higher.
>>>
>>>
>>> On Thu, Aug 7, 2014 at 9:14 PM, Qiang Tian <[email protected]> wrote:
>>>
>>>> bq. "Eventually we ran into ipc queue size full messages being returned
>>>> to clients trying large batch puts, as it approaches a gigabyte."
>>>>
>>>> Do you mean CallQueueTooBigException? It looks like it is not the queue
>>>> length but the total data size the client sends that matters, configured
>>>> by "hbase.ipc.server.max.callqueue.size".
>>>>
>>>> I guess when your client got the exception, it closed the connection,
>>>> causing the other RPCs on the shared connection to fail.
>>>>
>>>> 2014-08-06 22:27:57,253 WARN [RpcServer.reader=9,port=60020]
>>>> ipc.RpcServer (RpcServer.java:doRead(794)) -
>>>> RpcServer.listener,port=60020: count of bytes read: 0
>>>> java.io.IOException: Connection reset by peer
>>>>   at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>>>   at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>>>   at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>>>>   at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>>>>   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>>>>   at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2229)
>>>>   at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1415)
>>>>   at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:790)
>>>>   at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:581)
>>>>   at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:556)
>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>   at java.lang.Thread.run(Thread.java:744)
>>>> 2014-08-06 22:27:57,257 WARN [RpcServer.handler=18,port=60020]
>>>> ipc.RpcServer (RpcServer.java:processResponse(1041)) -
>>>> RpcServer.respondercallId: 84968 service: ClientService methodName: Multi
>>>> size: 17.7 K connection: 10.248.130.152:49780: output error
>>>> 2014-08-06 22:27:57,258 WARN [RpcServer.handler=18,port=60020]
>>>> ipc.RpcServer (CallRunner.java:run(135)) -
>>>> RpcServer.handler=18,port=60020: caught a ClosedChannelException, this
>>>> means that the server was processing a request but the client went away.
>>>> The error message was: null
>>>> 2014-08-06 22:27:57,260 WARN [RpcServer.handler=61,port=60020]
>>>> ipc.RpcServer (RpcServer.java:processResponse(1041)) -
>>>> RpcServer.respondercallId: 83907 service: ClientService methodName: Multi
>>>> size: 17.1 K connection: 10.248.1.56:53615: output error
>>>> 2014-08-06 22:27:57,263 WARN [RpcServer.handler=61,port=60020]
>>>> ipc.RpcServer (CallRunner.java:run(135)) -
>>>> RpcServer.handler=61,port=60020: caught a ClosedChannelException, this
>>>> means that the server was processing a request but the client went away.
>>>> The error message was: null
>>>>
>>>>
>>>> On Fri, Aug 8, 2014 at 2:57 AM, Walter King <[email protected]> wrote:
>>>>
>>>>> https://gist.github.com/walterking/4c5c6f5e5e4a4946a656#file-gistfile1-txt
>>>>>
>>>>> http://adroll-test-sandbox.s3.amazonaws.com/regionserver.stdout.log.gz
>>>>>
>>>>> These are the logs from that particular server, and the debug dump from
>>>>> now (no restart in between). The times in the graph are Pacific, so it
>>>>> should be around 2014-08-06 22:25:00. I do see some exceptions around
>>>>> there.
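[Editor's note] The leak suspected above is a classic pattern: an accounting counter is incremented when a call is queued, but the matching decrement sits on a code path that is skipped when the client connection is already closed. The sketch below is illustrative only (class, field, and method names are invented, not the actual CallRunner code from the linked commit); it shows why a finally block is the usual fix.

```java
// Hypothetical sketch of the suspected callQueueSize leak - NOT HBase code.
import java.util.concurrent.atomic.AtomicLong;

public class CallQueueSketch {
    // Server-wide running total of queued request bytes.
    public static final AtomicLong callQueueSize = new AtomicLong();

    // Buggy pattern: the decrement only happens on the normal path, so a
    // call whose client has already gone away leaks its size forever.
    public static void runBuggy(boolean connectionOpen, long callSize) {
        callQueueSize.addAndGet(callSize);       // incremented at enqueue time
        if (!connectionOpen) {
            return;                              // early exit skips the decrement
        }
        // ... process call, send response ...
        callQueueSize.addAndGet(-callSize);      // only reached on success
    }

    // Fixed pattern: finally guarantees the decrement on every exit path,
    // including the closed-connection early return and any exception.
    public static void runFixed(boolean connectionOpen, long callSize) {
        callQueueSize.addAndGet(callSize);
        try {
            if (!connectionOpen) {
                return;
            }
            // ... process call, send response ...
        } finally {
            callQueueSize.addAndGet(-callSize);
        }
    }

    public static void main(String[] args) {
        runBuggy(false, 100);
        System.out.println(callQueueSize.get());  // prints 100: bytes leaked
        callQueueSize.set(0);
        runFixed(false, 100);
        System.out.println(callQueueSize.get());  // prints 0: fully released
    }
}
```

This matches the symptom Walter describes: the counter never returns to zero, and its floor creeps upward each day as more clients disconnect mid-call.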
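[Editor's note] For readers following the CallQueueTooBigException side of the thread: "hbase.ipc.server.max.callqueue.size" implies a byte-budget admission check on incoming calls. The sketch below is an assumption about the general shape of such a check (the class and method names are invented, and real HBase internals may differ); it shows why a leaked increment, as above, eventually makes the server reject every call.

```java
// Hedged sketch of a byte-budget admission check - names are illustrative.
public class CallQueueAdmission {
    private final long maxCallQueueSize;  // budget in bytes, e.g. ~1 GiB
    private long callQueueSize;           // bytes currently accounted for

    public CallQueueAdmission(long maxCallQueueSize) {
        this.maxCallQueueSize = maxCallQueueSize;
    }

    // Returns false when over budget; the caller would then answer the
    // client with something like CallQueueTooBigException.
    public synchronized boolean tryEnqueue(long callSizeBytes) {
        if (callQueueSize > maxCallQueueSize) {
            return false;                 // budget exhausted, reject the call
        }
        callQueueSize += callSizeBytes;   // admit and account for the bytes
        return true;
    }

    // Must be called on EVERY call completion path, or the budget leaks.
    public synchronized void finished(long callSizeBytes) {
        callQueueSize -= callSizeBytes;
    }
}
```

With this shape, each skipped finished() call permanently shrinks the usable budget; once the leaked total crosses the limit, every tryEnqueue() fails, which is why the regionservers needed periodic restarts.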
