There is a view child nodes loop code http://paste2.org/zm8GE7xH
------------------ ???????? ------------------ ??????: "????-????";<175998...@qq.com>; ????????: 2015??10??30??(??????) ????10:24 ??????: "user"<user@hbase.apache.org>; ????: ?????? ?????? Hbase cluster is suddenly unable to respond The client code is http://paste2.org/p3BXkKtV Is the client version compatible with it? I see that the client version is hbase0.96.1.1-hadoop2 ------------------ ???????? ------------------ ??????: "Ted Yu";<yuzhih...@gmail.com>; ????????: 2015??10??30??(??????) ????0:08 ??????: "user@hbase.apache.org"<user@hbase.apache.org>; ????: Re: ?????? Hbase cluster is suddenly unable to respond Client side, have they tried increasing direct memory size ? -XX:MaxDirectMemorySize= Do you know how wide the rows returned may get ? Cheers On Thu, Oct 29, 2015 at 9:03 AM, ???? <175998...@qq.com> wrote: > Developers feedback their client has the following error?? > > [2015/10/29 19:20:42.260][WARN][RpcClient:724] IPC Client (1904394969) > connection to l-hbase28.data.cn8.qunar.com/192.168.44.32:60020 from > tomcat: unexpected exception receiving call responses > > java.lang.OutOfMemoryError: Direct buffer memory > > at java.nio.Bits.reserveMemory(Bits.java:633) ~[na:1.6.0_20] > > at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:95) > ~[na:1.6.0_20] > > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288) > ~[na:1.6.0_20] > > at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:57) > ~[na:1.6.0_20] > > at sun.nio.ch.IOUtil.read(IOUtil.java:205) ~[na:1.6.0_20] > > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236) > ~[na:1.6.0_20] > > at > org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) > ~[hadoop-common-2.2.0.jar:na] > > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > ~[hadoop-common-2.2.0.jar:na] > > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) > ~[hadoop-common-2.2.0.jar:na] > > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) > ~[hadoop-common-2.2.0.jar:na] > > at java.io.FilterInputStream.read(FilterInputStream.java:116) > ~[na:1.6.0_20] > > at java.io.FilterInputStream.read(FilterInputStream.java:116) > ~[na:1.6.0_20] > > at > org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:555) > ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2] > > at java.io.BufferedInputStream.read1(BufferedInputStream.java:256) > ~[na:1.6.0_20] > > at java.io.BufferedInputStream.read(BufferedInputStream.java:317) > ~[na:1.6.0_20] > > at java.io.DataInputStream.read(DataInputStream.java:132) > ~[na:1.6.0_20] > > at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192) > ~[hadoop-common-2.2.0.jar:na] > > at > org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1101) > ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2] > > at > org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721) > ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2] > > > > > > > > ------------------ ???????? ------------------ > ??????: "Ted Yu";<yuzhih...@gmail.com>; > ????????: 2015??10??29??(??????) ????10:48 > ??????: "user@hbase.apache.org"<user@hbase.apache.org>; > > ????: Re: ?????? Hbase cluster is suddenly unable to respond > > > > I took a look at the jstack. > The threads involving RpcServer$Connection.readAndProcess() were in > RUNNABLE state, not BLOCKED or IN_NATIVE state - as described in > HBASE-11277 > . > > The protobuf exception shown in your earlier email corresponded to the > following in hbase-protocol/src/main/protobuf/Client.proto : > > message GetRequest { > required RegionSpecifier region = 1; > required Get get = 2; > } > > Are all your hbase clients running in the same version ? > > Cheers > > On Thu, Oct 29, 2015 at 7:28 AM, ???? <175998...@qq.com> wrote: > > > the regionserver jstack log is http://paste2.org/yLDJeXgL > > > > > > > > > > ------------------ ???????? ------------------ > > ??????: "????-????";<175998...@qq.com>; > > ????????: 2015??10??29??(??????) ????9:10 > > ??????: "user"<user@hbase.apache.org>; > > > > ????: ?????? ?????? Hbase cluster is suddenly unable to respond > > > > > > > > hi Ted?? > > > > > > Yesterday around 14:40,one of regionservers hang once against.At that > time > > I saw web ui can not open.Hbase cluster is unable to respond.Very > anxious, > > hoping to get help?? > > > > > > jstack log is as follows?? > > "RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007f4466146800 > > nid=0x12d3 runnable [0x00007f3bebe58000] > > java.lang.Thread.State: RUNNABLE > > at sun.nio.ch.NativeThread.current(Native Method) > > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325) > > - locked <0x00007f3d27360fb0> (a java.lang.Object) > > - locked <0x00007f3d27360f90> (a java.lang.Object) > > at > org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402) > > at > > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368) > > at > > > org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476) > > at > > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770) > > at > > > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563) > > - locked <0x00007f3c584ce990> (a > > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader) > > at > > > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538) > > at > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > at java.lang.Thread.run(Thread.java:744) > > > > > > "RpcServer.reader=3,port=60020" daemon prio=10 tid=0x00007f4466145000 > > nid=0x12d2 runnable [0x00007f3bebf59000] > > java.lang.Thread.State: RUNNABLE > > at sun.nio.ch.NativeThread.current(Native Method) > > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325) > > - locked <0x00007f3d27360530> (a java.lang.Object) > > - locked <0x00007f3d27360510> (a java.lang.Object) > > at > org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402) > > at > > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368) > > at > > > org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476) > > at > > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770) > > at > > > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563) > > - locked <0x00007f3c584cf7d8> (a > > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader) > > at > > > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538) > > at > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > at java.lang.Thread.run(Thread.java:744) > > > > > > > > > > > > region server log : > > 2015-10-28 14:38:19,801 INFO [Thread-15] regionserver.HRegion: Finished > > memstore flush of ~3.6 M/3820648, currentsize=536/536 for region > > > order_history,2801xyz140618175732642$3,1418829598639.afc853471a8cd4184bc9e7be00b8eea0. > > in 45ms, sequenceid=9599960557, compaction requested=true > > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=3,port=60020] > > regionserver.HRegionServer: > > org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: > > Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; > > request=scanner_id: 16740356019163164014 number_of_rows: 10 > close_scanner: > > false next_call_seq: 0 > > at > > > org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007) > > at > > > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929) > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146) > > at > > org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851) > > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=9,port=60020] > > regionserver.HRegionServer: > > org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: > > Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; > > request=scanner_id: 16740356019163164014 number_of_rows: 10 > close_scanner: > > false next_call_seq: 0 > > at > > > org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007) > > at > > > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929) > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146) > > at > > org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851) > > > > 2015-10-28 14:38:32,696 WARN [RpcServer.reader=2,port=60020] > > ipc.RpcServer: Unable to read call parameter from client 192.168.37.135 > > com.google.protobuf.UninitializedMessageException: Message missing > > required fields: region, get > > at > > > com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770) > > at > > > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474) > > at > > > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406) > > at > > > org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689) > > at > > > org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631) > > at > > > org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491) > > at > > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770) > > at > > > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563) > > at > > > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538) > > at > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > at java.lang.Thread.run(Thread.java:744) > > 2015-10-28 14:38:32,696 WARN [RpcServer.reader=1,port=60020] > > ipc.RpcServer: Unable to read call parameter from client 192.168.37.135 > > com.google.protobuf.UninitializedMessageException: Message missing > > required fields: region, get > > at > > > com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770) > > at > > > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474) > > at > > > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406) > > at > > > org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689) > > at > > > org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631) > > at > > > org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491) > > at > > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770) > > at > > > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563) > > at > > > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538) > > at > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > at java.lang.Thread.run(Thread.java:744) > > > > > > > > ------------------ ???????? ------------------ > > ??????: "????-????";<175998...@qq.com>; > > ????????: 2015??10??26??(??????) ????9:28 > > ??????: "user"<user@hbase.apache.org>; > > > > ????: ?????? ?????? Hbase cluster is suddenly unable to respond > > > > > > > > > > > > Thank you very much?? > > > > > > ------------------ ???????? ------------------ > > ??????: "Ted Yu";<yuzhih...@gmail.com>; > > ????????: 2015??10??26??(??????) ????8:28 > > ??????: "user"<user@hbase.apache.org>; > > > > ????: Re: ?????? Hbase cluster is suddenly unable to respond > > > > > > > > The fix from HBASE-11277 may solve your problem - if you collect stack > > trace during the hang, we would have more clue. > > > > I suggest upgrading to newer release such as 1.1.2 or 0.98.15 > > > > Cheers > > > > > On Oct 26, 2015, at 12:42 AM, ???? <175998...@qq.com> wrote: > > > > > > hi,Ted: > > > > > > > > > I use the HBase version is hbase-0.96.0. > > > Around 17:33,other region servers also appeared in this warn log.I > don't > > know if it's normal or not.At that time I saw web ui can not open.I > restart > > the regionserver then hbase back to normal. Is it possible bug > > HBASE-11277?? > > > > > > > > > Regionserver on the log basically almost this warn log > > > mater on the log is as follows?? > > > 2015-10-21 22:15:43,575 INFO [CatalogJanitor-l-namenode2:60000] > > master.CatalogJanitor: Scanned 672 catalog row(s), gc'd 0 unreferenced > > merged region(s) and 1 unreferenced parent region(s) > > > 2015-10-23 17:47:25,617 INFO [RpcServer.handler=28,port=60000] > > master.HMaster: Client=hbase//192.168.39.19 set balanceSwitch=false > > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000] > > ipc.RpcServer: (responseTooSlow): > > > {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":" > > 192.168.39.22:60292 > > > ","starttimems":1445593715207,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"} > > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] > > ipc.RpcServer: (responseTooSlow): > > > {"processingtimems":130525,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":" > > 192.168.39.22:60286 > > > ","starttimems":1445593654945,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"} > > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000] > > ipc.RpcServer: RpcServer.respondercallId: 130953 service: MasterService > > methodName: ListTableDescriptorsByNamespace size: 48 connection: > > 192.168.39.22:60292: output error > > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] > > ipc.RpcServer: RpcServer.respondercallId: 130945 service: MasterService > > methodName: ListTableDescriptorsByNamespace size: 48 connection: > > 192.168.39.22:60286: output error > > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] > > ipc.RpcServer: RpcServer.handler=6,port=60000: caught a > > ClosedChannelException, this means that the server was processing a > request > > but the client went away. The error message was: null > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ------------------ ???????? ------------------ > > > ??????: "Ted Yu";<yuzhih...@gmail.com>; > > > ????????: 2015??10??23??(??????) ????11:39 > > > ??????: "user@hbase.apache.org"<user@hbase.apache.org>; > > > > > > ????: Re: Hbase cluster is suddenly unable to respond > > > > > > > > > > > > Were other region servers functioning normally around 17:33 ? > > > > > > Which hbase release are you using ? > > > > > > Can you pastebin more of the region server log ? > > > > > > Thanks > > > > > >> On Fri, Oct 23, 2015 at 8:28 AM, ???? <175998...@qq.com> wrote: > > >> > > >> hi??all?? > > >> > > >> > > >> This afternoon,The whole Hbase cluster is suddenly unable to > > respond.after > > >> I restart a regionserver after??the cluster has recovered.I don't know > > the > > >> cause of the trouble.I hope I can get help from you. > > >> > > >> > > >> Regionserver on the log is as follows?? > > >> 2015-10-23 17:28:49,335 INFO [regionserver60020.logRoller] > wal.FSHLog: > > >> moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com > > >> ,60020,1442810406218/l-hbase30.data.cn8.qunar.com > > %2C60020%2C1442810406218.1445580462689 > > >> whose highest sequenceid is 9071525521 to /hbase/oldWALs/ > > >> l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689 > > >> 2015-10-23 17:33:31,375 WARN [RpcServer.reader=8,port=60020] > > >> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0 > > >> java.io.IOException: Connection reset by peer > > >> at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > > >> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > > >> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > > >> at sun.nio.ch.IOUtil.read(IOUtil.java:197) > > >> at > sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > > >> at > > >> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368) > > >> at > > >> > > > org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403) > > >> at > > >> > > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770) > > >> at > > >> > > > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563) > > >> at > > >> > > > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538) > > >> at > > >> > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > >> at > > >> > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > >> at java.lang.Thread.run(Thread.java:744) > > >> 2015-10-23 17:33:31,779 WARN [RpcServer.reader=2,port=60020] > > >> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0 > > >> java.io.IOException: Connection reset by peer > > >> at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > > >> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > > >> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > > >> at sun.nio.ch.IOUtil.read(IOUtil.java:197) > > >> at > sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > > >> at > > >> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368) > > >> at > > >> > > > org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403) > > >> at > > >> > > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770) > > >> at > > >> > > > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563) > > >> at > > >> > > > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538) > > >> at > > >> > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > >> at > > >> > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > >> at java.lang.Thread.run(Thread.java:744) > > >