Can you provide jstack of region server(s) ?

Was there anything interesting in the logs ?

Thanks

BTW 0.98.7 is quite old. Please consider upgrading.
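On the "Direct buffer memory" OOM quoted further down this thread: direct ByteBuffers are allocated off-heap and are capped by -XX:MaxDirectMemorySize, which is what the suggestion below targets. A minimal illustration (sizes are arbitrary, not taken from the thread):

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    public static void main(String[] args) {
        // allocateDirect takes memory outside the Java heap; the total is
        // bounded by -XX:MaxDirectMemorySize (by default roughly -Xmx).
        // The 0.96 client in the quoted trace reads each RPC response through
        // temporary direct buffers, so very wide rows can exhaust this pool
        // and throw java.lang.OutOfMemoryError: Direct buffer memory.
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MiB off-heap
        System.out.println("direct=" + buf.isDirect() + " capacity=" + buf.capacity());
    }
}
```

Run with, e.g., `java -XX:MaxDirectMemorySize=256m DirectBufferDemo`; when the cap is too small for the allocation, allocateDirect fails with the same error shown in the quoted client trace.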

On Fri, Aug 5, 2016 at 11:10 PM, kiran <kiran.sarvabho...@gmail.com> wrote:

> HBase client and server are on the same version, 0.98.7. We are seeing
> complete downtime of about 30 minutes, with high CPU usage on the node and
> heavy network usage in the cluster.
>
> On Sat, Aug 6, 2016 at 11:38 AM, kiran <kiran.sarvabho...@gmail.com>
> wrote:
>
> > We are also facing the same issue. Please tell us what the solution is. I
> > have increased the RPC timeout and reduced the scanner caching, but with
> > no effect. We are using HBase 0.98.7. Please suggest a workaround, as we
> > are hitting the issue very frequently now and having downtime in
> > production.
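For reference, the knobs mentioned above (RPC timeout and scanner caching) typically live in hbase-site.xml on the 0.98 line; the OutOfOrderScannerNextException quoted below usually means the client timed out and retried a scanner next() call. A sketch with illustrative values only:

```xml
<!-- Illustrative values; tune for your workload. -->
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>120000</value> <!-- ms a scanner lease may stay idle before expiring -->
</property>
<property>
  <name>hbase.client.scanner.caching</name>
  <value>100</value> <!-- rows fetched per scanner next() RPC -->
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>120000</value> <!-- ms before a single RPC is considered failed -->
</property>
```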
> >
> > On Fri, Oct 30, 2015 at 9:21 AM, 聪聪 <175998...@qq.com> wrote:
> >
> >> The code that loops listing the child nodes is at http://paste2.org/zm8GE7xH
> >>
> >>
> >>
> >>
> >> ------------------ Original Message ------------------
> >> From: "蒲聪-北京" <175998...@qq.com>
> >> Sent: Friday, October 30, 2015, 10:24 AM
> >> To: "user" <user@hbase.apache.org>
> >>
> >> Subject: Re: Re: Hbase cluster is suddenly unable to respond
> >>
> >>
> >>
> >>
> >>
> >> The client code is http://paste2.org/p3BXkKtV
> >>
> >>
> >> Is the client version compatible with the server?
> >> I see that the client version is hbase 0.96.1.1-hadoop2.
> >>
> >>
> >>
> >>
> >> ------------------ Original Message ------------------
> >> From: "Ted Yu" <yuzhih...@gmail.com>
> >> Sent: Friday, October 30, 2015, 12:08 AM
> >> To: "user@hbase.apache.org" <user@hbase.apache.org>
> >>
> >> Subject: Re: Re: Hbase cluster is suddenly unable to respond
> >>
> >>
> >>
> >> Client side, have they tried increasing direct memory size ?
> >> -XX:MaxDirectMemorySize=
> >>
> >> Do you know how wide the rows returned may get ?
> >>
> >> Cheers
> >>
> >> On Thu, Oct 29, 2015 at 9:03 AM, 聪聪 <175998...@qq.com> wrote:
> >>
> >> > The developers report that their client gets the following error:
> >> >
> >> > [2015/10/29 19:20:42.260][WARN][RpcClient:724] IPC Client (1904394969) connection to l-hbase28.data.cn8.qunar.com/192.168.44.32:60020 from tomcat: unexpected exception receiving call responses
> >> >
> >> > java.lang.OutOfMemoryError: Direct buffer memory
> >> >
> >> >     at java.nio.Bits.reserveMemory(Bits.java:633) ~[na:1.6.0_20]
> >> >     at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:95) ~[na:1.6.0_20]
> >> >     at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288) ~[na:1.6.0_20]
> >> >     at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:57) ~[na:1.6.0_20]
> >> >     at sun.nio.ch.IOUtil.read(IOUtil.java:205) ~[na:1.6.0_20]
> >> >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236) ~[na:1.6.0_20]
> >> >     at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) ~[hadoop-common-2.2.0.jar:na]
> >> >     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) ~[hadoop-common-2.2.0.jar:na]
> >> >     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) ~[hadoop-common-2.2.0.jar:na]
> >> >     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) ~[hadoop-common-2.2.0.jar:na]
> >> >     at java.io.FilterInputStream.read(FilterInputStream.java:116) ~[na:1.6.0_20]
> >> >     at java.io.FilterInputStream.read(FilterInputStream.java:116) ~[na:1.6.0_20]
> >> >     at org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:555) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
> >> >     at java.io.BufferedInputStream.read1(BufferedInputStream.java:256) ~[na:1.6.0_20]
> >> >     at java.io.BufferedInputStream.read(BufferedInputStream.java:317) ~[na:1.6.0_20]
> >> >     at java.io.DataInputStream.read(DataInputStream.java:132) ~[na:1.6.0_20]
> >> >     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192) ~[hadoop-common-2.2.0.jar:na]
> >> >     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1101) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
> >> >     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > ------------------ Original Message ------------------
> >> > From: "Ted Yu" <yuzhih...@gmail.com>
> >> > Sent: Thursday, October 29, 2015, 10:48 PM
> >> > To: "user@hbase.apache.org" <user@hbase.apache.org>
> >> >
> >> > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> >> >
> >> >
> >> >
> >> > I took a look at the jstack.
> >> > The threads involving RpcServer$Connection.readAndProcess() were in the
> >> > RUNNABLE state, not the BLOCKED or IN_NATIVE state described in
> >> > HBASE-11277.
> >> >
> >> > The protobuf exception shown in your earlier email corresponds to the
> >> > following message in hbase-protocol/src/main/protobuf/Client.proto :
> >> >
> >> > message GetRequest {
> >> >   required RegionSpecifier region = 1;
> >> >   required Get get = 2;
> >> > }
> >> >
> >> > Are all your HBase clients running the same version ?
> >> >
> >> > Cheers
> >> >
> >> > On Thu, Oct 29, 2015 at 7:28 AM, 聪聪 <175998...@qq.com> wrote:
> >> >
> >> > > The region server jstack output is at http://paste2.org/yLDJeXgL
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > ------------------ Original Message ------------------
> >> > > From: "蒲聪-北京" <175998...@qq.com>
> >> > > Sent: Thursday, October 29, 2015, 9:10 PM
> >> > > To: "user" <user@hbase.apache.org>
> >> > >
> >> > > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> >> > >
> >> > >
> >> > >
> >> > > hi Ted:
> >> > >
> >> > > Yesterday around 14:40, one of the region servers hung once again. At
> >> > > that time the web UI would not open and the HBase cluster was unable
> >> > > to respond. Very anxious, hoping to get help!
> >> > >
> >> > >
> >> > > The jstack output is as follows:
> >> > > "RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007f4466146800 nid=0x12d3 runnable [0x00007f3bebe58000]
> >> > >    java.lang.Thread.State: RUNNABLE
> >> > >     at sun.nio.ch.NativeThread.current(Native Method)
> >> > >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
> >> > >     - locked <0x00007f3d27360fb0> (a java.lang.Object)
> >> > >     - locked <0x00007f3d27360f90> (a java.lang.Object)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >> > >     - locked <0x00007f3c584ce990> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> > >     at java.lang.Thread.run(Thread.java:744)
> >> > >
> >> > > "RpcServer.reader=3,port=60020" daemon prio=10 tid=0x00007f4466145000 nid=0x12d2 runnable [0x00007f3bebf59000]
> >> > >    java.lang.Thread.State: RUNNABLE
> >> > >     at sun.nio.ch.NativeThread.current(Native Method)
> >> > >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
> >> > >     - locked <0x00007f3d27360530> (a java.lang.Object)
> >> > >     - locked <0x00007f3d27360510> (a java.lang.Object)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >> > >     - locked <0x00007f3c584cf7d8> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> > >     at java.lang.Thread.run(Thread.java:744)
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > region server log:
> >> > > 2015-10-28 14:38:19,801 INFO  [Thread-15] regionserver.HRegion: Finished memstore flush of ~3.6 M/3820648, currentsize=536/536 for region order_history,2801xyz140618175732642$3,1418829598639.afc853471a8cd4184bc9e7be00b8eea0. in 45ms, sequenceid=9599960557, compaction requested=true
> >> > > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=3,port=60020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
> >> > >     at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> >> > >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> >> > > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=9,port=60020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
> >> > >     at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> >> > >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> >> > >
> >> > > 2015-10-28 14:38:32,696 WARN  [RpcServer.reader=2,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> >> > > com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
> >> > >     at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> >> > >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> >> > >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> > >     at java.lang.Thread.run(Thread.java:744)
> >> > > 2015-10-28 14:38:32,696 WARN  [RpcServer.reader=1,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> >> > > com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
> >> > >     at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> >> > >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> >> > >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> > >     at java.lang.Thread.run(Thread.java:744)
> >> > >
> >> > >
> >> > >
> >> > > ------------------ Original Message ------------------
> >> > > From: "蒲聪-北京" <175998...@qq.com>
> >> > > Sent: Monday, October 26, 2015, 9:28 PM
> >> > > To: "user" <user@hbase.apache.org>
> >> > >
> >> > > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > Thank you very much!
> >> > >
> >> > >
> >> > > ------------------ Original Message ------------------
> >> > > From: "Ted Yu" <yuzhih...@gmail.com>
> >> > > Sent: Monday, October 26, 2015, 8:28 PM
> >> > > To: "user" <user@hbase.apache.org>
> >> > >
> >> > > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> >> > >
> >> > >
> >> > >
> >> > > The fix from HBASE-11277 may solve your problem. If you collect a
> >> > > stack trace during the hang, we would have more clues.
> >> > >
> >> > > I suggest upgrading to a newer release such as 1.1.2 or 0.98.15
> >> > >
> >> > > Cheers
> >> > >
> >> > > > On Oct 26, 2015, at 12:42 AM, 聪聪 <175998...@qq.com> wrote:
> >> > > >
> >> > > > hi,Ted:
> >> > > >
> >> > > > The HBase version I am using is hbase-0.96.0.
> >> > > > Around 17:33 the other region servers also showed this WARN in
> >> > > > their logs; I don't know whether that is normal or not. At that
> >> > > > time the web UI would not open. After I restarted the region
> >> > > > server, HBase went back to normal. Could this be bug HBASE-11277?
> >> > > >
> >> > > >
> >> > > > The region server logs mostly contain this same WARN message.
> >> > > > The master log is as follows:
> >> > > > 2015-10-21 22:15:43,575 INFO  [CatalogJanitor-l-namenode2:60000] master.CatalogJanitor: Scanned 672 catalog row(s), gc'd 0 unreferenced merged region(s) and 1 unreferenced parent region(s)
> >> > > > 2015-10-23 17:47:25,617 INFO  [RpcServer.handler=28,port=60000] master.HMaster: Client=hbase//192.168.39.19 set balanceSwitch=false
> >> > > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=24,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60292","starttimems":1445593715207,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> >> > > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":130525,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60286","starttimems":1445593654945,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> >> > > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=24,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130953 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60292: output error
> >> > > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130945 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60286: output error
> >> > > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.handler=6,port=60000: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > ------------------ Original Message ------------------
> >> > > > From: "Ted Yu" <yuzhih...@gmail.com>
> >> > > > Sent: Friday, October 23, 2015, 11:39 PM
> >> > > > To: "user@hbase.apache.org" <user@hbase.apache.org>
> >> > > >
> >> > > > Subject: Re: Hbase cluster is suddenly unable to respond
> >> > > >
> >> > > >
> >> > > >
> >> > > > Were other region servers functioning normally around 17:33 ?
> >> > > >
> >> > > > Which hbase release are you using ?
> >> > > >
> >> > > > Can you pastebin more of the region server log ?
> >> > > >
> >> > > > Thanks
> >> > > >
> >> > > >> On Fri, Oct 23, 2015 at 8:28 AM, 聪聪 <175998...@qq.com> wrote:
> >> > > >>
> >> > > >> hi,all:
> >> > > >>
> >> > > >> This afternoon the whole HBase cluster suddenly became unable to
> >> > > >> respond. After I restarted one region server, the cluster
> >> > > >> recovered. I don't know the cause of the trouble; I hope I can
> >> > > >> get help from you.
> >> > > >>
> >> > > >>
> >> > > >> The region server log is as follows:
> >> > > >> 2015-10-23 17:28:49,335 INFO  [regionserver60020.logRoller] wal.FSHLog: moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com,60020,1442810406218/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689 whose highest sequenceid is 9071525521 to /hbase/oldWALs/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
> >> > > >> 2015-10-23 17:33:31,375 WARN  [RpcServer.reader=8,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> >> > > >> java.io.IOException: Connection reset by peer
> >> > > >>        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> >> > > >>        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> >> > > >>        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> >> > > >>        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> >> > > >>        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> >> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> >> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> >> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >> > > >>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> > > >>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> > > >>        at java.lang.Thread.run(Thread.java:744)
> >> > > >> 2015-10-23 17:33:31,779 WARN  [RpcServer.reader=2,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> >> > > >> java.io.IOException: Connection reset by peer
> >> > > >>        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> >> > > >>        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> >> > > >>        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> >> > > >>        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> >> > > >>        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> >> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> >> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> >> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >> > > >>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> > > >>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> > > >>        at java.lang.Thread.run(Thread.java:744)
> >> > >
> >> >
> >>
> >
> >
> >
> > --
> > Thank you
> > Kiran Sarvabhotla
> >
> > -----Even a correct decision is wrong when it is taken late
> >
> >
>
>
> --
> Thank you
> Kiran Sarvabhotla
>
> -----Even a correct decision is wrong when it is taken late
>
