(moving to the user mailing list, where it belongs)

My educated guess is that you had a GC pause that lasted for more than a
minute while a file was being written to. Even if the write wasn't
happening, your region server would have committed suicide anyways since it
was probably past it's lease timeout. In order to confirm that, we would
need to see the logs from a few minutes before the exception happened. Also,
please use pastebin.com (or similar) to store the logs.

Regarding the NPE, it's definitely not something that should happen but the
region server was already on it's way out.

J-D

2011/1/31 Zhou Shuaifeng <[email protected]>

> 2011-01-28 17:13:47,633
>
> java.net.*SocketTimeoutException*: Call to c3s4.site/167.6.5.13:60020failed on
> *socket timeout exception*: java.net.SocketTimeoutException: 60000 millis
> timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected 
> local=/167.6.5.14:*50735*remote=c3s4.site/
> 167.6.5.13:60020]
>
> 2011-01-28 17:13:47,634 INFO 
> org.apache.hadoop.hbase.regionserver.*HRegionServer:
> STOPPED: Failed open of daughter*
>
>
>
> According this “60000 millis timeout”, I checked the log at 17:12:47,
> there is a null point exception and ClosedChannelException, is there any
> relation?:
>
>
>
> 2011-01-28 17:12:47,029 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> Responder, call getRegionInfo([B@3a3e8cfb) from 167.6.5.17:51035: output
> error
>
> 2011-01-28 17:12:47,029 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> Responder, call getRegionInfo([B@7a0a4ce5) from 167.6.5.14:50307: output
> error
>
> 2011-01-28 17:12:47,029 WARN org.apache.hadoop.ipc.HBaseServer: PRI IPC
> Server handler 2 on 60020 caught: java.nio.channels.ClosedChannelException
>
>                 at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
>
>                 at
> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1336)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:724)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:789)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1080)
>
>
>
> 2011-01-28 17:12:47,029 WARN org.apache.hadoop.ipc.HBaseServer: PRI IPC
> Server handler 4 on 60020 caught: java.nio.channels.ClosedChannelException
>
>                 at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
>
>                 at
> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1336)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:724)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:789)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1080)
>
>
>
> 2011-01-28 17:12:47,029 WARN org.apache.hadoop.ipc.HBaseServer: PRI IPC
> Server handler 5 on 60020 caught: java.nio.channels.ClosedChannelException
>
>                 at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
>
>                 at
> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1336)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:724)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:789)
>
>                 at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1080)
>
>
>
> 2011-01-28 17:12:47,097 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> regionserver:60020-0x22dcbbf01b40006 Retrieved 78 byte(s) of data from znode
> /hbase/unassigned/1489005343;
> data=region=ufdr_c,201101258613813763326#174006,1295973611272,
> server=c3s1.site:60000, state=M_ZK_REGION_OFFLINE
>
> 2011-01-28 17:12:47,097 ERROR
> org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while
> processing event M_RS_OPEN_REGION
>
> java.lang.NullPointerException
>
>                 at
> org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
>
>                 at
> org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198)
>
>                 at
> org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:672)
>
>                 at
> org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpening(ZKAssign.java:552)
>
>                 at
> org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpening(ZKAssign.java:545)
>
>                 at
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.transitionZookeeperOfflineToOpening(OpenRegionHandler.java:296)
>
>                 at
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:90)
>
>                 at
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
>
>                 at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>
>                 at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>
>                 at java.lang.Thread.run(Thread.java:662)
>
>
>
>
>
> Zhou Shuaifeng(Frank)
> HUAWEI TECHNOLOGIES CO.,LTD. [image: huawei_logo]
>
>
>
>
> -------------------------------------------------------------------------------------------------------------------------------------
> This e-mail and its attachments contain confidential information from
> HUAWEI, which
> is intended only for the person or entity whose address is listed above.
> Any use of the
> information contained herein in any way (including, but not limited to,
> total or partial
> disclosure, reproduction, or dissemination) by persons other than the
> intended
> recipient(s) is prohibited. If you receive this e-mail in error, please
> notify the sender by
> phone or email immediately and delete it!
>
> *发件人**:* Zhou Shuaifeng [mailto:[email protected]]
> *发送时间:* 2011年1月31日 17:44
> *收件人:* [email protected]
> *抄送:* [email protected]; [email protected]
> *主题:* IPC Server Responder out put error causing RegionServer down
>
>
>
> Hi,
>
>
>
> when I update my hbase from 0.20.6 to 0.90, some error occur:
>
> Below is logs on 167.6.5.13, because Meta table is served by this server,
> so other regionservers want to connect this server to getRegionInfo, but got
> some error:
>
>
>
> 2011-01-28 17:14:18,291 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> Responder, call getRegionInfo([B@15e43aec) from 167.6.5.11:34887: output
> error
>
> 2011-01-28 17:14:18,291 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> Responder, call getRegionInfo([B@3ad8672e) from 167.6.5.14:50735: output
> error
>
>
>
> This caused the server on 11 and 14  socket timeout and shutdown:
>
>
>
> 2011-01-28 17:13:47,633 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
> serverName=c3s5.site,60020,1296203449919, load=(requests=0, regions=2710,
> usedHeap=4622, maxHeap=8165): *Failed open of 
> daughter*ufdr_c,201101268613812388592#085317,1296205930314.cd36b19931bd2b2adcca176a88dbd6e5.
>
> java.net.*SocketTimeoutException*: Call to c3s4.site/167.6.5.13:60020failed on
> *socket timeout exception*: java.net.SocketTimeoutException: 60000 millis
> timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected 
> local=/167.6.5.14:*50735*remote=c3s4.site/
> 167.6.5.13:60020]
>
> 2011-01-28 17:13:47,634 INFO 
> org.apache.hadoop.hbase.regionserver.*HRegionServer:
> STOPPED: Failed open of daughter*
>
> * *
>
> *Now, my question is what**’**s the cause of **“*output error” ?
>
>
>
> Before the “output error” warning, there is an EOFException, is there any
> relationship with this error?
>
>
>
> 2011-01-28 17:13:43,413 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> listener on 60020: readAndProcess threw exception java.io.EOFException.
> Count of bytes read: 0
>
> java.io.EOFException
>
>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>
>        at java.io.DataInputStream.readUTF(DataInputStream.java:592)
>
>        at java.io.DataInputStream.readUTF(DataInputStream.java:547)
>
>        at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:122)
>
>        at
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:975)
>
>        at
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:943)
>
>        at
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:519)
>
>        at
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:313)
>
>        at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>
>        at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>
>        at java.lang.Thread.run(Thread.java:662)
>
> 2011-01-28 17:13:43,413 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open
> region: ufdr_c,201101258613814756901#232839,1295985296267.2025697489
>
> 2011-01-28 17:13:43,418 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> regionserver:60020-0x22dcbbf01b40006 Retrieved 92 byte(s) of data from znode
> /hbase/unassigned/1489005343;
> data=region=ufdr_c,201101258613813763326#174006,1295973611272,
> server=c3s4.site,60020,1296203450194, state=RS_ZK_REGION_OPENING
>
> 2011-01-28 17:13:43,420 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> regionserver:60020-0x22dcbbf01b40006 Retrieved 92 byte(s) of data from znode
> /hbase/unassigned/1497320702;
> data=region=ufdr_c,201101268613810041680#085457,1296052066413,
> server=c3s4.site,60020,1296203450194, state=RS_ZK_REGION_OPENING
>
> 2011-01-28 17:13:43,441 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
> regionserver:60020-0x22dcbbf01b40006 Received ZooKeeper Event,
> type=NodeDataChanged, state=SyncConnected, path=/hbase/unassigned/1489005343
>
> 2011-01-28 17:13:43,441 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> regionserver:60020-0x22dcbbf01b40006 Successfully transitioned node
> 1489005343 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
>
>
>
>
>
>
>
> Zhou Shuaifeng(Frank)
>
>

Reply via email to