Thanks Anoop for replying. No explicit close op happened on the WAL file (this log had been rolled a few seconds before). As per the HDFS log, there is no close call for this WAL file.

The same issue happened again on 19th March. Here the WAL was rolled just before the issue happened:

2016-03-19 05:38:07,153 | INFO | regionserver/RS-HOSTNAME/RS-IP:21302.logRoller | Rolled WAL /hbase/WALs/RS-HOSTNAME,21302,1458301420876/RS-HOSTNAME%2C21302%2C1458301420876.default.1458337083824 with entries=6508, filesize=61.03 MB; new WAL /hbase/WALs/RS-HOSTNAME,21302,1458301420876/RS-HOSTNAME%2C21302%2C1458301420876.default.1458337087136 | org.apache.hadoop.hbase.regionserver.wal.FSHLog.replaceWriter(FSHLog.java:972)

And a few seconds later, during a sync op:

2016-03-19 05:38:10,075 | ERROR | sync.1 | Error syncing, request close of wal | org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1346)
java.nio.channels.ClosedChannelException
    at org.apache.hadoop.hdfs.DataStreamer$LastExceptionInStreamer.throwException4Close(DataStreamer.java:208)
    at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:142)
    at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:545)
    at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:490)
    at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
    at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:190)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1342)
    at java.lang.Thread.run(Thread.java:745)
2016-03-19 05:38:10,076 | INFO | regionserver/RS-HOSTNAME/RS-IP:21302.logRoller | Rolled WAL /hbase/WALs/RS-HOSTNAME,21302,1458301420876/RS-HOSTNAME%2C21302%2C1458301420876.default.1458337087136 with entries=6383, filesize=61.51 MB; new WAL /hbase/WALs/RS-HOSTNAME,21302,1458301420876/RS-HOSTNAME%2C21302%2C1458301420876.default.1458337090049 | org.apache.hadoop.hbase.regionserver.wal.FSHLog.replaceWriter(FSHLog.java:972)
2016-03-19 05:38:10,087 | FATAL | regionserver/RS-HOSTNAME/RS-IP:21302.logRoller | ABORTING region server RS-HOSTNAME,21302,1458301420876: IOE in log roller |
org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2055)
java.nio.channels.ClosedChannelException
    at org.apache.hadoop.hdfs.DataStreamer$LastExceptionInStreamer.throwException4Close(DataStreamer.java:208)
    at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:142)
    at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:545)
    at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:490)
    at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
    at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:190)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1342)
    at java.lang.Thread.run(Thread.java:745)
2016-03-19 05:38:10,088 | FATAL | regionserver/RS-HOSTNAME/RS-IP:21302.logRoller | RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.index.coprocessor.regionserver.IndexRegionObserver, org.apache.hadoop.hbase.JMXListener, org.apache.hadoop.hbase.index.coprocessor.wal.IndexWALObserver] | org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2063)

Here also, there are no error details in the DN/NN logs. I am still checking this and will update if I find anything.

Regards,
Pankaj

-----Original Message-----
From: Anoop John [mailto:anoop.hb...@gmail.com]
Sent: Wednesday, March 23, 2016 3:50 PM
To: user@hbase.apache.org
Subject: Re: Region server getting aborted in every one or two days

At the same time, any explicit close op happened on the WAL file? Any log rolling? Can u check the logs to know this? May be check HDFS logs to know abt the close calls to WAL file?

-Anoop-

On Wed, Mar 23, 2016 at 12:10 PM, Pankaj kr <pankaj...@huawei.com> wrote:
> Hi,
>
> In our production environment, RS is getting aborted in every one or two days
> with following exception.
>
> 2016-03-16 13:57:07,975 | FATAL | MemStoreFlusher.0 | ABORTING region server xyz-vm8,24502,1458034278600: Replay of WAL required.
> Forcing server shutdown | org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2055)
> org.apache.hadoop.hbase.DroppedSnapshotException: region: TB_WEBLOGIN_201603,060,1457916997964.06e204d3bc262b72820aa195fec23513.
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2423)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2128)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2090)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1983)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1909)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:509)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:470)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:74)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.ClosedChannelException
>     at org.apache.hadoop.hdfs.DataStreamer$LastExceptionInStreamer.throwException4Close(DataStreamer.java:208)
>     at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:142)
>     at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:635)
>     at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:490)
>     at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:190)
>     at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1342)
>     ... 1 more
>
> I don't see any error info at HDFS side at that point of time.
> Have anyone faced this issue?
>
> HBase version is 0.98.6.
>
> Regards,
> Pankaj
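
For anyone who wants to repeat the log check discussed in this thread (did a WAL roll happen shortly before the sync failure?), here is a rough sketch in Python. It is only an illustration and not part of HBase: it assumes the region server log uses the pipe-separated layout seen in the excerpts above (timestamp | level | thread | message | location), and the function names and the 10-second window are my own hypothetical choices.

```python
import re
import sys
from datetime import datetime, timedelta

# Assumed record layout, taken from the excerpts in this thread:
#   <timestamp> | <LEVEL> | <thread> | <message> | <code location>
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \| (\w+) \| ([^|]+?) \| (.*)$"
)


def parse_events(lines):
    """Yield (timestamp, level, thread, rest) for lines that start a log record."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # stack-trace frames and blank lines carry no timestamp
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        yield ts, m.group(2), m.group(3), m.group(4)


def sync_errors_after_roll(lines, window_seconds=10):
    """Return (roll_time, error_time) pairs where an 'Error syncing' record
    appears within window_seconds after the most recent 'Rolled WAL' record."""
    window = timedelta(seconds=window_seconds)
    last_roll = None
    hits = []
    for ts, level, _thread, rest in parse_events(lines):
        if rest.startswith("Rolled WAL"):
            last_roll = ts
        elif level == "ERROR" and "Error syncing" in rest:
            if last_roll is not None and ts - last_roll <= window:
                hits.append((last_roll, ts))
    return hits


if __name__ == "__main__":
    # Usage (hypothetical file name): python check_wal_roll.py regionserver.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for roll, err in sync_errors_after_roll(fh):
            print(f"roll at {roll}, sync error at {err}, gap {err - roll}")
```

This only correlates timestamps on the region server side. It does not tell you who closed the stream, so the NN/DN logs for the same WAL path would still need to be checked separately.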