We ran into a situation that has left our HBase database in a bad state. We 
restarted a number of nodes this afternoon, and while HBase kept running, at 
least one of our tables does not seem to be serving all of its regions. What 
I'm seeing in the log is the java.io.EOFException stack trace below, thrown 
while reading a file in the recovered.edits directory. I looked around a bit, 
and this seems related to HBASE-2933, which says that if the master dies while 
splitting a log, it can leave invalid logs in recovered.edits. That seems 
plausible, since the master may have been one of the nodes restarted today.

   My question is: if this is indeed the case, is there a safe way to recover 
from this situation, where I am getting EOFExceptions while replaying 
recovered.edits files? My understanding is that the master splits the logs and 
places them in the recovered.edits directory. If I remove the files under 
recovered.edits, would the master re-split the log file and recover properly, 
or would I have data loss?

   We are currently running the Cloudera distribution of HBase, 
hbase-0.89.20100924.

   Any insights on the best way to recover would be much appreciated.

22eb51f162.: java.io.EOFException: hdfs://hdnn.dfs.returnpath.net:54310/user/hbase/emailProperties/9171dadec62d81105f0f6022eb51f162/recovered.edits/0000000000012154417, entryStart=4160964, pos=4161536, end=4161536, edit=1306
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:186)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:142)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:126)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1842)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1817)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:1776)
        at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:342)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1503)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1468)
        at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1380)
        at java.lang.Thread.run(Unknown Source)

