Hello all,

In an AWS outage we lost about a fifth of our region servers and about an
eighth of our total datanodes. Despite a replication factor of 3, it appears
we may have lost some data from corrupt HLogs. Looking at the HMaster log, I
see messages like this:

12/06/30 00:00:48 INFO wal.HLogSplitter: Got while parsing hlog
hdfs://my-namenode-ip-addr:8020/hbase/.logs/my-rs-ip-addr,60020,1338667719591/my-rs-ip-addr%3A60020.1340935453874.
Marking as corrupted

We are back to stable operation now, and while researching this I found
the hdfs://my-namenode-ip-addr:8020/hbase/.corrupt directory, which contains
20 files.
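One way to narrow down when any lost edits were written is to decode the names of the files in .corrupt. Below is a minimal Python sketch. The helper parse_hlog_name is hypothetical and is not part of HBase. It assumes the naming pattern visible in the log line above: the URL-encoded host:port of the region server, a dot, and then an epoch-milliseconds suffix, which I am assuming is the time the log was created.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import unquote

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_hlog_name(name):
    """Split an HLog name such as 'host%3A60020.1340935453874' (or a full
    hdfs:// path ending in one) into (server, created_utc).

    Assumption: the part after the last '.' is a creation time in epoch ms.
    """
    base = name.rsplit("/", 1)[-1]           # drop any directory prefix
    prefix, _, millis = base.rpartition(".")  # last '.' separates the timestamp
    if not prefix or not millis.isdigit():
        raise ValueError(f"unexpected HLog file name: {name!r}")
    server = unquote(prefix)                  # '%3A' -> ':'
    created = EPOCH + timedelta(milliseconds=int(millis))  # exact, no float rounding
    return server, created


if __name__ == "__main__":
    import sys

    # Pass the names or paths listed by `hadoop fs -ls /hbase/.corrupt` as arguments;
    # they are printed oldest first.
    for name in sorted(sys.argv[1:], key=lambda n: parse_hlog_name(n)[1]):
        server, created = parse_hlog_name(name)
        print(f"{created.isoformat()}  {server}  {name}")
```

Sorting by the decoded timestamp groups the corrupt logs per server and time window. That window could then be compared against the writes the application expects to have made during the outage.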

What are our options for tracking down and potentially recovering any data
that was lost? And how can we even tell what, if anything, was lost? Does the
existence of these files pretty much guarantee data loss? There doesn't
seem to be much documentation on this. From what I've read, it seems
possible that part of each of these files was recovered.

Any help would be appreciated.

Thanks!

Bryan
