Esteban,

Thanks. No WAL replay errors. Just about all the region servers logged a DroppedSnapshotException and then aborted. I think we're good as far as that goes.

Ron

-----Original Message-----
From: Esteban Gutierrez [mailto:este...@cloudera.com]
Sent: Thursday, October 02, 2014 1:26 PM
To: user@hbase.apache.org
Subject: Re: Recovering hbase after a failure

Hi Ron,

Look into dropped snapshot exceptions in the logs and puts or deletes that skip the WAL. If everything is good there, then clients should have handled the unavailability of HBase and there should not be any data loss from the server side. Also double-check that there were no errors replaying the WAL after the crash.

esteban.

--
Cloudera, Inc.

On Thu, Oct 2, 2014 at 10:18 AM, Buckley,Ron <buckl...@oclc.org> wrote:

> We just had an event where, on our main hbase instance, the /hbase
> directory got moved out from under the running system (Human error).
>
> HBase was really unhappy about that, but we were able to recover it
> fairly easily and get back going.
>
> As far as I can tell, all the data and tables came back correct. But,
> I'm pretty concerned that there may be some hidden corruption or data loss.
>
> 'hbase hbck' runs clean and there are no new complaints in the logs.
>
> Can anyone think of anything else we should look at?
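
The log checks suggested in this thread could be sketched roughly as below. This is only an illustration, not an HBase tool: `count_log_matches` and `scan_files` are hypothetical helper names, and the only search string taken from the thread is `DroppedSnapshotException`. Any string used to spot WAL replay problems is an assumption and should be adjusted to whatever your region server logs actually print.

```python
from typing import Dict, Iterable, Sequence

# Substrings to search for. "DroppedSnapshotException" is named in the thread;
# add further strings (for example, for WAL replay errors) based on your own
# logs, since their exact wording is not given here.
DEFAULT_PATTERNS: Sequence[str] = ("DroppedSnapshotException",)


def count_log_matches(
    lines: Iterable[str],
    patterns: Sequence[str] = DEFAULT_PATTERNS,
) -> Dict[str, int]:
    """Count how many lines contain each pattern (plain substring match)."""
    counts = {pattern: 0 for pattern in patterns}
    for line in lines:
        for pattern in patterns:
            if pattern in line:
                counts[pattern] += 1
    return counts


def scan_files(
    paths: Iterable[str],
    patterns: Sequence[str] = DEFAULT_PATTERNS,
) -> Dict[str, Dict[str, int]]:
    """Run count_log_matches over each log file and return counts per file."""
    results = {}
    for path in paths:
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            results[path] = count_log_matches(handle, patterns)
    return results
```

Calling `scan_files` with the paths of the region server logs collected from each host would give a per-file count. A nonzero count only says where to start reading; the surrounding log lines still need to be checked by hand.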