Esteban,

Thanks. No WAL replay errors. Just about all the region servers logged a 
DroppedSnapshotException and then aborted. I think we're good as far as that 
goes.

Ron

-----Original Message-----
From: Esteban Gutierrez [mailto:este...@cloudera.com] 
Sent: Thursday, October 02, 2014 1:26 PM
To: user@hbase.apache.org
Subject: Re: Recovering hbase after a failure

Hi Ron,

Look for dropped snapshot exceptions in the logs, and for puts or deletes that 
skip the WAL. If everything is good there, then clients should have handled the 
unavailability of HBase and there should not be any data loss on the server 
side. Also double-check that there were no errors replaying the WAL after the 
crash.
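
A minimal sketch of the log check described above, assuming the region server 
logs are plain text files you can read locally. DroppedSnapshotException is the 
exception named in this thread; the WAL replay pattern and the function names 
are illustrative assumptions, not known HBase log messages, so adjust them to 
what your logs actually contain.

```python
import re
from collections import Counter
from typing import Iterable

# Patterns to count. "dropped_snapshot" matches the exception named in this
# thread. "wal_replay_error" is an ASSUMED pattern for replay problems; it is
# not taken from HBase source and should be tuned against real log output.
PATTERNS = {
    "dropped_snapshot": re.compile(r"DroppedSnapshotException"),
    "wal_replay_error": re.compile(
        r"(ERROR|FATAL).*(replay|recovered\.edits)", re.IGNORECASE
    ),
}


def scan_lines(lines: Iterable[str]) -> Counter:
    """Count how many lines match each pattern in PATTERNS."""
    counts: Counter = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_file(path: str) -> Counter:
    """Scan one log file; undecodable bytes are replaced rather than fatal."""
    with open(path, errors="replace") as handle:
        return scan_lines(handle)
```

Calling scan_file() on each region server log (the path is whatever your 
install uses) gives a per-pattern count. A nonzero "wal_replay_error" count is 
only a prompt to read those lines by hand, not proof of data loss.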

esteban.




--
Cloudera, Inc.


On Thu, Oct 2, 2014 at 10:18 AM, Buckley,Ron <buckl...@oclc.org> wrote:

> We just had an event where, on our main HBase instance, the /hbase 
> directory got moved out from under the running system (human error).
>
> HBase was really unhappy about that, but we were able to recover it 
> fairly easily and get back going.
>
> As far as I can tell, all the data and tables came back correct. But, 
> I'm pretty concerned that there may be some hidden corruption or data loss.
>
> 'hbase hbck' runs clean and there are no new complaints in the logs.
>
> Can anyone think of anything else we should look at?
>
