RE: Recovering hbase after a failure

Buckley,Ron Thu, 02 Oct 2014 11:00:54 -0700

Esteban,

Thanks. No WAL replay errors. Just about all the region servers logged a 
DroppedSnapshotException and then aborted. I think we're good as far as that 
goes.

Ron

-----Original Message-----
From: Esteban Gutierrez [mailto:[email protected]] 
Sent: Thursday, October 02, 2014 1:26 PM
To: [email protected]
Subject: Re: Recovering hbase after a failure

Hi Ron,

Look for DroppedSnapshotException in the logs, and for puts or deletes that 
skip the WAL. If everything is good there, then clients should have handled the 
unavailability of HBase and there should not be any data loss on the server 
side. Also double-check that there were no errors replaying the WAL after the 
crash.
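
The log check described above could be scripted roughly as follows. This is a
minimal sketch, not an HBase tool: the function names are hypothetical, the log
file paths are whatever you pass on the command line, and the only search
string taken from this thread is DroppedSnapshotException. Strings for WAL
replay errors are not given here, so add your own to the pattern list.

```python
import sys
from collections import Counter

# Only DroppedSnapshotException comes from this thread; any other search
# strings (for example, WAL replay errors) are for the reader to supply.
DEFAULT_PATTERNS = ("DroppedSnapshotException",)


def scan_lines(lines, patterns=DEFAULT_PATTERNS):
    """Return (line_number, pattern, text) for each line containing a pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for pattern in patterns:
            if pattern in line:
                hits.append((number, pattern, line.rstrip("\n")))
    return hits


def scan_files(paths, patterns=DEFAULT_PATTERNS):
    """Count hits per (path, pattern) across the given log files."""
    counts = Counter()
    for path in paths:
        # errors="replace" keeps the scan going if a log has stray bytes.
        with open(path, errors="replace") as handle:
            for _, pattern, _ in scan_lines(handle, patterns):
                counts[(path, pattern)] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical script name): python scan_logs.py <log file> [<log file> ...]
    for (path, pattern), count in sorted(scan_files(sys.argv[1:]).items()):
        print(f"{path}: {count} x {pattern}")
```

Running it over each region server's log gives a per-file count, which makes it
easy to see which servers hit the exception before aborting.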

esteban.

--
Cloudera, Inc.

On Thu, Oct 2, 2014 at 10:18 AM, Buckley,Ron <[email protected]> wrote:

> We just had an event where, on our main HBase instance, the /hbase 
> directory got moved out from under the running system (human error).
>
> HBase was really unhappy about that, but we were able to recover it 
> fairly easily and get back going.
>
> As far as I can tell, all the data and tables came back correct. But 
> I'm pretty concerned that there may be some hidden corruption or data loss.
>
> 'hbase hbck' runs clean and there are no new complaints in the logs.
>
> Can anyone think of anything else we should look at?
