Re: Recovering hbase after a failure

Nick Dimiduk Thu, 02 Oct 2014 12:32:40 -0700

Ah yes, of course there is.

On Thu, Oct 2, 2014 at 12:11 PM, Andrew Purtell <[email protected]>
wrote:


> Is there not the WAL to handle a failed flush?
>
>
>
> > On Oct 2, 2014, at 11:39 AM, Nick Dimiduk <[email protected]> wrote:
> >
> > In this case, didn't the RS creating the directories and flushing the files
> > prevent data loss? Had the flush aborted due to lack of directories, that
> > flush data would have been lost entirely.
> >
> >> On Thu, Oct 2, 2014 at 11:26 AM, Andrew Purtell <[email protected]> wrote:
> >>
> >> On Thu, Oct 2, 2014 at 11:17 AM, Buckley,Ron <[email protected]> wrote:
> >>
> >>> Also, once the original /hbase got mv'd, a few of the region servers did
> >>> some flushes before they aborted. Those RSs actually created a new
> >>> /hbase, with new table directories, but only containing the data from the
> >>> flush.
> >>
> >>
> >> Sounds like we should be creating flush files with createNonRecursive
> >> (even though it's deprecated)
> >>
> >>
> >>> On Thu, Oct 2, 2014 at 11:17 AM, Buckley,Ron <[email protected]> wrote:
> >>>
> >>> FWIW, in case something like this happens to someone else.
> >>>
> >>> To recover this, the first thing I tried was to just mv the /hbase
> >>> directory back.   That doesn’t work.
> >>>
> >>> To get going again, we had to completely shut down and restart.
> >>>
> >>> Also, once the original /hbase got mv'd, a few of the region servers did
> >>> some flushes before they aborted. Those RSs actually created a new
> >>> /hbase, with new table directories, but only containing the data from the
> >>> flush.
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Buckley,Ron
> >>> Sent: Thursday, October 02, 2014 2:09 PM
> >>> To: hbase-user
> >>> Subject: RE: Recovering hbase after a failure
> >>>
> >>> Nick,
> >>>
> >>> Good ideas. Compared file and region counts with our DR site. Things
> >>> look OK. Going to run some rowcounters too.
> >>>
> >>> Feels like we got off easy.
> >>>
> >>> Ron
> >>>
> >>> -----Original Message-----
> >>> From: Nick Dimiduk [mailto:[email protected]]
> >>> Sent: Thursday, October 02, 2014 1:27 PM
> >>> To: hbase-user
> >>> Subject: Re: Recovering hbase after a failure
> >>>
> >>> Hi Ron,
> >>>
> >>> Yikes!
> >>>
> >>> Do you have any basic metrics regarding the amount of data in the system
> >>> -- size of store files before the incident, number of records, &c?
> >>>
> >>> You could sift through the HDFS audit log and see if any files that were
> >>> there previously have not been restored.
> >>>
> >>> -n
> >>>
> >>>> On Thu, Oct 2, 2014 at 10:18 AM, Buckley,Ron <[email protected]> wrote:
> >>>>
> >>>> We just had an event where, on our main hbase instance, the /hbase
> >>>> directory got moved out from under the running system (Human error).
> >>>>
> >>>> HBase was really unhappy about that, but we were able to recover it
> >>>> fairly easily and get back going.
> >>>>
> >>>> As far as I can tell, all the data and tables came back correct. But,
> >>>> I'm pretty concerned that there may be some hidden corruption or data
> >>>> loss.
> >>>>
> >>>> 'hbase hbck'  runs clean and there are no new complaints in the logs.
> >>>>
> >>>> Can anyone think of anything else we should look at?
> >>
> >>
> >>
> >> --
> >> Best regards,
> >>
> >>   - Andy
> >>
> >> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >> (via Tom White)
> >>
>
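
For anyone who wants to try the audit-log check suggested earlier in the thread, here is a minimal sketch in Python. It is not something anyone in the thread ran. It assumes each HDFS audit line carries whitespace-separated key=value fields including allowed=, cmd=, src= and dst=, and that you already have a recursive listing of the restored /hbase as plain paths. The function names and the paths in the usage note are hypothetical.

```python
import re

# Assumption: audit lines contain whitespace-separated key=value fields,
# including allowed=, cmd=, src= and dst=. Adjust if your log format differs.
FIELD = re.compile(r"(\w+)=(\S+)")


def _under(path, prefix):
    """True if path is prefix itself or lives somewhere below it."""
    return path == prefix or path.startswith(prefix + "/")


def expected_files(audit_lines, root="/hbase"):
    """Replay create/delete/rename events and return the paths that should exist."""
    live = set()
    for line in audit_lines:
        f = dict(FIELD.findall(line))
        src, dst = f.get("src", ""), f.get("dst", "")
        # Skip denied operations and anything not strictly below root.
        # This also skips a move of the root directory itself.
        if f.get("allowed") != "true" or not src.startswith(root + "/"):
            continue
        cmd = f.get("cmd")
        if cmd == "create":
            live.add(src)
        elif cmd in ("delete", "rename"):
            # Deletes and renames of a directory affect everything below it.
            gone = {p for p in live if _under(p, src)}
            live -= gone
            if cmd == "rename" and dst.startswith(root + "/"):
                live |= {dst + p[len(src):] for p in gone}
    return live


def missing_after_restore(audit_lines, current_listing, root="/hbase"):
    """Paths the audit log says should exist but the current listing lacks."""
    return sorted(expected_files(audit_lines, root) - set(current_listing))
```

Feed it the audit log lines from before the incident plus the paths from a recursive listing of the restored directory (for example, the path column of `hdfs dfs -ls -R /hbase`). An empty result only means the log and the listing agree. Files created before the start of the log are invisible to this check, so it complements rather than replaces the rowcounter comparison.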
