Re: Recovering hbase after a failure

Nick Dimiduk Thu, 02 Oct 2014 11:40:41 -0700

In this case, didn't the RS creating the directories and flushing the files
prevent data loss? Had the flush aborted due to lack of directories, that
flush data would have been lost entirely.
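
To make the trade-off concrete, here is a minimal sketch of the two create
behaviors. It is a local-filesystem analogy using java.nio, not the HDFS
FileSystem API and not HBase's actual flush code. The class name, method
names, and paths are hypothetical. The recursive-style write rebuilds missing
parent directories, as the region servers did with /hbase. The
non-recursive-style write fails when the parent is gone.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only: local files stand in for HDFS paths.
public class FlushCreateSketch {

    // Recursive-style create: recreate any missing parents, then write the file.
    static void writeRecursive(Path file, byte[] data) throws IOException {
        Files.createDirectories(file.getParent());
        Files.write(file, data, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
    }

    // Non-recursive-style create: returns false if the parent directory is missing.
    static boolean writeNonRecursive(Path file, byte[] data) throws IOException {
        try {
            Files.write(file, data, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            return true;
        } catch (NoSuchFileException e) {
            // Parent is gone (e.g. the root directory was moved away).
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("sketch");
        // The "hbase" directory does not exist under root, as if it had been moved.
        Path flushFile = root.resolve("hbase/table/region/cf/flushfile");

        boolean written = writeNonRecursive(flushFile, new byte[] {1, 2, 3});
        System.out.println("non-recursive write succeeded: " + written);

        writeRecursive(flushFile, new byte[] {1, 2, 3});
        System.out.println("recursive write created: " + Files.exists(flushFile));
    }
}
```

With the non-recursive variant the caller has to decide what to do with the
unflushed data when the write fails, which is the concern raised above.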


On Thu, Oct 2, 2014 at 11:26 AM, Andrew Purtell <apurt...@apache.org> wrote:

> On Thu, Oct 2, 2014 at 11:17 AM, Buckley,Ron <buckl...@oclc.org> wrote:
>
> > Also, once the original /hbase got mv'd, a few of the region servers did
> > some flushes before they aborted.   Those RSs actually created a new
> > /hbase, with new table directories, but only containing the data from the
> > flush.
>
>
> Sounds like we should be creating flush files with createNonRecursive (even
> though it's deprecated)
>
>
> On Thu, Oct 2, 2014 at 11:17 AM, Buckley,Ron <buckl...@oclc.org> wrote:
>
> > FWIW, in case something like this happens to someone else.
> >
> > To recover from this, the first thing I tried was to just mv the /hbase
> > directory back.   That doesn’t work.
> >
> > To get going again, we had to completely shut down and restart.
> >
> > Also, once the original /hbase got mv'd, a few of the region servers did
> > some flushes before they aborted.   Those RSs actually created a new
> > /hbase, with new table directories, but only containing the data from the
> > flush.
> >
> >
> > -----Original Message-----
> > From: Buckley,Ron
> > Sent: Thursday, October 02, 2014 2:09 PM
> > To: hbase-user
> > Subject: RE: Recovering hbase after a failure
> >
> > Nick,
> >
> > Good ideas.  We compared file and region counts with our DR site.  Things
> > look OK.  Going to run some rowcounters too.
> >
> > Feels like we got off easy.
> >
> > Ron
> >
> > -----Original Message-----
> > From: Nick Dimiduk [mailto:ndimi...@gmail.com]
> > Sent: Thursday, October 02, 2014 1:27 PM
> > To: hbase-user
> > Subject: Re: Recovering hbase after a failure
> >
> > Hi Ron,
> >
> > Yikes!
> >
> > Do you have any basic metrics regarding the amount of data in the system
> > -- size of store files before the incident, number of records, &c?
> >
> > You could sift through the HDFS audit log and see if any files that were
> > there previously have not been restored.
> >
> > -n
> >
> > On Thu, Oct 2, 2014 at 10:18 AM, Buckley,Ron <buckl...@oclc.org> wrote:
> >
> > > We just had an event where, on our main hbase instance, the /hbase
> > > directory got moved out from under the running system (Human error).
> > >
> > > HBase was really unhappy about that, but we were able to recover it
> > > fairly easily and get back going.
> > >
> > > As far as I can tell, all the data and tables came back correct. But,
> > > I'm pretty concerned that there may be some hidden corruption or data
> > > loss.
> > >
> > > 'hbase hbck'  runs clean and there are no new complaints in the logs.
> > >
> > > Can anyone think of anything else we should look at?
> > >
> > >
> > >
> > >
> > >
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
