Ah yes, of course there is.

On Thu, Oct 2, 2014 at 12:11 PM, Andrew Purtell <[email protected]> wrote:
> Is there not the WAL to handle a failed flush?
>
> On Oct 2, 2014, at 11:39 AM, Nick Dimiduk <[email protected]> wrote:
> >
> > In this case, didn't the RS creating the directories and flushing the
> > files prevent data loss? Had the flush aborted due to lack of
> > directories, that flush data would have been lost entirely.
> >
> >> On Thu, Oct 2, 2014 at 11:26 AM, Andrew Purtell <[email protected]> wrote:
> >>
> >> On Thu, Oct 2, 2014 at 11:17 AM, Buckley,Ron <[email protected]> wrote:
> >>
> >>> Also, once the original /hbase got mv'd, a few of the region servers
> >>> did some flushes before they aborted. Those RSs actually created a new
> >>> /hbase, with new table directories, but containing only the data from
> >>> the flush.
> >>
> >> Sounds like we should be creating flush files with createNonRecursive
> >> (even though it's deprecated).
> >>
> >>> On Thu, Oct 2, 2014 at 11:17 AM, Buckley,Ron <[email protected]> wrote:
> >>>
> >>> FWIW, in case something like this happens to someone else:
> >>>
> >>> To recover this, the first thing I tried was to just mv the /hbase
> >>> directory back. That didn't work.
> >>>
> >>> To get back going, we had to completely shut down and restart.
> >>>
> >>> Also, once the original /hbase got mv'd, a few of the region servers
> >>> did some flushes before they aborted. Those RSs actually created a new
> >>> /hbase, with new table directories, but containing only the data from
> >>> the flush.
> >>>
> >>> -----Original Message-----
> >>> From: Buckley,Ron
> >>> Sent: Thursday, October 02, 2014 2:09 PM
> >>> To: hbase-user
> >>> Subject: RE: Recovering hbase after a failure
> >>>
> >>> Nick,
> >>>
> >>> Good ideas. Compared file and region counts with our DR site. Things
> >>> look OK. Going to run some RowCounter jobs too.
> >>>
> >>> Feels like we got off easy.
> >>>
> >>> Ron
> >>>
> >>> -----Original Message-----
> >>> From: Nick Dimiduk [mailto:[email protected]]
> >>> Sent: Thursday, October 02, 2014 1:27 PM
> >>> To: hbase-user
> >>> Subject: Re: Recovering hbase after a failure
> >>>
> >>> Hi Ron,
> >>>
> >>> Yikes!
> >>>
> >>> Do you have any basic metrics regarding the amount of data in the
> >>> system -- size of store files before the incident, number of records,
> >>> &c?
> >>>
> >>> You could sift through the HDFS audit log and see if any files that
> >>> were there previously have not been restored.
> >>>
> >>> -n
> >>>
> >>>> On Thu, Oct 2, 2014 at 10:18 AM, Buckley,Ron <[email protected]> wrote:
> >>>>
> >>>> We just had an event where, on our main hbase instance, the /hbase
> >>>> directory got moved out from under the running system (human error).
> >>>>
> >>>> HBase was really unhappy about that, but we were able to recover it
> >>>> fairly easily and get back going.
> >>>>
> >>>> As far as I can tell, all the data and tables came back correctly, but
> >>>> I'm pretty concerned that there may be some hidden corruption or data
> >>>> loss.
> >>>>
> >>>> 'hbase hbck' runs clean and there are no new complaints in the logs.
> >>>>
> >>>> Can anyone think of anything else we should look at?
> >>
> >> --
> >> Best regards,
> >>
> >>    - Andy
> >>
> >> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >> (via Tom White)
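Andrew's createNonRecursive suggestion hinges on one difference in Hadoop's FileSystem API: create() makes any missing parent directories, while the deprecated createNonRecursive() fails if the parent does not already exist. With a non-recursive create, a region server flushing after /hbase was moved would get an error instead of quietly building a fresh /hbase tree; per the replies above, the edits from such a failed flush are still in the WAL. The sketch below is only a local-filesystem analogy using java.nio.file, not HDFS or HBase code, and the store-file path in it is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Local-filesystem analogy only, NOT HDFS or HBase code. It contrasts a
// recursive create (missing parents are silently rebuilt) with a
// non-recursive create (fails when the parent directory has vanished).
class FlushCreateDemo {

    // Analogous to a recursive create: builds missing parents, then the file.
    static Path createRecursive(Path file) throws IOException {
        Files.createDirectories(file.getParent());
        return Files.createFile(file);
    }

    // Analogous to createNonRecursive: throws NoSuchFileException if the
    // parent directory does not exist, and creates no directories.
    static Path createNonRecursive(Path file) throws IOException {
        return Files.createFile(file);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("demo");
        Path hbaseDir = root.resolve("hbase");
        // Hypothetical store-file path; the directory layout is illustrative only.
        Path flushFile = hbaseDir.resolve("data/t1/region1/cf/flushfile");
        Files.createDirectories(flushFile.getParent());

        // Simulate the incident: the whole root directory is moved away.
        Files.move(hbaseDir, root.resolve("hbase.moved"));

        try {
            createNonRecursive(flushFile);
        } catch (NoSuchFileException e) {
            // The flush fails loudly; nothing new appears under root/hbase.
            System.out.println("non-recursive create failed; hbase dir exists: "
                    + Files.exists(hbaseDir));
        }

        // A recursive create instead rebuilds a fresh, nearly empty tree.
        createRecursive(flushFile);
        System.out.println("recursive create succeeded; hbase dir exists: "
                + Files.exists(hbaseDir));
    }
}
```

Running main should report that the non-recursive create failed with no new hbase directory, and that the recursive create succeeded by recreating one, which mirrors the stray /hbase trees Ron found.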
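Nick's audit-log idea can be roughed out as a replay: walk the log in order, add paths on create, drop them (and anything beneath them) on delete, move whole subtrees on rename, then diff what should exist under /hbase against a current listing (for example, one gathered with `hdfs dfs -ls -R /hbase`). This is a minimal sketch under several assumptions: that each audit line carries tab-separated `allowed=`, `cmd=`, `src=` and `dst=` fields; that a rename's `dst` is the final path (real HDFS renames into an existing directory place the source inside it, which this ignores); and that only files created within the log's time window can be tracked. The class name and sample lines are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch: replay HDFS audit-log events to estimate which files under a prefix
// should still exist, then diff that against a current listing.
// ASSUMPTION: lines contain tab-separated key=value fields starting at
// "allowed="; any log4j prefix before that is skipped.
class AuditLogDiff {

    static Map<String, String> parseFields(String line) {
        Map<String, String> fields = new HashMap<>();
        int start = line.indexOf("allowed=");
        if (start < 0) return fields;
        for (String part : line.substring(start).split("\t")) {
            int eq = part.indexOf('=');
            if (eq > 0) fields.put(part.substring(0, eq), part.substring(eq + 1));
        }
        return fields;
    }

    // True if path is dir itself or lies somewhere beneath it.
    static boolean isUnder(String path, String dir) {
        return path.equals(dir) || path.startsWith(dir + "/");
    }

    // Replays allowed create/delete/rename events; returns live paths under prefix.
    static Set<String> replay(List<String> auditLines, String prefix) {
        Set<String> live = new TreeSet<>();
        for (String line : auditLines) {
            Map<String, String> f = parseFields(line);
            if (!"true".equals(f.get("allowed"))) continue;
            String cmd = f.getOrDefault("cmd", "");
            String src = f.get("src");
            if (src == null) continue;
            if (cmd.equals("create")) {
                live.add(src);
            } else if (cmd.equals("delete")) {
                live.removeIf(p -> isUnder(p, src));
            } else if (cmd.startsWith("rename")) {
                String dst = f.get("dst");
                if (dst == null || dst.equals("null")) continue;
                List<String> moved = new ArrayList<>();
                for (String p : live) if (isUnder(p, src)) moved.add(p);
                live.removeAll(moved);
                for (String p : moved) live.add(dst + p.substring(src.length()));
            }
        }
        live.removeIf(p -> !isUnder(p, prefix));
        return live;
    }

    // Paths the log says should exist but that are absent from currentFiles.
    static Set<String> missing(List<String> auditLines, String prefix, Set<String> currentFiles) {
        Set<String> expected = replay(auditLines, prefix);
        expected.removeAll(currentFiles);
        return expected;
    }

    public static void main(String[] args) {
        // Hypothetical log: two flushes, the accidental mv and the mv back,
        // a compaction delete, then a new flush.
        List<String> log = Arrays.asList(
            "allowed=true\tcmd=create\tsrc=/hbase/data/t1/r1/cf/a\tdst=null",
            "allowed=true\tcmd=create\tsrc=/hbase/data/t1/r1/cf/b\tdst=null",
            "allowed=true\tcmd=rename\tsrc=/hbase\tdst=/hbase.moved",
            "allowed=true\tcmd=rename\tsrc=/hbase.moved\tdst=/hbase",
            "allowed=true\tcmd=delete\tsrc=/hbase/data/t1/r1/cf/b\tdst=null",
            "allowed=true\tcmd=create\tsrc=/hbase/data/t1/r1/cf/c\tdst=null");
        Set<String> current = new HashSet<>(Arrays.asList("/hbase/data/t1/r1/cf/c"));
        // prints: Possibly missing: [/hbase/data/t1/r1/cf/a]
        System.out.println("Possibly missing: " + missing(log, "/hbase", current));
    }
}
```

Anything the replay reports is only a candidate for a closer look; HBase's own compactions and cleaner chores delete files routinely, so the result is a list to check, not proof of loss.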
