Re: hbck -fix

Wayne Sat, 02 Jul 2011 09:56:19 -0700

It just returns a ton of errors (import: command not found). Our cluster is
hosed anyway. I am waiting to get it completely re-installed from scratch.
Hope has long since flown out the window. I just changed my opinion of what
it takes to manage hbase. A Java engineer is required on staff. I also
realize now that a backup strategy is more important than for an RDBMS. Having
RF=3 in HDFS offers no insurance against hbase losing its shirt and
.META. getting corrupted. I think I just found the Achilles' heel.



On Sat, Jul 2, 2011 at 12:40 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Have you tried running check_meta.rb with --fix ?
>
> On Sat, Jul 2, 2011 at 9:19 AM, Wayne <wav...@gmail.com> wrote:
>
> > We are running 0.90.3. We were testing the table export, not realizing
> > that the data goes to the root drive and not HDFS. The export filled the
> > master's root partition. The logger had issues and HDFS got corrupted
> > ("java.io.IOException: Incorrect data format. logVersion is -18 but
> > writables.length is 0"). We had to run hadoop fsck -move to fix the
> > corrupted hdfs files. We were able to get hdfs running without issues but
> > hbase ended up with the region issues.
> >
> > We also had another issue making it worse with Ganglia. We had moved the
> > Ganglia host to the master server and Ganglia took up so many resources
> > that it actually caused timeouts talking to the master and most nodes
> > ended up shutting down. I guess Ganglia is a pig in terms of resources...
> >
> > I just tried to manually edit the .META. table removing the remnants of
> > the old table but the shell went haywire on me and turned to control
> > characters..??...I ended up corrupting the whole thing and had to delete
> > all tables...we have just not had a good week.
> >
> > I will add comments to HBASE-3695 in terms of suggestions.
> >
> > Thanks.
> >
> > On Fri, Jul 1, 2011 at 4:55 PM, Stack <st...@duboce.net> wrote:
> >
> > > What version of hbase are you on Wayne?
> > >
> > > On Fri, Jul 1, 2011 at 8:32 AM, Wayne <wav...@gmail.com> wrote:
> > > > I ran the hbck command and found 14 inconsistencies. There were files
> > > > in hdfs not used for region
> > >
> > > These are usually harmless.  Bad accounting on our part.  Need to plug
> > > the hole.
> > >
> > > >, regions with the same start key, a hole in the
> > > > region chain, and a missing start region with an empty key.
> > >
> > > These are pretty serious.
> > >
> > > How'd the master running out of root partition do this?  I'd be
> > > interested to know.
> > >
> > > > We are not in production so we have the luxury to start again, but the
> > > > damage to our confidence is severe. Is there work going on to improve
> > > > hbck -fix to actually be able to resolve these types of issues? Do we
> > > > need to expect to run a production hbase cluster to be able to move
> > > > around and rebuild the region definitions and the .META. table by hand?
> > > > Things just got a lot scarier fast for us, especially since we were
> > > > hoping to go into production next month. Running out of disk space on
> > > > the master's root partition can bring down the entire cluster? This is
> > > > scary...
> > > >
> > >
> > > Understood.
> > >
> > > St.Ack
> > >
> >
>
