It just returns a ton of errors (import: command not found). Our cluster is hosed anyway. I am waiting to get it completely re-installed from scratch. Hope has long since flown out the window. I just changed my opinion of what it takes to manage hbase. A Java engineer is required on staff. I also realize now that a backup strategy is more important than for an RDBMS. Having RF=3 in HDFS offers no insurance against hbase losing its shirt and .META. getting corrupted. I think I just found the Achilles' heel.

On Sat, Jul 2, 2011 at 12:40 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Have you tried running check_meta.rb with --fix ?
>
> On Sat, Jul 2, 2011 at 9:19 AM, Wayne <wav...@gmail.com> wrote:
>
> > We are running 0.90.3. We were testing the table export not realizing
> > the data goes to the root drive and not HDFS. The export filled the
> > master's root partition. The logger had issues and HDFS got corrupted
> > ("java.io.IOException: Incorrect data format. logVersion is -18 but
> > writables.length is 0"). We had to run hadoop fsck -move to fix the
> > corrupted hdfs files. We were able to get hdfs running without issues
> > but hbase ended up with the region issues.
> >
> > We also had another issue making it worse with Ganglia. We had moved
> > the Ganglia host to the master server and Ganglia took up so many
> > resources that it actually caused timeouts talking to the master and
> > most nodes ended up shutting down. I guess Ganglia is a pig in terms
> > of resources...
> >
> > I just tried to manually edit the .META. table removing the remnants
> > of the old table but the shell went haywire on me and turned to
> > control characters..??...I ended up corrupting the whole thing and had
> > to delete all tables...we have just not had a good week.
> >
> > I will add comments to HBASE-3695 in terms of suggestions.
> >
> > Thanks.
> >
> > On Fri, Jul 1, 2011 at 4:55 PM, Stack <st...@duboce.net> wrote:
> >
> > > What version of hbase are you on Wayne?
> > >
> > > On Fri, Jul 1, 2011 at 8:32 AM, Wayne <wav...@gmail.com> wrote:
> > > > I ran the hbck command and found 14 inconsistencies. There were
> > > > files in hdfs not used for region
> > >
> > > These are usually harmless. Bad accounting on our part. Need to plug
> > > the hole.
> > >
> > > >, regions with the same start key, a hole in the
> > > > region chain, and a missing start region with an empty key.
> > >
> > > These are pretty serious.
> > >
> > > How'd the master running out of root partition do this? I'd be
> > > interested to know.
> > >
> > > > We are not in production so we have the luxury to start again, but
> > > > the damage to our confidence is severe. Is there work going on to
> > > > improve hbck -fix to actually be able to resolve these types of
> > > > issues? Do we need to expect to run a production hbase cluster to
> > > > be able to move around and rebuild the region definitions and the
> > > > .META. table by hand? Things just got a lot scarier fast for us,
> > > > especially since we were hoping to go into production next month.
> > > > Running out of disk space on the master's root partition can bring
> > > > down the entire cluster? This is scary...
> > >
> > > Understood.
> > >
> > > St.Ack