OK, so if not all the data came back, then it could be a bug, although it may already have been fixed, since we iterate very fast on the 0.89 releases (which are dev preview releases, not meant for production).
When a region server crashes, the master splits all the write-ahead logs and the regions are then distributed to the remaining region servers. It's all automatic. Even if it happened during a major compaction, the original store files aren't deleted until the new store file is created.

Did the master encounter any fatal exceptions while splitting the logs? Did you take a look at the log file? Can you figure out which rows in .META. are missing (there would be holes)?

J-D

On Fri, Aug 13, 2010 at 3:18 PM, Jeremy Carroll <[email protected]> wrote:
> We are using CDH3 Beta 2.
> ________________________________________
> From: [email protected] [[email protected]] On Behalf Of Jean-Daniel Cryans [[email protected]]
> Sent: Friday, August 13, 2010 4:50 PM
> To: [email protected]
> Subject: Re: HBase recovery
>
> Which version? Prior to HBase 0.89 + Hadoop 0.20-append (or cdh3),
> HBase cannot guarantee durability of the latest inserts (this includes
> edits to .META.)
>
> J-D
>
> On Fri, Aug 13, 2010 at 2:45 PM, Jeremy Carroll
> <[email protected]> wrote:
>> During some testing of a small development cluster, one of the RegionServers
>> that we employ has an issue with a bad RAM stick. So when it gets into heavy
>> RAM operation it likes to crash. Here is my question. We had an issue where
>> the RegionServer holding .META. crashed. The entire cluster was unusable as
>> it did not reassign .META. to a different region. Also when the server goes
>> down, what happens to all the regions that it held? Does it reassign them to
>> other region servers? Also what is the correct action for recovery. It
>> crashed during a major_compaction so how do I verify that I am not missing
>> data? I see that I had 166 regions online on this server before the crash,
>> and now after the crash it has 158. What's the correct steps to recover
>> HBase after a major crash?
>
