When I do a consistency check, I get:

Chain of regions in table urlhashv4 is broken; edges does not contain 7BB16418308C2CB6B8AE56982781A5C6
Table urlhashv4 is inconsistent.
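To make the message concrete: a broken chain means some region's end key is not the start key of any other region, so part of the key space has no region. The following is only a minimal sketch of that idea, not HBase code. The function name `find_holes`, the `(start_key, end_key)` tuple layout, and the sample keys are all made up for illustration; the one convention borrowed from HBase is that an empty key marks the open start or end of the table.

```python
def find_holes(regions):
    """Return the (start, end) key ranges that no region covers.

    `regions` is a list of (start_key, end_key) string pairs.
    An empty string means "start of table" as a start key and
    "end of table" as an end key. Hypothetical helper, not an HBase API.
    """
    holes = []
    covered_to = ""  # every key below this is covered so far
    for start, end in sorted(regions):
        if covered_to is None:
            break  # an earlier region already ran to the end of the table
        if start > covered_to:
            # Nothing starts at covered_to: this is a hole in the chain.
            holes.append((covered_to, start))
        if end == "":
            covered_to = None  # region extends to the end of the table
        elif end > covered_to:
            covered_to = end
    if covered_to is not None:
        # The last region stopped short of the end of the table.
        holes.append((covered_to, ""))
    return holes


# Made-up keys: nothing covers the range from "7" up to "9".
print(find_holes([("", "3"), ("3", "7"), ("9", "")]))  # [('7', '9')]
```

The start and end of each reported hole are the boundaries an empty filler region would need in order to close the chain.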
This is the same thing I saw before. Is there any way of creating an empty region that covers the range of keys that it's missing? If I could do that, I could go on. The data is not super-critical; it can be regenerated. Why do these regions just disappear like this? They are not in the HDFS directory for the table at all.

-----Original Message-----
From: Robert Gonzalez [mailto:[email protected]]
Sent: Wednesday, June 22, 2011 12:48 PM
To: '[email protected]'
Subject: RE: recovery from regionserver death

Hbase: 0.90.0
Hadoop: 0.20.2+320

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel Cryans
Sent: Wednesday, June 22, 2011 12:46 PM
To: [email protected]
Subject: Re: recovery from regionserver death

Hadoop and HBase versions please.

(no you shouldn't have to do anything special)

J-D

On Wed, Jun 22, 2011 at 9:44 AM, Robert Gonzalez <[email protected]> wrote:
> How does one recover when a regionserver dies? We have this problem periodically and we basically have to restart hbase or all our jobs die with these type of errors:
>
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server c1-s35.blablabla.com:60020 for region urlhashv4,F4657B47F9881A42AF88864EC5EA9B27,1307217134729.4fa3defeeaeb59dc56f7ce6f155b2a0b., row 'F471203BA4FF5DD2BD2549308FD81F4A', but failed after 10 attempts.
> Exceptions:
>
> Then eventually this results in a general failure with Wrong Region exceptions and the whole table seems to go corrupt. The errors one sees at the regionserver level are:
>
> 2011-06-22 10:32:35,559 WARN org.apache.hadoop.hbase.regionserver.HRegion: File hdfs://c1-m01:54310/hbase/urlhashv4/d3c3f27ac1ce7a2dff35ddf367fe779d/recovered.edits/0000000000097403816 is zero-length, deleting.
> 2011-06-22 10:32:35,563 ERROR org.apache.hadoop.hbase.regionserver.HRegion: Failed delete of hdfs://c1-m01:54310/hbase/urlhashv4/d3c3f27ac1ce7a2dff35ddf367fe779d/recovered.edits/0000000000097403816
> 2011-06-22 10:33:19,769 WARN org.apache.hadoop.hbase.regionserver.HRegion: File hdfs://c1-m01:54310/hbase/urlhashv4/9d0d6214bebdefd5466d0e6918c3630c/recovered.edits/0000000000097403669 is zero-length, deleting.
> 2011-06-22 10:33:19,770 ERROR org.apache.hadoop.hbase.regionserver.HRegion: Failed delete of hdfs://c1-m01:54310/hbase/urlhashv4/9d0d6214bebdefd5466d0e6918c3630c/recovered.edits/0000000000097403669
>
> Shouldn't the master detect deaths and rebalance the regions to other regionservers? Or is there a manual way to do this without having to restart the whole thing?
>
> Thanks,
>
> Robert Gonzalez
> Maxpoint Interactive
