When I do a consistency check, I get:

Chain of regions in table urlhashv4 is broken; edges does not contain 7BB16418308C2CB6B8AE56982781A5C6
Table urlhashv4 is inconsistent.

This is the same thing I saw before.  Is there any way of creating an empty 
region that covers the range of keys that is missing?  If I could do that, I 
could go on.  The data is not super-critical; it can be regenerated.

Why do these regions just disappear like this?  They are not in the HDFS 
directory for the table at all.
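
A minimal sketch of the kind of check behind the "Chain of regions ... is broken" message. This is not HBase's actual hbck code. It sorts a table's regions by start key and walks them, expecting each region to start where the previous one ended. Any gap is exactly the key range an empty filler region would have to cover. Assumptions: the class and method names are hypothetical, keys are handled as Java Strings (this matches HBase's byte ordering only for ASCII keys such as the hex row keys in this table), and an empty start or end key marks the table's first or last region, as in HBase.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not HBase's hbck implementation: finds the key
// ranges ("holes") that no region of a table covers.
public class RegionChainCheck {

    // A region's key range [startKey, endKey). Following HBase's convention,
    // "" as startKey means the first region and "" as endKey means the last.
    public static final class Range {
        public final String startKey;
        public final String endKey;

        public Range(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        @Override
        public String toString() {
            return "[" + startKey + ", " + endKey + ")";
        }
    }

    // Keys are compared as Strings, which matches HBase's byte order only
    // for ASCII keys (such as the hex row keys in this thread).
    public static List<Range> findHoles(List<Range> regions) {
        List<Range> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparing((Range r) -> r.startKey));

        List<Range> holes = new ArrayList<>();
        String coveredUpTo = ""; // all keys below this are covered so far
        for (Range r : sorted) {
            if (r.startKey.compareTo(coveredUpTo) > 0) {
                // Nothing covers [coveredUpTo, r.startKey): the chain is broken.
                holes.add(new Range(coveredUpTo, r.startKey));
            }
            if (r.endKey.isEmpty()) {
                return holes; // this region runs to the end of the table
            }
            if (r.endKey.compareTo(coveredUpTo) > 0) {
                coveredUpTo = r.endKey;
            }
        }
        // No region reached the end of the table, so the tail is a hole.
        holes.add(new Range(coveredUpTo, ""));
        return holes;
    }

    public static void main(String[] args) {
        // Only the two hex boundaries below appear in this thread;
        // "A000000000000000" is a made-up boundary used to create a hole.
        List<Range> regions = List.of(
                new Range("", "7BB16418308C2CB6B8AE56982781A5C6"),
                new Range("A000000000000000", "F4657B47F9881A42AF88864EC5EA9B27"),
                new Range("F4657B47F9881A42AF88864EC5EA9B27", ""));
        for (Range hole : findHoles(regions)) {
            System.out.println("hole: " + hole);
        }
    }
}
```

In this example, the reported hole starts at 7BB16418308C2CB6B8AE56982781A5C6, the key named in the error, and ends at the next surviving region's start key. Those two keys are the boundaries an empty replacement region would need.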

-----Original Message-----
From: Robert Gonzalez [mailto:[email protected]] 
Sent: Wednesday, June 22, 2011 12:48 PM
To: '[email protected]'
Subject: RE: recovery from regionserver death

HBase: 0.90.0
Hadoop: 0.20.2+320

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel Cryans
Sent: Wednesday, June 22, 2011 12:46 PM
To: [email protected]
Subject: Re: recovery from regionserver death

Hadoop and HBase versions please.

(No, you shouldn't have to do anything special.)

J-D

On Wed, Jun 22, 2011 at 9:44 AM, Robert Gonzalez 
<[email protected]> wrote:
> How does one recover when a regionserver dies?  We have this problem 
> periodically, and we basically have to restart HBase or all our jobs die with 
> these types of errors:
>
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact 
> region server c1-s35.blablabla.com:60020 for region 
> urlhashv4,F4657B47F9881A42AF88864EC5EA9B27,1307217134729.4fa3defeeaeb59dc56f7ce6f155b2a0b.,
>  row 'F471203BA4FF5DD2BD2549308FD81F4A', but failed after 10 attempts.
> Exceptions:
>
>
> Eventually this results in a general failure with Wrong Region 
> exceptions, and the whole table seems to become corrupt.  The errors one sees at 
> the regionserver level are:
>
> 2011-06-22 10:32:35,559 WARN org.apache.hadoop.hbase.regionserver.HRegion: File hdfs://c1-m01:54310/hbase/urlhashv4/d3c3f27ac1ce7a2dff35ddf367fe779d/recovered.edits/0000000000097403816 is zero-length, deleting.
> 2011-06-22 10:32:35,563 ERROR org.apache.hadoop.hbase.regionserver.HRegion: Failed delete of hdfs://c1-m01:54310/hbase/urlhashv4/d3c3f27ac1ce7a2dff35ddf367fe779d/recovered.edits/0000000000097403816
> 2011-06-22 10:33:19,769 WARN org.apache.hadoop.hbase.regionserver.HRegion: File hdfs://c1-m01:54310/hbase/urlhashv4/9d0d6214bebdefd5466d0e6918c3630c/recovered.edits/0000000000097403669 is zero-length, deleting.
> 2011-06-22 10:33:19,770 ERROR org.apache.hadoop.hbase.regionserver.HRegion: Failed delete of hdfs://c1-m01:54310/hbase/urlhashv4/9d0d6214bebdefd5466d0e6918c3630c/recovered.edits/0000000000097403669
>
>
> Shouldn't the master detect deaths and rebalance the regions to other 
> regionservers?  Or is there a manual way to do this without having to restart 
> the whole thing?
>
> Thanks,
>
> Robert Gonzalez
> Maxpoint Interactive