On Wed, Jun 2, 2010 at 2:13 AM, Dan Harvey <[email protected]> wrote:
> Yes, we're running Cloudera CDH2, which I've just checked includes a
> back-ported hdfs-630 patch.
>
> I guess a lot of these issues will be gone once Hadoop 0.21 is out and
> HBase can take advantage of the new features.
>
That's the hope. A bunch of fixes have gone in for HBase-provoked HDFS
issues in 0.20. Look out for the append branch in HDFS 0.20 coming soon
(it'll be here: http://svn.apache.org/viewvc/hadoop/common/branches/).
It'll be a 0.20 branch with support for append (hdfs-200, hdfs-142,
etc.) and other fixes needed by HBase. That's what the next major HBase
release will ship against (CDH3 will include this stuff and then some,
if I understand Todd+crew's plans correctly).

Good on you Dan,
St.Ack

> Thanks,
>
> On 2 June 2010 01:10, Stack <[email protected]> wrote:
>> Hey Dan:
>>
>> On Tue, Jun 1, 2010 at 2:57 AM, Dan Harvey <[email protected]> wrote:
>>> In what cases would a datanode failure (for example, running out of
>>> memory in our case) cause HBase data loss?
>>
>> We should just move past the damaged DN on to the other replicas, but
>> there are probably places where we can get hung up. Out of interest,
>> are you running with hdfs-630 in place?
>>
>>> Would it mostly only cause data loss to the meta regions, or does it
>>> also cause problems with the actual region files?
>>>
>>
>> HDFS files that had their blocks located on the damaged DN would be
>> susceptible (meta files are just like any other).
>>
>> St.Ack
>>
>>>> On Mon, May 24, 2010 at 2:39 PM, Dan Harvey <[email protected]>
>>>> wrote:
>>>>> Hi,
>>>>>
>>>>> Sorry for the multiple e-mails; it seems Gmail didn't send my
>>>>> whole message last time! Anyway, here it goes again...
>>>>>
>>>>> Whilst loading data via a MapReduce job into HBase, I have started
>>>>> getting this error:
>>>>>
>>>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
>>>>> contact region server Some server, retryOnlyOne=true, index=0,
>>>>> islastrow=false, tries=9, numtries=10, i=0, listsize=19,
>>>>> region=source_documents,ipubmed\x219915054,1274525958679 for region
>>>>> source_documents,ipubmed\x219915054,1274525958679, row 'u1012913162',
>>>>> but failed after 10 attempts.
>>>>> Exceptions:
>>>>> at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1166)
>>>>> at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1247)
>>>>> at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:609)
>>>>>
>>>>> The master lists the following three regions:
>>>>>
>>>>> source_documents,ipubmed\x219859228,1274701893687  hadoop1
>>>>>   1825870642  ipubmed\x219859228  ipubmed\x219915054
>>>>> source_documents,ipubmed\x219915054,1274525958679  hadoop4
>>>>>   193393334   ipubmed\x219915054  u102193588
>>>>> source_documents,u102193588,1274486550122  hadoop4
>>>>>   2141795358  u102193588  u105043522
>>>>>
>>>>> On one of our 5 nodes I found a region which starts with
>>>>>
>>>>> ipubmed\x219915054 and ends with u102002564
>>>>>
>>>>> and on another I found the other half of the split, which starts
>>>>> with
>>>>>
>>>>> u102002564 and ends with u102193588
>>>>>
>>>>> So it seems the middle region listed on the master was split, but
>>>>> the split failed to reach the master.
>>>>>
>>>>> We've had a few problems over the last few days with HDFS nodes
>>>>> failing due to lack of memory. That has now been fixed, but it
>>>>> could have been a cause of this problem.
>>>>>
>>>>> In what ways can a split fail to be received by the master, and
>>>>> how long would it take HBase to fix this? I've read that it
>>>>> periodically scans the META table to find problems like this, but
>>>>> not how often.
>>>>> It has been about 12 hours here and our cluster doesn't appear to
>>>>> have fixed this missing split. Is there a way to force the master
>>>>> to rescan the META table? Will it fix problems like this given
>>>>> time?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --
>>>>> Dan Harvey | Datamining Engineer
>>>>> www.mendeley.com/profiles/dan-harvey
>>>>>
>>>>> Mendeley Limited | London, UK | www.mendeley.com
>>>>> Registered in England and Wales | Company Number 6419015
>>>>>
>>>>
>>>
>>> --
>>> Dan Harvey | Datamining Engineer
>>> www.mendeley.com/profiles/dan-harvey
>>>
>>> Mendeley Limited | London, UK | www.mendeley.com
>>> Registered in England and Wales | Company Number 6419015
>>>
>>
>
> --
> Dan Harvey | Datamining Engineer
> www.mendeley.com/profiles/dan-harvey
>
> Mendeley Limited | London, UK | www.mendeley.com
> Registered in England and Wales | Company Number 6419015
>
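
For anyone else chasing this: there is no switch I know of in 0.20 to
force the master to rescan .META. on demand, but you can confirm a hole
like the one above yourself by scanning .META. and checking that,
within each table, one region's end key matches the next region's start
key. Below is a minimal sketch against the 0.20-era Java client; the
MetaHoleCheck class name is mine, and the exact HConstants/Writables
calls are from memory, so treat it as a starting point rather than a
verified tool.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Writables;

public class MetaHoleCheck {
  public static void main(String[] args) throws Exception {
    HTable meta = new HTable(new HBaseConfiguration(),
        HConstants.META_TABLE_NAME);
    Scan scan = new Scan();
    scan.addFamily(HConstants.CATALOG_FAMILY);
    ResultScanner scanner = meta.getScanner(scan);
    String prevTable = null;
    byte[] prevEndKey = null;
    try {
      for (Result r : scanner) {
        byte[] bytes = r.getValue(HConstants.CATALOG_FAMILY,
            HConstants.REGIONINFO_QUALIFIER);
        if (bytes == null) continue;
        HRegionInfo info = Writables.getHRegionInfo(bytes);
        // Skip split parents lingering in .META.; only the live
        // regions should form a contiguous key range per table.
        if (info.isOffline() || info.isSplit()) continue;
        String table = info.getTableDesc().getNameAsString();
        if (table.equals(prevTable) && prevEndKey != null
            && !Bytes.equals(prevEndKey, info.getStartKey())) {
          System.out.println("Hole in " + table + " between "
              + Bytes.toStringBinary(prevEndKey) + " and "
              + Bytes.toStringBinary(info.getStartKey()));
        }
        prevTable = table;
        prevEndKey = info.getEndKey();
      }
    } finally {
      scanner.close();
    }
  }
}

In a half-finished split the parent row in .META. is usually already
marked offline, so a scan like this should report the gap (here,
between ipubmed\x219915054 and u102193588) that the daughter regions
were meant to fill.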

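On the RetriesExhaustedException itself: the numtries=10 in the message
is the default value of hbase.client.retries.number, so while the
cluster is flaky you can give a load job more headroom by raising the
retry count and the pause between attempts on the client side. A
sketch, assuming the stock 0.20 client properties (the PatientLoader
name and the chosen values are mine):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class PatientLoader {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    // Default is 10 retries (hence "numtries=10" above); give the
    // client more headroom while regionservers are flapping.
    conf.setInt("hbase.client.retries.number", 20);
    // Milliseconds to sleep between attempts (default 1000).
    conf.setLong("hbase.client.pause", 2000);
    HTable table = new HTable(conf, Bytes.toBytes("source_documents"));
    table.setAutoFlush(false); // buffer puts; flushCommits() sends them
    // ... do puts here, then:
    table.flushCommits();
  }
}

This only buys time, though: puts aimed at a region that no server is
carrying will keep failing until the split's .META. entries are
repaired.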