Hey Dan:

On Tue, Jun 1, 2010 at 2:57 AM, Dan Harvey <[email protected]> wrote:
> In what cases would a datanode failure (for example running out of
> memory in our case) cause HBase data loss?

We should just move past the damaged DN on to the other replicas, but there
are probably places where we can get hung up. Out of interest, are you
running with hdfs-630 in place?

> Would it mostly only cause data loss to the meta regions or does it
> also cause problems with the actual region files?
>

HDFS files that had their blocks located on the damaged DN would be
susceptible (meta files are just like any other).

St.Ack

>> On Mon, May 24, 2010 at 2:39 PM, Dan Harvey <[email protected]> wrote:
>>> Hi,
>>>
>>> Sorry for the multiple e-mails, it seems gmail didn't send my whole
>>> message last time! Anyway, here it goes again...
>>>
>>> Whilst loading data via a mapreduce job into HBase I have started getting
>>> this error:
>>>
>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
>>> contact region server Some server, retryOnlyOne=true, index=0,
>>> islastrow=false, tries=9, numtries=10, i=0, listsize=19,
>>> region=source_documents,ipubmed\x219915054,1274525958679 for region
>>> source_documents,ipubmed\x219915054,1274525958679, row 'u1012913162',
>>> but failed after 10 attempts.
>>> Exceptions:
>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1166)
>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1247)
>>>   at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:609)
>>>
>>> In the master there are the following three regions:
>>>
>>> source_documents,ipubmed\x219859228,1274701893687  hadoop1  1825870642  ipubmed\x219859228  ipubmed\x219915054
>>> source_documents,ipubmed\x219915054,1274525958679  hadoop4  193393334   ipubmed\x219915054  u102193588
>>> source_documents,u102193588,1274486550122          hadoop4  2141795358  u102193588          u105043522
>>>
>>> and on one of our 5 nodes I found a region which starts with
>>>
>>> ipubmed\x219915054 and ends with u102002564
>>>
>>> and on another I found the other half of the split, which starts with
>>>
>>> u102002564 and ends with u102193588
>>>
>>> So it seems that the middle region listed in the master was split, but
>>> the split never reached the master.
>>>
>>> We've had a few problems over the last few days with hdfs nodes failing
>>> due to lack of memory. That has now been fixed, but it could have been a
>>> cause of this problem.
>>>
>>> In what ways can a split fail to be received by the master, and how long
>>> would it take for hbase to fix this? I've read that it will periodically
>>> scan the META table to find problems like this, but not how often. It has
>>> been about 12h here and our cluster doesn't appear to have fixed this
>>> missing split. Is there a way to force the master to rescan the META
>>> table? Will it fix problems like this given time?
>>>
>>> Thanks,
>>>
>>> --
>>> Dan Harvey | Datamining Engineer
>>> www.mendeley.com/profiles/dan-harvey
>>>
>>> Mendeley Limited | London, UK | www.mendeley.com
>>> Registered in England and Wales | Company Number 6419015
>>>
>>
>
> --
> Dan Harvey | Datamining Engineer
> www.mendeley.com/profiles/dan-harvey
>
> Mendeley Limited | London, UK | www.mendeley.com
> Registered in England and Wales | Company Number 6419015
>
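P.S. For what it's worth, one way to see which HDFS files actually had blocks
on the damaged DN is Hadoop's fsck tool. A rough sketch, assuming the default
HBase root dir of /hbase (adjust the path to your hbase.rootdir):

    hadoop fsck /hbase -files -blocks -locations

Any files reported as CORRUPT or with MISSING blocks in that output are the
ones at risk; files whose blocks still have healthy replicas elsewhere should
be served fine once the client moves past the dead node.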
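P.P.S. To see what the master currently has registered for the table (and spot
the hole where the unreported split daughter should be), you can scan the
catalog table from the HBase shell. A sketch assuming a 0.20-era shell, where
the catalog table is named '.META.':

    hbase shell
    hbase> scan '.META.', {COLUMNS => ['info:regioninfo', 'info:server']}

Comparing the start/end keys in info:regioninfo for source_documents against
the regions you found on disk should show exactly which daughter regions are
missing from META.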
