FYI, the 0.20 append branch has now been created. Patches will be trickling in over the next week.
http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/

or the actual svn repo:

https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-append/

The list of jiras for this branch:

https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&&pid=12310942&fixfor=12315103&resolution=-1&sorter/field=priority&sorter/order=DESC

JG

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Stack
> Sent: Wednesday, June 02, 2010 8:26 AM
> To: [email protected]
> Subject: Re: Missing Split (full message)
>
> On Wed, Jun 2, 2010 at 2:13 AM, Dan Harvey <[email protected]> wrote:
> > Yes, we're running Cloudera CDH2, which I've just checked includes a
> > back-ported hdfs-630 patch.
> >
> > I guess a lot of these issues will be gone once hadoop 0.21 is out and
> > hbase can take advantage of the new features.
> >
>
> That's the hope. A bunch of fixes have gone in for hbase-provoked hdfs
> issues in 0.20. Look out for the append branch in hdfs 0.20 coming
> soon (It'll be here:
> http://svn.apache.org/viewvc/hadoop/common/branches/). It'll be a
> 0.20 branch with support for append (hdfs-200, hdfs-142, etc.) and
> other fixes needed by hbase. That's what the next major hbase will
> ship against (CDH3 will include this stuff and then some, if I
> understand Todd+crew's plans correctly).
>
> Good on you Dan,
> St.Ack
>
>
> > Thanks,
> >
> > On 2 June 2010 01:10, Stack <[email protected]> wrote:
> >> Hey Dan:
> >>
> >> On Tue, Jun 1, 2010 at 2:57 AM, Dan Harvey <[email protected]> wrote:
> >>> In what cases would a datanode failure (for example, running out of
> >>> memory in our case) cause HBase data loss?
> >>
> >> We should just move past the damaged DN on to the other replicas, but
> >> there are probably places where we can get hung up. Out of interest,
> >> are you running with hdfs-630 in place?
> >>
> >>> Would it mostly only cause data loss to the meta regions, or does it
> >>> also cause problems with the actual region files?
> >>>
> >>
> >> HDFS files that had their blocks located on the damaged DN would be
> >> susceptible (meta files are just like any other).
> >>
> >> St.Ack
> >>
> >>>> On Mon, May 24, 2010 at 2:39 PM, Dan Harvey <[email protected]> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Sorry for the multiple e-mails, it seems gmail didn't send my whole
> >>>>> message last time! Anyway, here it goes again...
> >>>>>
> >>>>> Whilst loading data via a mapreduce job into HBase I have started
> >>>>> getting this error:
> >>>>>
> >>>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
> >>>>> contact region server Some server, retryOnlyOne=true, index=0,
> >>>>> islastrow=false, tries=9, numtries=10, i=0, listsize=19,
> >>>>> region=source_documents,ipubmed\x219915054,1274525958679 for region
> >>>>> source_documents,ipubmed\x219915054,1274525958679, row 'u1012913162',
> >>>>> but failed after 10 attempts.
> >>>>> Exceptions:
> >>>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1166)
> >>>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1247)
> >>>>>   at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:609)
> >>>>>
> >>>>> In the master there are the following three regions:
> >>>>>
> >>>>> source_documents,ipubmed\x219859228,1274701893687  hadoop1  1825870642  ipubmed\x219859228  ipubmed\x219915054
> >>>>> source_documents,ipubmed\x219915054,1274525958679  hadoop4  193393334   ipubmed\x219915054  u102193588
> >>>>> source_documents,u102193588,1274486550122          hadoop4  2141795358  u102193588          u105043522
> >>>>>
> >>>>> and on one of our 5 nodes I found a region which starts with
> >>>>>
> >>>>> ipubmed\x219915054 and ends with u102002564
> >>>>>
> >>>>> and on another I found the other half of the split, which starts with
> >>>>>
> >>>>> u102002564 and ends with u102193588
> >>>>>
> >>>>> So it seems that the middle region on the master was split apart, but
> >>>>> the split failed to reach the master.
> >>>>>
> >>>>> We've had a few problems over the last few days with hdfs nodes
> >>>>> failing due to lack of memory, which has now been fixed but could have
> >>>>> been a cause of this problem.
> >>>>>
> >>>>> In what ways can a split fail to be received by the master, and how
> >>>>> long would it take for hbase to fix this? I've read it will
> >>>>> periodically scan the META table to find problems like this, but not
> >>>>> how often. It has been about 12h here and our cluster doesn't appear
> >>>>> to have fixed this missing split; is there a way to force the master
> >>>>> to rescan the META table? Will it fix problems like this given time?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> --
> >>>>> Dan Harvey | Datamining Engineer
> >>>>> www.mendeley.com/profiles/dan-harvey
> >>>>>
> >>>>> Mendeley Limited | London, UK | www.mendeley.com
> >>>>> Registered in England and Wales | Company Number 6419015
> >>>>>
> >>>>
> >>>
> >>> --
> >>> Dan Harvey | Datamining Engineer
> >>> www.mendeley.com/profiles/dan-harvey
> >>>
> >>> Mendeley Limited | London, UK | www.mendeley.com
> >>> Registered in England and Wales | Company Number 6419015
> >>>
> >>
> >
> > --
> > Dan Harvey | Datamining Engineer
> > www.mendeley.com/profiles/dan-harvey
> >
> > Mendeley Limited | London, UK | www.mendeley.com
> > Registered in England and Wales | Company Number 6419015
> >
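For anyone wanting to check a suspected missing split by hand rather than waiting on the master's periodic META scan, the region boundaries recorded in .META. can be dumped with the client API and compared against the regions the regionservers are actually carrying (which is essentially how the mismatch above was found). A rough sketch against the 0.20-era client API follows; the info:regioninfo column layout and the Writables helper are assumptions about that version, and source_documents is simply the table name from this thread.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Writables;

public class MetaBoundaryDump {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable meta = new HTable(conf, ".META.");
    ResultScanner scanner = meta.getScanner(new Scan());
    byte[] prevEndKey = null;
    try {
      for (Result row : scanner) {
        // Each .META. row is assumed to carry a serialized HRegionInfo in info:regioninfo.
        byte[] bytes = row.getValue(Bytes.toBytes("info"), Bytes.toBytes("regioninfo"));
        if (bytes == null) continue;
        HRegionInfo info = Writables.getHRegionInfo(bytes);
        // Only look at the table from the thread; skip offlined regions and split parents.
        if (!"source_documents".equals(info.getTableDesc().getNameAsString())) continue;
        if (info.isOffline() || info.isSplit()) continue;
        byte[] startKey = info.getStartKey();
        System.out.println(info.getRegionNameAsString()
            + "  start='" + Bytes.toString(startKey)
            + "'  end='" + Bytes.toString(info.getEndKey()) + "'");
        // Flag any hole or overlap in the .META. view of the table's key space.
        if (prevEndKey != null && !Bytes.equals(prevEndKey, startKey)) {
          System.out.println("  ** boundary mismatch: previous end key '"
              + Bytes.toString(prevEndKey) + "' vs start key '"
              + Bytes.toString(startKey) + "'");
        }
        prevEndKey = info.getEndKey();
      }
    } finally {
      scanner.close();
    }
  }
}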

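For context on where the RetriesExhaustedException in the original report comes from: per the stack trace it surfaces out of HTable.flushCommits(), i.e. when the client drains its write buffer and a batched put repeatedly fails to reach the region it resolved from .META. (here, the stale parent of the unreported split). A minimal sketch of that client write path, assuming the 0.20-era API; the column family, qualifier, value, and write-buffer size are made up, and only the table and row names are taken from the thread.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class LoaderWritePathSketch {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, "source_documents");
    // Buffer puts client-side, as a mapreduce loader typically would.
    table.setAutoFlush(false);
    table.setWriteBufferSize(2 * 1024 * 1024);

    Put put = new Put(Bytes.toBytes("u1012913162"));           // row key from the error message
    put.add(Bytes.toBytes("content"), Bytes.toBytes("body"),   // hypothetical family/qualifier
        Bytes.toBytes("..."));
    table.put(put);

    // The buffered edits only go out to the regionservers here; if .META. still maps a
    // row to a region no server is serving, this is where RetriesExhaustedException shows up.
    table.flushCommits();
  }
}

With autoflush off, nothing is sent until flushCommits() runs (or the write buffer fills), which is why the failure appears there rather than on the individual put() calls.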