FYI, the 0.20 append branch has now been created. Patches will be trickling in over the next week.
http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/

or the actual svn repo:

https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-append/

The list of jiras for this branch:

https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&&pid=12310942&fixfor=12315103&resolution=-1&sorter/field=priority&sorter/order=DESC

JG

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Stack
> Sent: Wednesday, June 02, 2010 8:26 AM
> To: [email protected]
> Subject: Re: Missing Split (full message)
>
> On Wed, Jun 2, 2010 at 2:13 AM, Dan Harvey <[email protected]> wrote:
> > Yes, we're running Cloudera CDH2, which I've just checked includes a
> > back-ported hdfs-630 patch.
> >
> > I guess a lot of these issues will be gone once hadoop 0.21 is out and
> > hbase can take advantage of the new features.
> >
>
> That's the hope. A bunch of fixes have gone in for hbase-provoked hdfs
> issues in 0.20. Look out for the append branch in hdfs 0.20 coming
> soon (It'll be here:
> http://svn.apache.org/viewvc/hadoop/common/branches/). It'll be a
> 0.20 branch with support for append (hdfs-200, hdfs-142, etc.) and
> other fixes needed by hbase. That's what the next major hbase will
> ship against (CDH3 will include this stuff and then some, if I
> understand Todd+crew's plans correctly).
>
> Good on you Dan,
> St.Ack
>
>
> > Thanks,
> >
> > On 2 June 2010 01:10, Stack <[email protected]> wrote:
> >> Hey Dan:
> >>
> >> On Tue, Jun 1, 2010 at 2:57 AM, Dan Harvey <[email protected]> wrote:
> >>> In what cases would a datanode failure (for example, running out of
> >>> memory in our case) cause HBase data loss?
> >>
> >> We should just move past the damaged DN on to the other replicas, but
> >> there are probably places where we can get hung up. Out of interest,
> >> are you running with hdfs-630 in place?
> >>
> >>> Would it mostly only cause data loss to the meta regions, or does it
> >>> also cause problems with the actual region files?
> >>>
> >>
> >> HDFS files that had their blocks located on the damaged DN would be
> >> susceptible (meta files are just like any other).
> >>
> >> St.Ack
> >>
> >>>> On Mon, May 24, 2010 at 2:39 PM, Dan Harvey <[email protected]> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Sorry for the multiple e-mails, it seems gmail didn't send my whole
> >>>>> message last time! Anyway, here it goes again...
> >>>>>
> >>>>> Whilst loading data via a mapreduce job into HBase I have started
> >>>>> getting this error:
> >>>>>
> >>>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
> >>>>> contact region server Some server, retryOnlyOne=true, index=0,
> >>>>> islastrow=false, tries=9, numtries=10, i=0, listsize=19,
> >>>>> region=source_documents,ipubmed\x219915054,1274525958679 for region
> >>>>> source_documents,ipubmed\x219915054,1274525958679, row 'u1012913162',
> >>>>> but failed after 10 attempts.
> >>>>> Exceptions:
> >>>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1166)
> >>>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1247)
> >>>>>   at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:609)
> >>>>>
> >>>>> In the master there are the following three regions:
> >>>>>
> >>>>> source_documents,ipubmed\x219859228,1274701893687  hadoop1  1825870642  ipubmed\x219859228  ipubmed\x219915054
> >>>>> source_documents,ipubmed\x219915054,1274525958679  hadoop4  193393334   ipubmed\x219915054  u102193588
> >>>>> source_documents,u102193588,1274486550122          hadoop4  2141795358  u102193588          u105043522
> >>>>>
> >>>>> and on one of our 5 nodes I found a region which starts with
> >>>>>
> >>>>> ipubmed\x219915054 and ends with u102002564
> >>>>>
> >>>>> and on another I found the other half of the split, which starts with
> >>>>>
> >>>>> u102002564 and ends with u102193588
> >>>>>
> >>>>> So it seems that the middle region on the master was split apart, but
> >>>>> the split failed to reach the master.
> >>>>>
> >>>>> We've had a few problems over the last few days with hdfs nodes
> >>>>> failing due to lack of memory, which has now been fixed but could have
> >>>>> been a cause of this problem.
> >>>>>
> >>>>> In what ways can a split fail to be received by the master, and how
> >>>>> long would it take for hbase to fix this? I've read it will
> >>>>> periodically scan the META table to find problems like this, but not
> >>>>> how often. It has been about 12h here and our cluster doesn't appear
> >>>>> to have fixed this missing split; is there a way to force the master
> >>>>> to rescan the META table? Will it fix problems like this given time?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> --
> >>>>> Dan Harvey | Datamining Engineer
> >>>>> www.mendeley.com/profiles/dan-harvey
> >>>>>
> >>>>> Mendeley Limited | London, UK | www.mendeley.com
> >>>>> Registered in England and Wales | Company Number 6419015
> >>>>>
> >>>>
> >>>
> >>> --
> >>> Dan Harvey | Datamining Engineer
> >>> www.mendeley.com/profiles/dan-harvey
> >>>
> >>> Mendeley Limited | London, UK | www.mendeley.com
> >>> Registered in England and Wales | Company Number 6419015
> >>>
> >>
> >
> > --
> > Dan Harvey | Datamining Engineer
> > www.mendeley.com/profiles/dan-harvey
> >
> > Mendeley Limited | London, UK | www.mendeley.com
> > Registered in England and Wales | Company Number 6419015
> >
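For anyone wanting to check a suspected missing split by hand rather than waiting on the master's periodic META scan, the region boundaries recorded in .META. can be dumped with the client API and compared against the regions the regionservers are actually carrying (which is essentially how the mismatch above was found). A rough sketch against the 0.20-era client API follows; the info:regioninfo column layout and the Writables helper are assumptions about that version, and source_documents is simply the table name from this thread.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Writables;

public class MetaBoundaryDump {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable meta = new HTable(conf, ".META.");
    ResultScanner scanner = meta.getScanner(new Scan());
    byte[] prevEndKey = null;
    try {
      for (Result row : scanner) {
        // Each .META. row is assumed to carry a serialized HRegionInfo in info:regioninfo.
        byte[] bytes = row.getValue(Bytes.toBytes("info"), Bytes.toBytes("regioninfo"));
        if (bytes == null) continue;
        HRegionInfo info = Writables.getHRegionInfo(bytes);
        // Only look at the table from the thread; skip offlined regions and split parents.
        if (!"source_documents".equals(info.getTableDesc().getNameAsString())) continue;
        if (info.isOffline() || info.isSplit()) continue;
        byte[] startKey = info.getStartKey();
        System.out.println(info.getRegionNameAsString()
            + "  start='" + Bytes.toString(startKey)
            + "'  end='" + Bytes.toString(info.getEndKey()) + "'");
        // Flag any hole or overlap in the .META. view of the table's key space.
        if (prevEndKey != null && !Bytes.equals(prevEndKey, startKey)) {
          System.out.println("  ** boundary mismatch: previous end key '"
              + Bytes.toString(prevEndKey) + "' vs start key '"
              + Bytes.toString(startKey) + "'");
        }
        prevEndKey = info.getEndKey();
      }
    } finally {
      scanner.close();
    }
  }
}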

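For context on where the RetriesExhaustedException in the original report comes from: per the stack trace it surfaces out of HTable.flushCommits(), i.e. when the client drains its write buffer and a batched put repeatedly fails to reach the region it resolved from .META. (here, the stale parent of the unreported split). A minimal sketch of that client write path, assuming the 0.20-era API; the column family, qualifier, value, and write-buffer size are made up, and only the table and row names are taken from the thread.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class LoaderWritePathSketch {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, "source_documents");
    // Buffer puts client-side, as a mapreduce loader typically would.
    table.setAutoFlush(false);
    table.setWriteBufferSize(2 * 1024 * 1024);

    Put put = new Put(Bytes.toBytes("u1012913162"));           // row key from the error message
    put.add(Bytes.toBytes("content"), Bytes.toBytes("body"),   // hypothetical family/qualifier
        Bytes.toBytes("..."));
    table.put(put);

    // The buffered edits only go out to the regionservers here; if .META. still maps a
    // row to a region no server is serving, this is where RetriesExhaustedException shows up.
    table.flushCommits();
  }
}

With autoflush off, nothing is sent until flushCommits() runs (or the write buffer fills), which is why the failure appears there rather than on the individual put() calls.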