Hi Kevin,

Here is the Dropbox link to the log file of the region server that failed:
http://dl.dropbox.com/u/64149128/hbase-hbase-regionserver-ihub-dn-b1.out

IMHO, the problem starts at line #3009, which says:

12/03/30 15:38:32 FATAL regionserver.HRegionServer: ABORTING region server serverName=ihub-dn-b1,60020,1332955859363, load=(requests=0, regions=44, usedHeap=446, maxHeap=1197): Unhandled exception: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing ihub-dn-b1,60020,1332955859363 as dead server

I have already tested the fault tolerance of HBase by manually bringing down an RS while querying a table, and it worked fine. I was expecting the same today when I was loading the data (even though the RS went down by itself this time), but it didn't work out well.

Thanks for your time. Let me know if you need more details.

~Anil

On Fri, Mar 30, 2012 at 6:05 PM, Kevin O'dell <kevin.od...@cloudera.com> wrote:

> Anil,
>
> Can you please attach the RS logs from the failure?
>
> On Fri, Mar 30, 2012 at 7:05 PM, anil gupta <anilg...@buffalo.edu> wrote:
> > Hi All,
> >
> > I am using cdh3u2 and I have 7 worker nodes (VMs spread across two
> > machines) which are running Datanode, Tasktracker, and Region Server
> > (1200 MB heap size). I was loading data into HBase using Bulk Loader
> > with a custom mapper. I was loading around 34 million records, and I
> > have loaded the same set of data in the same environment many times
> > before without any problem. This time while loading the data, one of
> > the region servers failed (the DN and TT kept running on that node),
> > and then after numerous failures of map tasks the loading job failed.
> > Is there any setting/configuration which can make Bulk Loading
> > fault-tolerant to failure of region servers?
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
> >
>
> --
> Kevin O'Dell
> Customer Operations Engineer, Cloudera
>

--
Thanks & Regards,
Anil Gupta