0.94 currently doesn't support Hadoop 2.0. Can you deploy Hadoop 1.1.1 instead?
Are you using 0.94.5?

Thanks

On Fri, Mar 8, 2013 at 7:44 AM, Pablo Musa <[email protected]> wrote:

> Hey guys,
> as I sent in an email a long time ago, the RSs in my cluster did not get
> along and crashed 3 times a day. I tried a lot of the options we discussed
> in the emails, but they did not solve the problem. As I was using an old
> version of hadoop, I thought this was the problem.
>
> So, I upgraded from hadoop 0.20 - hbase 0.90 - zookeeper 3.3.5 to
> hadoop 2.0.0 - hbase 0.94 - zookeeper 3.4.5.
>
> Unfortunately the RSs did not stop crashing, and worse! Now they crash
> every hour, and sometimes when the RS that holds the .ROOT. crashes the
> whole cluster gets stuck in transition and everything stops working.
> In this case I need to clean the zookeeper znodes and restart the master
> and the RSs.
> To avoid this case I am running in production with only ONE RS and a
> monitoring script that checks every minute whether the RS is ok. If not,
> it restarts it.
> * This case does not get the cluster stuck.
>
> This is driving me crazy, but I really can't find a solution for the
> cluster.
> I tracked all logs from the start time 16:49 from all interesting nodes
> (zoo, namenode, master, rs, dn2, dn9, dn10) and copied here what I think
> is useful.
>
> There are some strange errors in DATANODE2, such as an error copying a
> block to itself.
>
> The gc log points to a GC timeout. However, it is very weird that the RS
> spends so much time in GC while in the other cases it takes 0.001 sec.
> Besides, the time spent is in sys, which makes me think that the problem
> might be in another place.
>
> I know that it is a bunch of logs, and that it is very difficult to find
> the problem without much context. But I REALLY need some help. If not the
> solution, then at least what I should read, where I should look, or which
> cases I should monitor.
>
> Thank you very much,
> Pablo Musa
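On the GC point in the quoted message (long pauses where most of the time is in sys): one way to see how often that happens is to scan the GC log for those entries. Below is a minimal sketch, not a tested tool. It assumes the RS JVM is HotSpot and writes the usual "[Times: user=... sys=..., real=... secs]" suffix on each GC entry. The function name sys_heavy_pauses and the 1-second threshold are made up for illustration.

```python
import re

# Timing suffix that HotSpot appends to GC log entries, for example:
#   [Times: user=0.02 sys=0.00, real=0.01 secs]
# Assumption: the log uses "." as the decimal separator.
TIMES_RE = re.compile(
    r"\[Times: user=(?P<user>\d+\.\d+) "
    r"sys=(?P<sys>\d+\.\d+), "
    r"real=(?P<real>\d+\.\d+) secs\]"
)


def sys_heavy_pauses(lines, min_real_secs=1.0):
    """Return (line_number, user, sys, real) for GC entries whose wall-clock
    time is at least min_real_secs and whose sys time exceeds user time.

    min_real_secs is an arbitrary illustrative threshold, not a recommendation.
    """
    hits = []
    for line_number, line in enumerate(lines, start=1):
        match = TIMES_RE.search(line)
        if match is None:
            continue  # not a GC timing line
        user = float(match.group("user"))
        sys_time = float(match.group("sys"))
        real = float(match.group("real"))
        if real >= min_real_secs and sys_time > user:
            hits.append((line_number, user, sys_time, real))
    return hits
```

To use it, pass the open GC log file (any iterable of lines works) and print the result. If the long pauses all show up as sys-heavy, that would be consistent with the suspicion that the cause lies in the OS (swapping is a common suspect) rather than in heap sizing, but that is only a hint about where to look next.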
