Re: RegionServers Crashing every hour in production env

Ted Yu Fri, 08 Mar 2013 08:02:13 -0800

0.94 currently doesn't support hadoop 2.0

Can you deploy hadoop 1.1.1 instead?


Are you using 0.94.5?

Thanks

On Fri, Mar 8, 2013 at 7:44 AM, Pablo Musa <[email protected]> wrote:

> Hey guys,
> as I mentioned in an email a long time ago, the RSs in my cluster did not
> get along and crashed 3 times a day. I tried many of the options we
> discussed in those emails, but none of them solved the problem. Since I
> was using an old version of hadoop, I thought that was the cause.
>
> So, I upgraded from hadoop 0.20 - hbase 0.90 - zookeeper 3.3.5 to
> hadoop 2.0.0 - hbase 0.94 - zookeeper 3.4.5.
>
> Unfortunately, the RSs did not stop crashing, and worse, they now crash
> every hour. Sometimes, when the RS that holds -ROOT- crashes, the whole
> cluster gets stuck in transition and everything stops working.
> In that case I need to clean the zookeeper znodes and restart the master
> and the RSs.
> To avoid this, I am running production with only ONE RS and a monitoring
> script that checks every minute whether the RS is OK and restarts it if
> it is not.
> * This setup does not get the cluster stuck.
>
> This is driving me crazy, but I really can't find a solution for the
> cluster.
> I tracked the logs from the start time, 16:49, on all the interesting
> nodes (zoo, namenode, master, rs, dn2, dn9, dn10) and copied here what I
> think is useful.
>
> There are some strange errors in DATANODE2, such as an error copying a
> block to itself.
>
> The GC log points to a GC timeout. However, it is very weird that the RS
> spends so much time in GC when in other cases it takes 0.001 sec. Besides,
> the time is spent in sys, which makes me think the problem might be
> somewhere else.
>
> I know that this is a bunch of logs, and that it is very difficult to find
> the problem without much context. But I REALLY need some help. If not a
> solution, then at least what I should read, where I should look, or which
> cases I should monitor.
>
> Thank you very much,
> Pablo Musa
>
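
The observation above that the long GC pauses show their time in sys can be checked mechanically. What follows is only a rough sketch, assuming the GC log was written with -XX:+PrintGCDetails and so carries the usual HotSpot "[Times: user=... sys=..., real=... secs]" suffix. The function name, the 1-second threshold and the sample lines are made up for illustration and are not taken from the logs in this thread.

```python
import re

# Matches the timing suffix HotSpot appends to each GC event when
# -XX:+PrintGCDetails is enabled (assumes '.' as the decimal separator).
TIMES_RE = re.compile(
    r"\[Times: user=(?P<user>\d+\.\d+) sys=(?P<sys>\d+\.\d+), "
    r"real=(?P<real>\d+\.\d+) secs\]"
)


def find_sys_heavy_pauses(lines, min_real_secs=1.0):
    """Return (line_number, user, sys, real) for every GC event whose
    wall-clock time is at least min_real_secs and whose sys time is
    larger than its user time."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = TIMES_RE.search(line)
        if match is None:
            continue  # not a GC timing line
        user = float(match.group("user"))
        sys_time = float(match.group("sys"))
        real = float(match.group("real"))
        if real >= min_real_secs and sys_time > user:
            hits.append((lineno, user, sys_time, real))
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = [
        "[GC [ParNew: ...] [Times: user=0.01 sys=0.00, real=0.00 secs]",
        "[GC [ParNew: ...] [Times: user=0.50 sys=25.30, real=26.10 secs]",
    ]
    for hit in find_sys_heavy_pauses(sample):
        print(hit)
```

Pauses flagged this way spend most of their time in the kernel rather than in the collector itself, so their timestamps may be worth comparing against OS-level metrics for the same host (swap activity, for example) before touching the GC settings.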
