Re: Dead Servers

Mark Wed, 24 Aug 2011 18:03:23 -0700

First off, thanks for your response. 26 seconds seems a bit short for timeouts, so what are some more reasonable timeouts I should set?
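The 26-second figure may not be the session timeout itself. Assuming stock ZooKeeper behaviour, two rules apply. First, the server clamps any requested session timeout to the range 2 × tickTime to 20 × tickTime unless minSessionTimeout or maxSessionTimeout are overridden. Second, the client reports "have not heard from server" after two thirds of the negotiated timeout. The sketch below uses made-up helper names and shows that 26666ms lines up with a 40-second session, for example a larger request capped by a server running tickTime=2000:

```shell
# Hypothetical helpers; the arithmetic mirrors ZooKeeper's default behaviour.

# Negotiated session timeout: the server clamps the requested value to
# [2 * tickTime, 20 * tickTime] unless min/maxSessionTimeout are overridden.
negotiated_timeout_ms() {
  requested=$1
  tick=$2
  min=$((2 * tick))
  max=$((20 * tick))
  if [ "$requested" -lt "$min" ]; then
    echo "$min"
  elif [ "$requested" -gt "$max" ]; then
    echo "$max"
  else
    echo "$requested"
  fi
}

# The client gives up after 2/3 of the session timeout without hearing
# from the server (integer division, as in the Java client).
read_timeout_ms() {
  echo $(( $1 * 2 / 3 ))
}

session=$(negotiated_timeout_ms 180000 2000)
echo "negotiated session timeout: ${session}ms"
echo "client read timeout: $(read_timeout_ms "$session")ms"
```

If this matches your setup, raising zookeeper.session.timeout in hbase-site.xml would likely not take effect beyond the server's cap. The ZooKeeper side (maxSessionTimeout or tickTime in zoo.cfg, or presumably the matching hbase.zookeeper.property.* setting for an HBase-managed ensemble) would need to allow it too.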


This is probably the root cause since my job was pretty hefty.


Make sure you are not CPU starving the RegionServer thread. For example, if you are running a MapReduce job using 6 CPU-intensive tasks on a machine with 4 cores, you are probably starving the RegionServer enough to create longer garbage collection pauses.
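A quick sanity check along the lines of the example above compares the number of concurrent CPU-heavy tasks planned for a node with its core count. The helper is hypothetical. Keeping at least one core free for the RegionServer (and DataNode) is an assumed rule of thumb, not an HBase requirement:

```shell
# Hypothetical check: warn when planned concurrent tasks leave no spare core.
# Assumption: reserve $3 cores (default 1) for the RegionServer/DataNode.
check_task_slots() {
  tasks=$1
  cores=$2
  reserve=${3:-1}
  if [ $((tasks + reserve)) -gt "$cores" ]; then
    echo "WARN: $tasks CPU-bound tasks on $cores cores may starve the RegionServer"
  else
    echo "OK: $tasks tasks on $cores cores"
  fi
}

# The case from the text: 6 CPU-intensive tasks on a 4-core machine.
check_task_slots 6 4

# Or use this machine's core count (nproc is part of GNU coreutils).
check_task_slots 3 "$(nproc)"
```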



Question about swapping...

Make sure you don't swap; the JVM never behaves well under swapping.



Is it as simple as setting the following?

sysctl -w vm.swappiness=5
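`sysctl -w` changes only the running kernel, so the value is lost on reboot unless it is also persisted. The sketch below assumes a Linux host. It shows the current setting, how much swap is actually in use, and, in comments only, the usual way the setting is persisted:

```shell
# Sketch for a Linux host; reading these values needs no special privileges.

# Current swappiness (0-100). Lower values make the kernel less eager to
# swap out process memory in favour of keeping page cache.
current=$(cat /proc/sys/vm/swappiness)
echo "vm.swappiness is currently $current"

# Swap actually in use, in kB (SwapTotal - SwapFree from /proc/meminfo).
swap_used_kb=$(awk '/^SwapTotal:/ {t = $2} /^SwapFree:/ {f = $2} END {print t - f}' /proc/meminfo)
echo "swap in use: ${swap_used_kb} kB"

# `sysctl -w` only changes the running kernel and is lost on reboot.
# Persisting it is usually done like this (as root; not run by this sketch):
#   sysctl -w vm.swappiness=5
#   echo 'vm.swappiness = 5' >> /etc/sysctl.conf
```

A low swappiness does not help if the processes on the node simply need more memory than it has, so the swap-in-use figure is worth watching as well.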

I know it's extremely situation-dependent, but what would be a recommended memory allocation for HBase? Currently I have it set to 4G.
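Whatever heap HBase gets, the node as a whole has to fit in RAM or it will swap. In HBase releases of this era the RegionServer heap is set in conf/hbase-env.sh as HBASE_HEAPSIZE, in MB (e.g. `export HBASE_HEAPSIZE=4000`). The budget check below is a hypothetical helper, and every number in it is a placeholder, not a recommendation:

```shell
# Hypothetical budget check: all arguments in MB. A JVM's real footprint is
# somewhat larger than its heap, so leave extra room beyond these numbers.
memory_headroom_mb() {
  ram=$1; rs_heap=$2; dn_heap=$3; tasks=$4; task_heap=$5; os_reserve=$6
  echo $((ram - rs_heap - dn_heap - tasks * task_heap - os_reserve))
}

# Placeholder node: 16 GB RAM, 4 GB RegionServer heap, 1 GB DataNode heap,
# 6 MapReduce tasks at 1 GB each, 1 GB reserved for the OS.
left=$(memory_headroom_mb 16384 4096 1024 6 1024 1024)
if [ "$left" -lt 0 ]; then
  echo "over-committed by $(( -left )) MB: expect swapping"
else
  echo "$left MB of headroom"
fi
```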


Thanks again for your help.


On 8/24/11 5:41 PM, Jean-Daniel Cryans wrote:

Are there performance hits for running in INFO vs. DEBUG? What do most people suggest?

DEBUG until you get your HBase config under control
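HBase takes its logging setup from conf/log4j.properties. The sketch below switches the HBase loggers between DEBUG and INFO. It assumes the file has a `log4j.logger.org.apache.hadoop.hbase=` line (check your copy). The helper name is made up, and restarting the daemons is the simplest way to pick up the change:

```shell
# Hypothetical helper: set the level of the org.apache.hadoop.hbase logger.
# Usage: set_hbase_log_level /path/to/conf/log4j.properties INFO
set_hbase_log_level() {
  file=$1
  level=$2
  # Rewrite only the HBase logger line; write to a temp file for portability
  # (BSD and GNU sed disagree on the syntax of -i).
  sed "s/^log4j\.logger\.org\.apache\.hadoop\.hbase=.*/log4j.logger.org.apache.hadoop.hbase=$level/" \
    "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Demo on a throwaway copy rather than a real config file.
demo_props=$(mktemp)
printf '%s\n' 'log4j.logger.org.apache.zookeeper=INFO' \
  'log4j.logger.org.apache.hadoop.hbase=DEBUG' > "$demo_props"
set_hbase_log_level "$demo_props" INFO
cat "$demo_props"
```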

5 of our HBase region servers were killed. First off, when this happens and there are only 2 servers, is there a possibility of data corruption and/or loss?

No, unless you hit some sort of bug.

Secondly and more importantly, why does this happen and how can I resolve it?

The important line is:

2011-08-24 15:20:47,202 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 26666ms for sessionid

This indicates that either your ZK server was GCing for 26 seconds or your region server was. Either way, it ended up in:

org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired

That case is covered in section 13.6.2.7 here:
http://hbase.apache.org/book/trouble.rs.html#trouble.rs.runtime
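To see how often this has happened, the client-side timeout messages can be pulled out of a RegionServer log. This sketch assumes the message format quoted above, and the function name is hypothetical. The figure it extracts is the client's idle time at the moment it gave up:

```shell
# Hypothetical helper: print the idle time (ms) from each ZooKeeper
# "Client session timed out" message in the given log file(s).
zk_timeouts_ms() {
  sed -n 's/.*Client session timed out, have not heard from server in \([0-9][0-9]*\)ms.*/\1/p' "$@"
}

# Demo on the line quoted above (the session id is left out, as in the post).
sample_log=$(mktemp)
echo '2011-08-24 15:20:47,202 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 26666ms for sessionid' > "$sample_log"
zk_timeouts_ms "$sample_log"    # prints 26666
```

On a real log, `zk_timeouts_ms <logfile> | sort -n | tail -n 1` gives the longest one.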

J-D
