First off, thanks for your response. 26 seconds seems a bit short to time out, so what are some more reasonable timeouts I should set?

This is probably the root cause since my job was pretty hefty.

Make sure you are not CPU starving the RegionServer thread. For example, if
you are running a MapReduce job using 6 CPU-intensive tasks on a machine
with 4 cores, you are probably starving the RegionServer enough to
create longer garbage collection pauses.
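
A quick way to sanity-check this on a node is to compare the number of CPU-heavy tasks against the core count. The sketch below is only an illustration: `check_cpu_slots` is a made-up helper name, and the task count is something you supply by hand from your own MapReduce configuration, not something it reads from Hadoop.

```shell
#!/bin/sh
# Hypothetical helper (not part of HBase or Hadoop): compare a task count
# you supply against the number of cores on this machine.
# Usage: check_cpu_slots TASKS [CORES]
check_cpu_slots() {
    tasks="$1"
    # Default to the number of online cores when CORES is not given.
    cores="${2:-$(getconf _NPROCESSORS_ONLN)}"
    if [ "$tasks" -gt "$cores" ]; then
        echo "oversubscribed"
    else
        echo "ok"
    fi
}

# The example from the quoted text: 6 CPU-intensive tasks on 4 cores.
check_cpu_slots 6 4
```

This only counts the tasks against the cores. The RegionServer needs CPU time too, so a task count equal to the core count can still leave it short.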



Question about swapping...

Make sure you don't swap, the JVM never behaves well under swapping



Is this as simple as setting

sysctl -w vm.swappiness=5
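
For context, a minimal sketch of checking the current value before changing it, assuming a Linux host where the kernel exposes `/proc/sys/vm/swappiness`. `current_swappiness` is a hypothetical helper name. Note that a low swappiness value makes the kernel less inclined to swap but does not turn swap off.

```shell
#!/bin/sh
# Hypothetical helper: print the current swappiness, or "unknown" when the
# kernel does not expose it (for example on a non-Linux system).
current_swappiness() {
    if [ -r /proc/sys/vm/swappiness ]; then
        cat /proc/sys/vm/swappiness
    else
        echo "unknown"
    fi
}

echo "vm.swappiness is currently: $(current_swappiness)"

# To change it on the running system (needs root, lost on reboot):
#   sysctl -w vm.swappiness=5
# To keep it across reboots, add this line to /etc/sysctl.conf
# and reload with `sysctl -p`:
#   vm.swappiness = 5
```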


I know it's extremely situation-dependent, but what would be a recommended memory allocation for HBase? Currently I have it set to 4G.

Thanks again for your help.


On 8/24/11 5:41 PM, Jean-Daniel Cryans wrote:
Are there performance hits for running in
INFO/DEBUG/? What do most people suggest?
DEBUG until you get your HBase config under control

5 of our HBase region servers were killed. First off, when this happens and
there are only 2 servers is there a possibility of data corruption and/or
loss?
No, unless you hit some sort of bug.

Secondly and more importantly, why does this happen and how can I resolve it?
The important line is:

2011-08-24 15:20:47,202 INFO org.apache.zookeeper.ClientCnxn: Client
session timed out, have not heard from server in 26666ms for sessionid
This indicates that either your ZK server was GCing for 26 seconds or
your region server was. Either way it ended up in:

org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
Which is 13.6.2.7 here:
http://hbase.apache.org/book/trouble.rs.html#trouble.rs.runtime

J-D
