I suspect your system load, because normally GC would not take this long to collect from a 473MB used heap; the allocated heap here only went up to 1.2GB.

Can you check the system load factor from the top command, and the % system wait? What is your system configuration?

Thanks & Regards,
Gopinathan A
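A quick way to see the load factor and the system/wait percentages Gopinathan asks about, assuming a standard Linux host (the 5-second interval is only an example):

  top          # load averages in the header; %sy and %wa in the CPU line (press "1" for per-core values)
  vmstat 5     # sy = kernel CPU, wa = I/O wait, si/so = swap-in/swap-out per interval

If sy or wa stays high, or si/so is non-zero, the long pauses are likely coming from the OS rather than from the garbage collector itself.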
-----Original Message-----
From: Pablo Musa [mailto:pa...@psafe.com]
Sent: Tuesday, July 10, 2012 7:58 PM
To: user@hbase.apache.org
Subject: RE: Hmaster and HRegionServer disappearance reason to ask

I tried to change the flag, but yesterday it happened again:

Application time: 0.3025790 seconds
30013.866: [GC 30013.866: [ParNew: 106069K->989K(118016K), 178.8437590 secs] 473853K->369013K(1275392K), 178.8438570 secs] [Times: user=0.05 sys=178.82, real=178.81 secs]
Total time for which application threads were stopped: 178.8441500 seconds

I also checked the possibility of swapping, but I don't think that is the problem, as vmstat always shows a clean swap.

Help guys, please :)

Regards,
Pablo

-----Original Message-----
From: Dhaval Shah [mailto:prince_mithi...@yahoo.co.in]
Sent: Thursday, July 5, 2012 8:55 PM
To: user@hbase.apache.org
Subject: Re: Hmaster and HRegionServer disappearance reason to ask

Pablo, instead of CMSIncrementalMode try UseParNewGC. That seemed to be the silver bullet when I was dealing with HBase region server crashes.

Regards,
Dhaval

________________________________
From: Pablo Musa <pa...@psafe.com>
To: "user@hbase.apache.org" <user@hbase.apache.org>
Sent: Thursday, 5 July 2012 5:37 PM
Subject: RE: Hmaster and HRegionServer disappearance reason to ask

I am having the same problem. I have tried N different things, but I cannot solve it.

hadoop-0.20.noarch       0.20.2+923.256-1
hadoop-hbase.noarch      0.90.6+84.29-1
hadoop-zookeeper.noarch  3.3.5+19.1-1

I already set:

<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>20</value>
</property>

But it does not seem to work. How can I check whether these variables are really set in the HRegionServer?

I am starting the server with the following:

-Xmx8192m -XX:NewSize=64m -XX:MaxNewSize=64m -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

I am also having trouble reading regionserver.out:

[GC 72004.406: [ParNew: 55830K->2763K(59008K), 0.0043820 secs] 886340K->835446K(1408788K) icms_dc=0 , 0.0044900 secs] [Times: user=0.04 sys=0.00, real=0.00 secs]
[GC 72166.759: [ParNew: 55192K->6528K(59008K), 135.1102750 secs] 887876K->839688K(1408788K) icms_dc=0 , 135.1103920 secs] [Times: user=1045.58 sys=138.11, real=135.09 secs]
[GC 72552.616: [ParNew: 58977K->6528K(59008K), 0.0083040 secs] 892138K->847415K(1408788K) icms_dc=0 , 0.0084060 secs] [Times: user=0.05 sys=0.01, real=0.01 secs]
[GC 72882.991: [ParNew: 58979K->6528K(59008K), 151.4924490 secs] 899866K->853931K(1408788K) icms_dc=0 , 151.4925690 secs] [Times: user=0.07 sys=151.48, real=151.47 secs]

What does each part mean? Is each line a GC cycle?

Thanks,
Pablo
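Each of those regionserver.out lines is one stop-the-world young-generation (ParNew) collection, not a complete CMS cycle. A rough reading of the last line above (the labels are descriptive, not official HotSpot terminology):

  [GC 72882.991: [ParNew: 58979K->6528K(59008K), 151.4924490 secs] 899866K->853931K(1408788K) icms_dc=0 , 151.4925690 secs] [Times: user=0.07 sys=151.48, real=151.47 secs]

  72882.991                       seconds since the JVM started
  ParNew: 58979K->6528K(59008K)   young-generation occupancy before -> after the collection (young-gen capacity)
  899866K->853931K(1408788K)      whole-heap occupancy before -> after (heap capacity)
  icms_dc=0                       incremental-CMS duty cycle, reported because -XX:+CMSIncrementalMode is set
  151.49 secs / real=151.47       wall-clock pause; application threads are stopped for this long
  user= / sys=                    CPU time spent in user mode / in the kernel, summed across GC threads

The worrying pattern in this line, and in the 178-second pause from the later mail, is that real time is dominated by sys time with almost no user time: the collector is waiting on the kernel rather than doing GC work, which usually points at swapping or other memory pressure on the host, even if swap looks clean at the moment vmstat is sampled. As for verifying the JVM flags, the command line of the running region server process (for example via ps -ef | grep HRegionServer) shows which -XX options actually reached the JVM.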
-----Original Message-----
From: Lars George [mailto:lars.geo...@gmail.com]
Sent: Monday, July 2, 2012 6:43 AM
To: user@hbase.apache.org
Subject: Re: Hmaster and HRegionServer disappearance reason to ask

Hi lztaomin,

> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired

indicates that you have experienced the "Juliet Pause" issue, which means you ran into a JVM garbage collection that lasted longer than the configured ZooKeeper timeout threshold. If you search for it on Google http://www.google.com/search?q=juliet+pause+hbase you will find quite a few pages explaining the problem, and what you can do to avoid it.

Lars
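The timeout Lars refers to is the ZooKeeper session timeout; raising it only buys headroom for GC pauses, it does not remove them. A minimal sketch for hbase-site.xml, with an arbitrary two-minute value (note that the ZooKeeper server caps whatever the client requests at its maxSessionTimeout, 20x tickTime by default, so the server side may need adjusting as well):

  <property>
    <name>zookeeper.session.timeout</name>
    <value>120000</value>
  </property>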
On Jul 2, 2012, at 10:30 AM, lztaomin wrote:

> Hi all,
>
> My HBase cluster has a total of 3 machines, with Hadoop and HBase running on the same machines and ZooKeeper managed by HBase itself. After 3 months of operation it reported the exceptions below, and the HMaster and HRegionServer processes are gone. Please help me.
> Thanks
>
> The following is the log:
>
> ABORTING region server serverName=datanode1,60020,1325326435553,
> load=(requests=332, regions=188, usedHeap=2741, maxHeap=8165):
> regionserver:60020-0x3488dec38a02b1 regionserver:60020-0x3488dec38a02b1 received expired from ZooKeeper, aborting
> Cause:
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
>   at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:343)
>   at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261)
>   at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> 2012-07-01 13:45:38,707 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for datanode1,60020,1325326435553
> 2012-07-01 13:45:38,756 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting 32 hlog(s) in hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553
> 2012-07-01 13:45:38,764 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 1 of 32: hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341006689352, length=5671397
> 2012-07-01 13:45:38,764 INFO org.apache.hadoop.hbase.util.FSUtils: Recovering file hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341006689352
> 2012-07-01 13:45:39,766 INFO org.apache.hadoop.hbase.util.FSUtils: Finished lease recover attempt for hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341006689352
> 2012-07-01 13:45:39,880 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
> 2012-07-01 13:45:39,925 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
>
> ABORTING region server serverName=datanode2,60020,1325146199444,
> load=(requests=614, regions=189, usedHeap=3662, maxHeap=8165):
> regionserver:60020-0x3488dec38a0002 regionserver:60020-0x3488dec38a0002 received expired from ZooKeeper, aborting
> Cause:
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
>   at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:343)
>   at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261)
>   at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> 2012-07-01 13:24:10,308 INFO org.apache.hadoop.hbase.util.FSUtils: Finished lease recover attempt for hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341075090535
> 2012-07-01 13:24:10,918 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 21 of 32: hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341078690560, length=11778108
> 2012-07-01 13:24:29,809 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Closed path hdfs://namenode:9000/hbase/t_speakfor_relation_chapter/ffd2057b46da227e078c82ff43f0f9f2/recovered.edits/0000000000660951991 (wrote 8178 edits in 403ms)
> 2012-07-01 13:24:29,809 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: hlog file splitting completed in -1268935 ms for hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553
> 2012-07-01 13:24:29,824 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Received exception accessing META during server shutdown of datanode1,60020,1325326435553, retrying META read
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Server not running, aborting
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2408)
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1649)
>   at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>   at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
>
> lztaomin
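Pulling together the suggestions in this thread (Dhaval's UseParNewGC advice plus the GC logging flags Pablo already passes), a sketch of what the region server GC options could look like, assuming the standard hbase-env.sh with its HBASE_REGIONSERVER_OPTS hook; the heap and young-generation sizes, occupancy fraction, and log path are illustrative, not a verified fix:

  # hbase-env.sh -- illustrative sketch, values need tuning per cluster
  export HBASE_REGIONSERVER_OPTS="-Xmx8g -Xmn256m \
    -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
    -XX:CMSInitiatingOccupancyFraction=70 \
    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
    -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/hbase/gc-regionserver.log"

Whatever flags are chosen, the sys-heavy pauses quoted above suggest also ruling out swapping on the hosts (for example by lowering vm.swappiness and making sure the region server heap plus the DataNode and OS page cache fit in physical RAM) before blaming the collector.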