Thanks, Todd. I will try it out.
On Feb 3, 2011, at 1:43 PM, Todd Lipcon <t...@cloudera.com> wrote:

> Hi Charan,
>
> Your GC settings are way off - a 6m newsize will promote far too much to
> the old gen.
>
> Try this:
>
> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Xmn256m
> -XX:CMSInitiatingOccupancyFraction=70
>
> -Todd
>
> On Thu, Feb 3, 2011 at 12:28 PM, charan kumar <charan.ku...@gmail.com> wrote:
>
>> Hi Jonathan,
>>
>> Thanks for your quick reply.
>>
>> Heap is set to 4G.
>>
>> The following are the JVM opts:
>>
>> export HBASE_OPTS="$HBASE_OPTS -XX:+HeapDumpOnOutOfMemoryError
>> -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:NewSize=6m
>> -XX:MaxNewSize=6m"
>>
>> Are there any other options apart from increasing the RAM?
>>
>> Some more info about the app:
>>
>> - We are storing web page data in HBase.
>> - The row key is the hashed URL, for random distribution, since we don't
>>   plan to do scans.
>> - We have LZO compression set on this column family.
>> - We were seeing 1500 reads/sec when reading the page content.
>> - We have a column family that stores just the metadata of the page
>>   ("title", etc.). When reading this, the performance is a whopping
>>   12000 TPS.
>>
>> We thought the issue could be the network bandwidth used between HBase
>> and the clients, so we disabled LZO compression on the column family and
>> started compressing the raw page on the client and decompressing it when
>> reading (LZO).
>>
>> - With this, my write performance jumped from 2000 to 5000 at peak.
>> - With this approach, the servers are crashing. Not sure why this happens
>>   only after turning off LZO and doing the same from the client.
>>
>> On Thu, Feb 3, 2011 at 12:13 PM, Jonathan Gray <jg...@fb.com> wrote:
>>
>>> How much heap are you running on your RegionServers?
>>>
>>> 6GB of total RAM is on the low end. For high-throughput applications, I
>>> would recommend at least 6-8GB of heap (so 8+ GB of RAM).
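[For reference, settings like those Todd suggests above would go through HBASE_OPTS in conf/hbase-env.sh. A minimal sketch; the -Xmx4g value simply mirrors the 4G heap mentioned in the thread, and note that -Xmn256m takes the place of the -XX:NewSize=6m/-XX:MaxNewSize=6m pair, while -XX:+CMSIncrementalMode is absent from Todd's line:]

```shell
# conf/hbase-env.sh -- sketch of the recommended GC settings.
# -Xmx4g matches the 4G heap mentioned in the thread; size it to your RAM.
# -Xmn256m replaces the tiny 6m new generation that caused promotion failures.
export HBASE_OPTS="$HBASE_OPTS \
  -Xmx4g \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:+UseConcMarkSweepGC \
  -XX:+UseParNewGC \
  -Xmn256m \
  -XX:CMSInitiatingOccupancyFraction=70"
```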
>>>
>>>> -----Original Message-----
>>>> From: charan kumar [mailto:charan.ku...@gmail.com]
>>>> Sent: Thursday, February 03, 2011 11:47 AM
>>>> To: user@hbase.apache.org
>>>> Subject: Region Servers Crashing during Random Reads
>>>>
>>>> Hello,
>>>>
>>>> I am using hbase 0.90.0 with hadoop-append, on Dell 1950 hardware
>>>> (2 CPU, 6 GB RAM).
>>>>
>>>> I had 9 region servers (out of 30) crash in a span of 30 minutes during
>>>> heavy reads. It looks like a GC / ZooKeeper connection timeout issue to
>>>> me. I did all the recommended configuration from the HBase wiki. Any
>>>> other suggestions?
>>>>
>>>> 2011-02-03T09:43:07.890-0800: 70693.632: [GC 70693.632: [ParNew
>>>> (promotion failed): 5555K->5540K(5568K), 0.0280950 secs]70693.660:
>>>> [CMS2011-02-03T09:43:16.864-0800: 70702.606: [CMS-concurrent-mark:
>>>> 12.549/69.323 secs] [Times: user=11.90 sys=1.26, real=69.31 secs]
>>>>
>>>> 2011-02-03T09:53:35.165-0800: 71320.785: [GC 71320.785: [ParNew
>>>> (promotion failed): 5568K->5568K(5568K), 0.4384530 secs]71321.224:
>>>> [CMS2011-02-03T09:53:45.111-0800: 71330.731: [CMS-concurrent-mark:
>>>> 17.511/51.564 secs] [Times: user=38.72 sys=5.67, real=51.60 secs]
>>>>
>>>> The following are the log entries on the region server:
>>>>
>>>> 2011-02-03 10:37:43,946 INFO org.apache.zookeeper.ClientCnxn: Client
>>>> session timed out, have not heard from server in 47172ms for sessionid
>>>> 0x12db9f722421ce3, closing socket connection and attempting reconnect
>>>> 2011-02-03 10:37:43,947 INFO org.apache.zookeeper.ClientCnxn: Client
>>>> session timed out, have not heard from server in 48159ms for sessionid
>>>> 0x22db9f722501d93, closing socket connection and attempting reconnect
>>>> 2011-02-03 10:37:44,401 INFO org.apache.zookeeper.ClientCnxn: Opening
>>>> socket connection to server XXXXXXXXXXXXXXXX
>>>> 2011-02-03 10:37:44,402 INFO org.apache.zookeeper.ClientCnxn: Socket
>>>> connection established to XXXXXXXXX, initiating session
>>>> 2011-02-03 10:37:44,709 INFO org.apache.zookeeper.ClientCnxn: Opening
>>>> socket connection to server XXXXXXXXXXXXXXX
>>>> 2011-02-03 10:37:44,709 INFO org.apache.zookeeper.ClientCnxn: Socket
>>>> connection established to XXXXXXXXXXXXXXXXXXXXX, initiating session
>>>> 2011-02-03 10:37:44,767 DEBUG
>>>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
>>>> started; Attempting to free 81.93 MB of total=696.25 MB
>>>> 2011-02-03 10:37:44,784 DEBUG
>>>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
>>>> completed; freed=81.94 MB, total=614.81 MB, single=379.98 MB,
>>>> multi=309.77 MB, memory=0 KB
>>>> 2011-02-03 10:37:45,205 INFO org.apache.zookeeper.ClientCnxn: Unable to
>>>> reconnect to ZooKeeper service, session 0x22db9f722501d93 has expired,
>>>> closing socket connection
>>>> 2011-02-03 10:37:45,206 INFO
>>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
>>>> This client just lost it's session with ZooKeeper, trying to reconnect.
>>>> 2011-02-03 10:37:45,453 INFO
>>>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
>>>> Trying to reconnect to zookeeper
>>>> 2011-02-03 10:37:45,206 INFO org.apache.zookeeper.ClientCnxn: Unable to
>>>> reconnect to ZooKeeper service, session 0x12db9f722421ce3 has expired,
>>>> closing socket connection
>>>> regionserver:60020-0x22db9f722501d93 regionserver:60020-0x22db9f722501d93
>>>> received expired from ZooKeeper, aborting
>>>> org.apache.zookeeper.KeeperException$SessionExpiredException:
>>>> KeeperErrorCode = Session expired
>>>>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:328)
>>>>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:246)
>>>>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
>>>>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
>>>> handled exception: org.apache.hadoop.hbase.YouAreDeadException: Server
>>>> REPORT rejected; currently processing XXXXXXXXXXXX,60020,1296684296172
>>>> as dead server
>>>> org.apache.hadoop.hbase.YouAreDeadException:
>>>> org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
>>>> currently processing XXXXXXXXXXXX,60020,1296684296172 as dead server
>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>>>     at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:96)
>>>>     at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:80)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:729)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:586)
>>>>     at java.lang.Thread.run(Thread.java:619)
>>>>
>>>> Thanks,
>>>> Charan
>
> --
> Todd Lipcon
> Software Engineer, Cloudera