You mentioned in your email that "total data size varies between about 1 & 2K". I am guessing you meant that your individual record size varies between 1 and 2K.

If that is true, there is a good chance that you are hitting the CMS occupancy fraction sooner than you otherwise would because of the varying record size. Consider encoding as a way to limit the variation in individual record sizes. The OpenTSDB schema <http://opentsdb.net/schema.html> is a nice example of how encoding can be used to accomplish this.
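To make the idea concrete, here is a rough, untested sketch against the 0.92 client API. The column family "d", the qualifier "v", and the choice of 40 long-valued fields are assumptions made up for illustration, not your actual schema; the only point is that every row key and every value come out the same size.

    import java.nio.ByteBuffer;
    import java.util.UUID;

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // Hypothetical sketch only: pack all fields of one record into a single
    // fixed-width value instead of ~40 variable-length columns. Family "d",
    // qualifier "v" and the 40-long layout are invented for illustration.
    public class FixedWidthRecord {

        private static final int FIELD_COUNT = 40;
        // 40 fields x 8 bytes = 320 bytes, the same for every record
        private static final int RECORD_SIZE = FIELD_COUNT * Bytes.SIZEOF_LONG;

        static Put encode(UUID rowId, long[] fields) {
            if (fields.length != FIELD_COUNT) {
                throw new IllegalArgumentException("expected " + FIELD_COUNT + " fields");
            }

            // 16-byte row key from the UUID, so keys are fixed width as well
            byte[] row = ByteBuffer.allocate(16)
                    .putLong(rowId.getMostSignificantBits())
                    .putLong(rowId.getLeastSignificantBits())
                    .array();

            // one packed, fixed-size value per row instead of ~40 KeyValues
            ByteBuffer value = ByteBuffer.allocate(RECORD_SIZE);
            for (long field : fields) {
                value.putLong(field);
            }

            Put put = new Put(row);
            put.add(Bytes.toBytes("d"), Bytes.toBytes("v"), value.array());
            return put;
        }

        public static void main(String[] args) {
            Put put = encode(UUID.randomUUID(), new long[FIELD_COUNT]);
            System.out.println("row key bytes = " + put.getRow().length
                    + ", value bytes = " + RECORD_SIZE);
        }
    }

The win over 40 variable-length columns is that every object promoted into the old generation has a predictable size, which should make the fragmentation-driven promotion failures discussed below less likely.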
On Mon, May 21, 2012 at 6:15 AM, Simon Kelly <simongdke...@gmail.com> wrote:

> Great, thanks very much for the help. I'm going to see if I can get more
> memory into the servers and will also experiment with -XX:ParallelGCThreads.
> We already have -XX:CMSInitiatingOccupancyFraction=70 in the config.
>
> Uday, what do you mean by "a fixed size record"? Do you mean the record
> that is being written to HBase?
>
>
> On 19 May 2012 12:44, Uday Jarajapu <uday.jaraj...@opower.com> wrote:
>
> > Also, try playing with
> >
> > #3) -XX:CMSInitiatingOccupancyFraction=70 to kick off a CMS GC sooner
> > than the default trigger would.
> >
> > #4) a fixed-size record to make sure you do not run into a promotion
> > failure due to fragmentation
> >
> >
> > On Fri, May 18, 2012 at 4:35 PM, Uday Jarajapu <uday.jaraj...@opower.com> wrote:
> >
> >> I think you have it right for the most part, except you are underarmed
> >> with only 8G and a 4-core box. Since you have Xmx=Xms=4G, the default
> >> collector (parallel) with the right number of threads might be able to
> >> pull it off. In fact, CMS might be defaulting to that eventually.
> >>
> >> As you know, CMS is great for sweeping heap sizes in the 8G-16G range,
> >> but it eventually defaults to parallel GC for smaller heaps that run out
> >> of space quickly. On top of that, it is non-compacting. So, what works
> >> for a couple of cycles might quickly run out of room and leave no other
> >> choice but to stop-the-world. To avoid the hit when that happens, try
> >> limiting the number of parallel GC threads to a third of your cores. In
> >> your case, that would unfortunately be 1. Try 1 or 2.
> >>
> >> I would recommend trying one of these two tests on the region server:
> >>
> >> #1) -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
> >> -XX:ParallelGCThreads=1 (or 2)
> >>
> >> #2) -XX:ParallelGCThreads=2
> >>
> >> The second test is just for giggles, to see if the CMS aspect is helping
> >> you at all (or if you are ending up doing stop-the-world collections more
> >> often than you want; if that is the case, try using the default GC).
> >>
> >> Hope that helps,
> >> Uday
> >>
> >> On Fri, May 18, 2012 at 4:54 AM, Simon Kelly <simongdke...@gmail.com> wrote:
> >>
> >>> Hi
> >>>
> >>> Firstly, let me compliment the HBase team on a great piece of software.
> >>> We're running a few clusters that are working well, but we're really
> >>> struggling with a new one I'm trying to set up and could use a bit of
> >>> help. I have read as much as I can but just can't seem to get it right.
> >>>
> >>> The difference between this cluster and the others is that this one's
> >>> load is 99% writes. Each write contains about 40 columns to a single
> >>> table and column family, and the total data size varies between about
> >>> 1 & 2K. The load per server varies between 20 and 90 requests per second
> >>> at different times of the day. The row keys are UUIDs, so they are
> >>> uniformly distributed across the (currently 60) regions.
> >>>
> >>> The problem seems to be that after some time a GC cycle takes longer
> >>> than expected on one of the regionservers and the master kills that
> >>> regionserver.
> >>>
> >>> This morning I ran the system up to the first regionserver failure and
> >>> recorded the data with Ganglia. I have attached the following Ganglia
> >>> graphs:
> >>>
> >>> - hbase.regionserver.compactionQueueSize
> >>> - hbase.regionserver.memstoreSizeMB
> >>> - requests_per_minute (to the service that calls hbase)
> >>> - request_processing_time (of the service that calls hbase)
> >>>
> >>> Any assistance would be greatly appreciated. I did have GC logging on,
> >>> so I have access to all that data too.
> >>>
> >>> Best regards
> >>> Simon Kelly
> >>>
> >>> *Cluster details*
> >>> *----------------------*
> >>> It's running on 5 machines with the following specs:
> >>>
> >>> - CPUs: 4 x 2.39 GHz
> >>> - RAM: 8 GB
> >>> - Ubuntu 10.04.2 LTS
> >>>
> >>> The Hadoop cluster (version 1.0.1, r1243785) runs across all the
> >>> machines and has 8 TB of capacity (60% unused). On top of that is HBase
> >>> version 0.92.1, r1298924. All the servers run Hadoop datanodes and HBase
> >>> regionservers. One server hosts the Hadoop primary namenode and the
> >>> HBase master. Three servers form the ZooKeeper quorum.
> >>>
> >>> The HBase config is as follows:
> >>>
> >>> - HBASE_OPTS="-Xmn128m -ea -XX:+UseConcMarkSweepGC
> >>> -XX:+CMSIncrementalMode -XX:+UseParNewGC
> >>> -XX:CMSInitiatingOccupancyFraction=70"
> >>> - HBASE_HEAPSIZE=4096
> >>>
> >>> - hbase.rootdir : hdfs://server1:8020/hbase
> >>> - hbase.cluster.distributed : true
> >>> - hbase.zookeeper.property.clientPort : 2222
> >>> - hbase.zookeeper.quorum : server1,server2,server3
> >>> - zookeeper.session.timeout : 30000
> >>> - hbase.regionserver.maxlogs : 16
> >>> - hbase.regionserver.handler.count : 50
> >>> - hbase.regionserver.codecs : lzo
> >>> - hbase.master.startup.retainassign : false
> >>> - hbase.hregion.majorcompaction : 0
> >>>
> >>> (For the benefit of those without the attachments, I'll describe the
> >>> graphs:
> >>>
> >>> - 0900 - system starts
> >>> - 1010 - memstore reaches 1.2GB and flushes to 500MB, a few hbase
> >>> compactions happen and there is a slight increase in request_processing_time
> >>> - 1040 - memstore reaches 1.0GB and flushes to 500MB (no hbase
> >>> compactions)
> >>> - 1110 - memstore reaches 1.0GB and flushes to 300MB, a few more
> >>> hbase compactions happen and a slightly larger increase in
> >>> request_processing_time
> >>> - 1200 - memstore reaches 1.3GB and flushes to 200MB, more hbase
> >>> compactions and an increase in request_processing_time
> >>> - 1230 - hbase logs for server1 record "We slept 13318ms instead of
> >>> 3000ms" and regionserver1 is killed by the master; request_processing_time
> >>> goes way up
> >>> - 1326 - hbase logs for server3 record "We slept 77377ms instead of
> >>> 3000ms" and regionserver2 is killed by the master
> >>>
> >>> )
> >>
> >>
> >
> >
>