> On node "172.16.107.46", I see the following:
>
> 21:53:27.192+0100: 1335393.834: [GC 1335393.834: [ParNew (promotion failed):
> 319468K->324959K(345024K), 0.1304456 secs]1335393.964: [CMS:
> 6000844K->3298251K(8005248K), 10.8526193 secs] 6310427K->3298251K(8350272K),
> [CMS Perm : 26355K->26346K(44268K)], 10.9832679 secs] [Times: user=11.15
> sys=0.03, real=10.98 secs]
> 21:53:38,174 GC for ConcurrentMarkSweep: 10856 ms for 1 collections,
> 3389079904 used; max is 8550678528
>
> I have not yet tested the "XX:+DisableExplicitGC" switch.
>
> Is the right thing to do to decrease the CMSInitiatingOccupancyFraction
> setting?
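
For reference, the old-generation figures in the quoted log line can be turned into occupancy percentages. This is only a rough sketch in Python: the helper name and constants are my own, the numbers are copied from the log above, and treating post-collection usage as the live data size is an approximation.

```python
# Figures copied from the quoted GC log line (all in KB).
OLD_USED_BEFORE_K = 6000844   # CMS old-gen usage when the promotion failure hit
OLD_USED_AFTER_K = 3298251    # CMS old-gen usage after the full collection
OLD_CAPACITY_K = 8005248      # CMS old-gen capacity
YOUNG_CAPACITY_K = 345024     # ParNew young-gen capacity


def occupancy_pct(used_k: int, capacity_k: int) -> float:
    """Return usage as a percentage of capacity (hypothetical helper)."""
    return 100.0 * used_k / capacity_k


before_pct = occupancy_pct(OLD_USED_BEFORE_K, OLD_CAPACITY_K)
after_pct = occupancy_pct(OLD_USED_AFTER_K, OLD_CAPACITY_K)

print(f"old-gen occupancy at promotion failure: {before_pct:.1f}%")
print(f"old-gen occupancy after full GC (approx. live data): {after_pct:.1f}%")
```

On these numbers old-gen was roughly 75% full when the promotion failure hit and roughly 41% full after the full collection, so a lower trigger only has room to help while it stays comfortably above that second figure.
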
* Increasing the total heap size can definitely help; the only kink is
  that if the increase you need is unacceptably large, it is not helpful.

* Decreasing the occupancy trigger can help, yes, but you will get sharply
  diminishing returns as your trigger fraction approaches the actual
  live size of data on the heap.

* I just re-checked your original message - you're on Cassandra 0.7? I
  *strongly* suggest upgrading to 1.x. That holds true in general, but
  specifically relevant here are significant improvements in memory
  allocation behavior that significantly reduce the probability and/or
  frequency of promotion failures and full GCs.

* Increasing the size of the young generation can help by causing less
  promotion to old-gen (see the cassandra.in.sh script or the equivalent
  for Windows).

* Increasing the number of parallel threads used by CMS can help CMS
  complete its marking phase quicker, but at the cost of a greater
  impact on the mutator (Cassandra).

I think the most important thing is: upgrade to 1.x before you run
these benchmarks. Detailed tuning of GC issues in particular is pretty
useless on 0.7 given the significant changes in 1.0. Don't even bother
spending time on this until you're on 1.0, unless this is about a
production cluster that you cannot upgrade for some reason.

--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)