No, 2252 is not suitable for backporting to 0.7.

On Thu, Apr 7, 2011 at 7:33 AM, ruslan usifov <ruslan.usi...@gmail.com> wrote:
>
>
> 2011/4/7 Jonathan Ellis <jbel...@gmail.com>
>>
>> Hypothesis: it's probably the flush causing the CMS, not the snapshot
>> linking.
>>
>> Confirmation possibility #1: Add a logger.warn to
>> CLibrary.createHardLinkWithExec -- with JNA enabled it shouldn't be
>> called, but let's rule it out.
>>
>> Confirmation possibility #2: Force some flushes w/o snapshot.
>>
>> Either way: "concurrent mode failure" is the easy GC problem.
>> Hopefully you really are seeing mostly that -- this means the JVM
>> didn't start CMS early enough, so it ran out of space before it could
>> finish the concurrent collection, so it falls back to stop-the-world.
>> The fix is a combination of reducing -XX:CMSInitiatingOccupancyFraction
>> and (possibly) increasing heap capacity if your heap is simply too
>> full too much of the time.
>>
>> You can also mitigate it by increasing the phi threshold for the
>> failure detector, so the node doing the GC doesn't mark everyone else
>> as dead.
>>
>> (Eventually your heap will fragment and you will see STW collections
>> due to "promotion failed," but you should see that much less
>> frequently. GC tuning to reduce fragmentation may be possible based on
>> your workload, but that's out of scope here and in any case the "real"
>> fix for that is https://issues.apache.org/jira/browse/CASSANDRA-2252.)
>>
>
> Jonathan, do you have plans to backport this to the 0.7 branch? (It's very
> hard to tune CMS, and for people who are new to Java this task becomes
> much harder.)
>
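
For anyone tuning CMS on 0.7 in the meantime, here is a minimal sketch of the settings discussed in the quoted message. It assumes the JVM options are collected in a JVM_OPTS shell variable, as in a startup script such as conf/cassandra-env.sh. The value 70 and the 8G heap size are illustrative placeholders, not recommendations; the right numbers depend on your workload.

```shell
# Sketch only: the numbers below are illustrative assumptions, not tuned values.
JVM_OPTS="${JVM_OPTS:-}"

# Use the concurrent mark-sweep collector for the old generation.
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"

# Start the concurrent cycle once the old generation is 70% full.
# A lower value starts CMS earlier, leaving more headroom to finish
# before the heap fills up ("concurrent mode failure").
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70"

# Always honor the fraction above instead of the JVM's own heuristic.
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

# Optionally give the heap more room if it is simply too full too often
# (hypothetical size; must fit comfortably in physical memory).
JVM_OPTS="$JVM_OPTS -Xms8G -Xmx8G"

echo "$JVM_OPTS"
```

The failure detector side is a separate setting; to my knowledge it is phi_convict_threshold in cassandra.yaml, but check the configuration file shipped with your version before relying on that name.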
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com