Re: Flush / Snapshot Triggering Full GCs, Leaving Ring

Jonathan Ellis Thu, 07 Apr 2011 07:29:11 -0700

No, 2252 is not suitable for backporting to 0.7.

On Thu, Apr 7, 2011 at 7:33 AM, ruslan usifov <ruslan.usi...@gmail.com> wrote:
>
>
> 2011/4/7 Jonathan Ellis <jbel...@gmail.com>
>>
>> Hypothesis: it's probably the flush causing the CMS, not the snapshot
>> linking.
>>
>> Confirmation possibility #1: Add a logger.warn to
>> CLibrary.createHardLinkWithExec -- with JNA enabled it shouldn't be
>> called, but let's rule it out.
>>
>> Confirmation possibility #2: Force some flushes w/o snapshot.
>>
>> Either way: "concurrent mode failure" is the easy GC problem.
>> Hopefully you really are seeing mostly that -- this means the JVM
>> didn't start CMS early enough, so it ran out of space before it could
>> finish the concurrent collection, so it falls back to stop-the-world.
>> The fix is a combination of reducing -XX:CMSInitiatingOccupancyFraction
>> and (possibly) increasing heap capacity if your heap is simply too
>> full too much of the time.
>>
>> You can also mitigate it by increasing the phi threshold for the
>> failure detector, so the node doing the GC doesn't mark everyone else
>> as dead.
>>
>> (Eventually your heap will fragment and you will see STW collections
>> due to "promotion failed," but you should see that much less
>> frequently. GC tuning to reduce fragmentation may be possible based on
>> your workload, but that's out of scope here and in any case the "real"
>> fix for that is https://issues.apache.org/jira/browse/CASSANDRA-2252.)
>>
>
> Jonathan, do you have plans to backport this to the 0.7 branch? (It's very
> hard to tune CMS, and for people who are new to Java this task becomes much
> harder.)
>




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
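
The two mechanisms described in the quoted message (CMS starting too late
to finish before the heap fills, and the failure detector reacting to a long
pause) can be illustrated with a toy model. This is a minimal sketch, not JVM
or Cassandra code: the function names, the constant promotion rate, the fixed
concurrent-cycle duration, and the exponential model of heartbeat gaps are all
assumptions made for illustration.

```python
import math


def seconds_until_full(old_gen_mb, initiating_fraction, promotion_mb_per_s):
    """Time left between CMS starting and the old generation filling up.

    Assumes (hypothetically) a constant promotion rate into the old gen.
    """
    headroom_mb = (1.0 - initiating_fraction) * old_gen_mb
    return headroom_mb / promotion_mb_per_s


def cms_finishes_in_time(old_gen_mb, initiating_fraction,
                         promotion_mb_per_s, cycle_s):
    """True if the concurrent cycle completes before space runs out.

    False corresponds to a "concurrent mode failure": the JVM has to fall
    back to a stop-the-world collection.
    """
    return seconds_until_full(old_gen_mb, initiating_fraction,
                              promotion_mb_per_s) >= cycle_s


def max_safe_fraction(old_gen_mb, promotion_mb_per_s, cycle_s):
    """Largest initiating fraction that still leaves enough headroom."""
    return 1.0 - (promotion_mb_per_s * cycle_s) / old_gen_mb


def phi(gap_s, mean_interval_s):
    """Suspicion level for a heartbeat gap under an exponential model.

    P(gap > t) = exp(-t / mean), and phi = -log10(P) = t / (mean * ln 10).
    """
    return gap_s / (mean_interval_s * math.log(10))


def max_tolerated_pause(phi_threshold, mean_interval_s):
    """Longest gap (e.g. a GC pause) that stays at or below the threshold."""
    return phi_threshold * mean_interval_s * math.log(10)
```

With a 4000 MB old generation filling at 100 MB/s and a 15 s concurrent cycle,
this model says an initiating fraction of 0.75 leaves only 10 s of headroom,
so the collection cannot finish in time, while a fraction of 0.625 or lower
can. Likewise, raising the phi threshold lengthens the pause a node can sit
through before it starts marking peers as dead. Real workloads have bursty
promotion rates and irregular heartbeats, so treat the numbers as intuition
only.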
