Sorry about the delay,

> I do believe there is a fundamental issue with compactions allocating too 
> much memory and incurring too many garbage collections (at least with 0.6.12).

[snip a lot of good info]

You certainly seem to have a real issue, though I don't get the feel
it's the same as the OP.

I don't think I can offer a silver bullet. I was going to suggest that
you're seeing rows that are large enough that young-gen GCs happen
before an individual row completes, so that the per-row working set is
promoted to old-gen, yet small enough to stay below
in_memory_compaction_limit_in_mb. But this seems inconsistent with the
fact that you report problems even with a huge new-gen (10 gig).
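
To make that reasoning concrete, here is a rough back-of-the-envelope
sketch in Java. The class and method names are hypothetical, and the
allocation rate and per-row time are made-up illustrative numbers, not
measurements from your cluster. It assumes eden fills at a steady
allocation rate, and it only checks whether a row's compaction would span
a young collection, which is a precondition for premature promotion and
not proof of it (survivor spaces and the tenuring threshold also matter).

```java
// Hypothetical helper; names and example numbers are illustrative only.
final class PromotionEstimate {

    /** Approximate seconds between young collections, assuming eden fills at a steady rate. */
    static double secondsBetweenYoungGcs(long edenBytes, long allocBytesPerSec) {
        return (double) edenBytes / allocBytesPerSec;
    }

    /** True if compacting one row takes longer than the interval between young collections. */
    static boolean rowSpansYoungGc(double rowSeconds, long edenBytes, long allocBytesPerSec) {
        return rowSeconds > secondsBetweenYoungGcs(edenBytes, allocBytesPerSec);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long allocRate = 500 * mb; // assumed allocation rate during compaction
        double rowSeconds = 2.0;   // assumed time to compact one row

        // Small eden: the row outlives at least one young collection.
        System.out.println(rowSpansYoungGc(rowSeconds, 256 * mb, allocRate));
        // 10 gig eden: the row finishes long before eden fills.
        System.out.println(rowSpansYoungGc(rowSeconds, 10 * 1024 * mb, allocRate));
    }
}
```

With these assumed numbers, a 256 MB eden fills in about half a second,
so a two-second row spans a collection, while a 10 gig eden takes around
twenty seconds to fill, which is why the hypothesis fits poorly with
problems at that size.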

With the large new-gen, were you actually seeing fallbacks to full GC?
Or were you still experiencing problems simply because, at 10 gig, a
new-gen collection is slow enough to be effectively similar to a full
GC in terms of its effect on latency?
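
One cheap way to tell the two cases apart, short of a profiler, is to
look at per-collector counts and cumulative times. The sketch below uses
the standard java.lang.management API; the class name and the
averageMillis helper are my own. As written it reports on the JVM it runs
in, so for Cassandra you would read the same beans remotely over JMX
(jconsole shows them too). How the old-generation collector accounts for
concurrent work varies between JVMs, so the GC log remains the more
reliable place to confirm actual full-GC fallbacks.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class; only the MXBean calls are standard API.
final class GcPauseSummary {

    /** Average milliseconds per collection, or 0 if none were recorded. */
    static double averageMillis(long count, long totalMillis) {
        return count <= 0 ? 0.0 : (double) totalMillis / count;
    }

    public static void main(String[] args) {
        // One bean per collector, e.g. a young-gen and an old-gen collector.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the JVM does not report it
            long millis = gc.getCollectionTime(); // cumulative and approximate
            System.out.printf("%s: %d collections, %d ms total, %.1f ms average%n",
                    gc.getName(), count, millis, averageMillis(count, millis));
        }
    }
}
```

If the young-gen collector shows a large average time per collection
while the old-gen count barely moves, that would point at slow new-gen
collections rather than full-GC fallbacks.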

If there is any suspicion that the above is happening, maybe try
decreasing in_memory_compaction_limit_in_mb (and be prepared to see
lots of stuff logged to the console, assuming that's still happening in
the 0.6. version you're running).

Also, you did mention taking into account tenuring into old-gen, so
maybe your observations there are inconsistent with the above
hypothesis too. But just one correction/note regarding this: You said
that:

   "However, when the young generation is being collected (which
happens VERY often during compactions b/c allocation rate is so high),
objects are allocated directly into the tenured generation."

I'm not sure what you're basing that on, but unless I have fatally
failed to grok something fundamental about the interaction between
new-gen and old-gen with CMS, objects aren't being allocated *period*
while the "young generation is being collected", as that is a
stop-the-world pause. (This is also why I said before that at 10 gig
new-gen size, the observed behavior on young-gen collections may be
similar to the fallback-to-full-GC case, though not quite, since it
would be parallel rather than serial.)

Anyways, I sympathize with your issues and the fact that you don't
have time to start attaching profilers etc. Unfortunately I don't
know what to suggest that is simpler than that.

-- 
/ Peter Schuller
