Hi Paul,
Probably best to also include the GC mailing list here.
David
On 24/10/2017 2:31 AM, Hohensee, Paul wrote:
I’d like to solicit opinions on reporting GC pause duration
(stop-the-world, or STW, pause time) via JMX. This info would be useful in
figuring out whether or not GC pause times are factors in failing to
meet response time SLAs. The info is of course available directly from
GC logs, but parsing logs is fraught and JMX doesn’t seem to report the
equivalent info.
GcInfo
<https://docs.oracle.com/javase/9/docs/api/com/sun/management/GcInfo.html>
has a getDuration() method which works fine for the non-concurrent
collectors (since they’re STW), but for CMS and G1 it appears to report
the duration of an entire concurrent cycle, which isn’t what I want. The
number of STW pauses during a concurrent cycle varies by collector, so
ideally there would be a method that reports cause (as a string) and
duration for each STW pause. If that’s too much, perhaps the minimum
might be a getMaxPauseDuration() method that reports the maximum pause
duration of all the STW pauses that happen during a concurrent cycle.
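For reference, here is a minimal sketch of how the existing getDuration() value can be observed per collection through JMX notifications. It only uses the existing com.sun.management API (GarbageCollectionNotificationInfo and GcInfo); the class name GcDurationListener and the register() helper are hypothetical names chosen for illustration. It assumes a HotSpot JVM, where the collector beans also implement NotificationEmitter. It reports whatever getDuration() currently returns, so it does not break a concurrent cycle down into its individual STW pauses.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import com.sun.management.GcInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.Consumer;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

// Hypothetical helper class; the name is for illustration only.
final class GcDurationListener {

    // Subscribes to every collector bean that emits notifications and
    // returns how many beans were subscribed.
    static int register(Consumer<String> sink) {
        int registered = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (!(gc instanceof NotificationEmitter)) {
                continue; // assumption: HotSpot beans are emitters, other JVMs may differ
            }
            NotificationListener listener = (Notification n, Object handback) -> {
                if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                        .equals(n.getType())) {
                    return;
                }
                GarbageCollectionNotificationInfo info =
                        GarbageCollectionNotificationInfo.from((CompositeData) n.getUserData());
                GcInfo gcInfo = info.getGcInfo();
                // One line per completed collection: collector, action, cause, id, duration.
                sink.accept(String.format("%s | %s | %s | id=%d | duration=%d ms",
                        info.getGcName(), info.getGcAction(), info.getGcCause(),
                        gcInfo.getId(), gcInfo.getDuration()));
            };
            ((NotificationEmitter) gc).addNotificationListener(listener, null, null);
            registered++;
        }
        return registered;
    }

    public static void main(String[] args) throws InterruptedException {
        register(System.out::println);
        System.gc();        // request a collection so there is something to report
        Thread.sleep(1000); // notifications are delivered asynchronously
    }
}
```

Running this with -XX:+UseConcMarkSweepGC or -XX:+UseG1GC and comparing the printed durations against the GC log should show whether the reported value covers a whole concurrent cycle.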
Relatedly, the full compacting GCs that happen as a result of CMS and G1
concurrent mode failure aren’t reported separately from concurrent
cycles. It would be useful to differentiate these from
“ConcurrentMarkSweep” and “G1 Old Generation”. Perhaps add collector
types to CMS and G1, viz. “MarkSweepCompact” (which already exists and
is literally what’s executed by CMS) and a new “G1 MarkSweepCompact”
collector for G1.
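As a quick way to see which collector names a given JVM exposes today, the sketch below lists the registered collector beans with their cumulative counts and times. It uses only the standard java.lang.management API; the class name GcBeanNames and the describe() helper are hypothetical. The names printed depend on the collector selected on the command line, so no particular output is assumed here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the name is for illustration only.
final class GcBeanNames {

    // Returns one descriptive line per registered collector bean.
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(String.format("%s: count=%d, time=%d ms, pools=%s",
                    gc.getName(),
                    gc.getCollectionCount(),  // cumulative collections, -1 if undefined
                    gc.getCollectionTime(),   // cumulative elapsed time in ms, -1 if undefined
                    String.join(", ", gc.getMemoryPoolNames())));
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

Because a full compacting GC is counted under the same bean as the concurrent cycles, the counts and times printed here cannot tell the two apart.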
If there’s a consensus that something should be done about either of
these issues, I’d be happy to file RFE(s) and do the work.
Thanks,
Paul