Simon Zhou created CASSANDRA-13491:
--------------------------------------

             Summary: Emit metrics for JVM safepoint pause
                 Key: CASSANDRA-13491
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13491
             Project: Cassandra
          Issue Type: New Feature
            Reporter: Simon Zhou


GC pauses are not the only source of JVM-induced latency. In one of our recent 
production issues, the GC metrics looked fine (some pauses >200ms, the longest 
500ms), but the GC logs showed periodic pauses like this:
{code}
2017-04-26T01:51:29.420+0000: 352535.998: Total time for which application threads were stopped: 19.8835870 seconds, Stopping threads took: 19.7842073 seconds
{code}

A delay this large is probably a JVM malfunction, but it caused some requests 
to time out. So I suggest adding metrics for safepoint pauses for better 
observability. There are two problems, though:
1. This depends on the JVM. Some JVMs may not expose these internal MBeans. The 
same limitation already applies to the existing GCInspector.
2. HotSpot has an internal MBean, HotspotRuntime, from which we can read 
safepoint pauses. However, it does not support notifications. I got the 
error "MBean sun.management:type=HotspotRuntime does not implement 
javax.management.NotificationBroadcaster" when trying to register a listener. 
This means we will need to poll HotspotRuntime periodically for the safepoint 
pauses.
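
A rough sketch of what the polling in point 2 could look like, not a proposed 
patch. It assumes HotspotRuntime exposes a cumulative "TotalSafepointTime" 
attribute in milliseconds and that the MBean is registered on the platform 
MBeanServer; if either assumption fails, the source returns -1 and the deltas 
stay at 0. The names SafepointPauseTracker, hotspotSource and start are 
hypothetical and not existing Cassandra code.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical helper: reports safepoint time accumulated between polls. */
final class SafepointPauseTracker {
    private final LongSupplier totalSafepointMillis; // cumulative counter source
    private long lastTotal;

    SafepointPauseTracker(LongSupplier totalSafepointMillis) {
        this.totalSafepointMillis = totalSafepointMillis;
        this.lastTotal = totalSafepointMillis.getAsLong(); // baseline sample
    }

    /** Returns the safepoint time (ms) accumulated since the previous call. */
    synchronized long pollDeltaMillis() {
        long now = totalSafepointMillis.getAsLong();
        long delta = Math.max(0L, now - lastTotal); // clamp if the counter goes backwards
        lastTotal = now;
        return delta;
    }

    /**
     * Assumption: the internal HotSpot MBean is registered and has a
     * "TotalSafepointTime" attribute. Returns -1 on any lookup failure,
     * e.g. on a JVM that does not expose this MBean.
     */
    static LongSupplier hotspotSource() {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        return () -> {
            try {
                ObjectName name = new ObjectName("sun.management:type=HotspotRuntime");
                return ((Number) server.getAttribute(name, "TotalSafepointTime")).longValue();
            } catch (Exception e) {
                return -1L;
            }
        };
    }

    /** Polls on a daemon thread and hands each delta to the given sink. */
    static ScheduledExecutorService start(SafepointPauseTracker tracker,
                                          long periodSeconds,
                                          LongConsumer sink) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "safepoint-poller");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleAtFixedRate(() -> sink.accept(tracker.pollDeltaMillis()),
                                periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```

The counter source is injected so the delta logic can be exercised without a 
HotSpot JVM. In practice the sink would update a metric, and the polling 
period bounds how quickly a long pause becomes visible.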

Reference:
http://blog.ragozin.info/2012/10/safepoints-in-hotspot-jvm.html

Does anyone think we should support this?


