Re: Proposal: Always-on Statistical History

Marcus Hirt Thu, 15 Nov 2018 09:13:30 -0800

Hi all,

I'm with Roger on this one. This is an aggregation mechanism. If we want such 
an aggregation mechanism, we should probably build one into one of the already 
available serviceability technologies (JMX and/or JFR). If we feel the need to 
introduce a generic one that can source data from multiple serviceability 
technologies (even though that smells a bit of user application/agent code),
then it should integrate well with already available serviceability 
technologies (perhaps sourcing the upcoming streaming JFR and/or JMX), come
with an API to interact with it, and be general, configurable and extensible. 
Either way, I think this requires more thought.


Another $.02.

Kind regards,
Marcus

On 2018-11-15, 17:42, "serviceability-dev on behalf of Roger Riggs" 
<[email protected] on behalf of 
[email protected]> wrote:

    Hi,
    
    This looks like it has significant overlap with JFR.
    I don't think we want to start building in multiple mechanisms to keep 
    tabs on a running VM.
    
    $.02, Roger
    
    
    On 11/14/2018 04:27 PM, Thomas Stüfe wrote:
    > Hi Bernd,
    >
    > On Wed, Nov 14, 2018 at 10:07 PM Bernd Eckenfels <[email protected]> wrote:
    >> Looks good Thomas,
    > thanks!
    >
    >> what would be the typical memory usage with the Default Settings?
    > ~80 KB. It's very small.
    >
    >> Does the downsampling support min/max style rollups?
    > Not sure what you mean. Do you mean whether it preserves peaks? Not
    > yet; such a feature would have to be added.
    >
    > Right now, downsampling is very primitive for performance reasons. For
    > snapshot values like heap size etc. we just throw away the samples, so
    > you lose temporary peaks. Counter-like values-over-time (e.g. number
    > of pages swapped in) then simply refer to a larger time span.
    >
    > Best Regards, Thomas
    >
    >>
    >>
    >> --
    >> http://bernd.eckenfels.net
    >>
    >>
    >>
    >> From: Thomas Stüfe
    >> Sent: Wednesday, 14 November 2018 16:29
    >> To: [email protected] [email protected]
    >> Subject: Proposal: Always-on Statistical History
    >>
    >> Hi all,
    >>
    >> We have a feature in our port which we would like to contribute,
    >> and I would like to gauge opinions.
    >>
    >> First off, I am not sure which list is correct. This is more of a
    >> serviceability issue, but implementation-wise it fits hs-runtime
    >> better. I'll start with serviceability, but feel free to crosspost if
    >> needed.
    >>
    >> Second, I am aware that this may require a JEP. If necessary and the
    >> feedback is positive, I will draft one.
    >>
    >> ----
    >>
    >> In our port we have something called "Statistics History". Basically
    >> this is a rolling history, spanning up to 10 days, of a number of key
    >> values. Key values range from JVM specifics like heap size, metaspace
    >> size, number of threads etc., to platform specifics like memory
    >> footprint, CPU load, I/O and swapping activity etc.
    >>
    >> A periodic task collects those values at intervals of 15 seconds by
    >> default. They are then fed into a FIFO that spans 10 days. To save
    >> memory, that FIFO is downsampled in two steps, so we have the last n
    >> hours in high resolution and the last n days in low resolution (of
    >> course all these parameters are configurable).
    >>
    >> The history report can be triggered via jcmd, and could also be
    >> printed in the hs.err file (open for debate).
    >>
    >> ---
    >>
    >> Here are some examples of what the whole thing looks like:
    >>
    >> http://cr.openjdk.java.net/~stuefe/webrevs/stathist/examples/stathist-volker.txt
    >>
    >> http://cr.openjdk.java.net/~stuefe/webrevs/stathist/examples/stathist-s390x.txt
    >>
    >> ---
    >>
    >> This feature has been really popular with our support folk over the
    >> years. Whether the VM is starved for resources by the OS, or we have
    >> a slowly or quickly developing leak situation etc.: these values are
    >> an easy way to take a first stab at a situation, before we start more
    >> expensive analysis.
    >>
    >> The explicit design goal of this history was to be very cheap - cheap
    >> enough to be *always on* and forgotten about. It is, in our port,
    >> enabled by default. That way, if a problem occurs at a customer site,
    >> we immediately see developments spanning the last 10 days, without
    >> having to reproduce the issue.
    >>
    >> It is also robust enough to be usable during error reporting without
    >> endangering the error reporting process or falsifying the picture.
    >>
    >> I am aware that this crosses over into JFR territory. But this feature
    >> does not attempt to replace JFR; it is intended instead as a cheap,
    >> always-on, first-stop historical overview.
    >>
    >> --
    >>
    >> I have a patch which can be applied on top of jdk12:
    >>
    >> http://cr.openjdk.java.net/~stuefe/webrevs/stathist/stathist.patch
    >>
    >> It works, passes our nightlies, and shows no regressions in the
    >> DaCapo benchmarks.
    >>
    >> Please tell me what you think. Given enough interest, I will attempt
    >> to contribute (drafting a JEP if necessary.)
    >>
    >> Thanks and Kind Regards,
    >>
    >> Thomas
    >>
    >>
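
The scheme described in the quoted proposal (periodic samples fed into a bounded FIFO, with older data kept at lower resolution) and the merge behaviour Thomas describes for snapshot and counter values could look roughly like the sketch below. It is an illustration in Python only, not the code from the patch, which lives inside the VM. All names (Sample, RollingHistory, merge, heap_kb, pages_in) and the capacities are hypothetical, and only one downsampling step is shown where the proposal describes two.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Sample:
    span_s: int    # seconds covered by this sample
    heap_kb: int   # snapshot-style value (hypothetical field name)
    pages_in: int  # counter-style value: events during span_s (hypothetical)


def merge(samples):
    """Fold consecutive samples into one coarser sample."""
    return Sample(
        span_s=sum(s.span_s for s in samples),
        # Snapshot value: keep only the newest sample, so temporary peaks
        # in the discarded samples are lost.
        heap_kb=samples[-1].heap_kb,
        # Counter value: the sum now refers to the larger time span.
        pages_in=sum(s.pages_in for s in samples),
    )


class RollingHistory:
    """Bounded history: recent samples at full resolution, older ones merged."""

    def __init__(self, fine_capacity, coarse_capacity, factor):
        self.fine = deque(maxlen=fine_capacity)      # high-resolution tier
        self.coarse = deque(maxlen=coarse_capacity)  # low-resolution tier
        self.factor = factor                         # fine samples per coarse sample
        self._pending = []                           # aged-out samples awaiting a merge

    def add(self, sample):
        # When the fine tier is full, its oldest sample ages out and is
        # collected until `factor` of them can be merged into one.
        if len(self.fine) == self.fine.maxlen:
            self._pending.append(self.fine.popleft())
            if len(self._pending) == self.factor:
                # The coarse deque silently drops its oldest entry when full.
                self.coarse.append(merge(self._pending))
                self._pending.clear()
        self.fine.append(sample)


history = RollingHistory(fine_capacity=4, coarse_capacity=3, factor=2)
for i in range(10):
    history.add(Sample(span_s=15, heap_kb=1000 + i, pages_in=1))

print(list(history.fine))
print(list(history.coarse))
```

For scale: at one sample every 15 seconds, a full-resolution 10-day window would hold 10 * 24 * 3600 / 15 = 57,600 entries, which is the motivation for merging older data. A min/max style rollup, as asked about in the thread, would need extra fields in the merged sample; the sketch deliberately mirrors the simpler behaviour described above.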
