Hi David,

>     So we never try to access the uninitialized counters.cpus array which is 
>    good but we still return garbage for counters.jvmTicks and 
>   counters.cpuTicks - surely that should have been noticeable?

It only affected the first time the CPU load was requested. Function 
get_cpuload_internal(...)
calls get_totalticks() and get_jvmticks() functions that update these counters.
But on the first call, yes, it compares the newly received counters with the 
garbage.


It has the code that seems to be written to somehow mitigate it and in worse 
case just return
0 or 1.0. But it also could be  that there is some other problem this code 
tries to solve so I'm not sure
we should remove these workarounds as a part of the current fix.

274             // seems like we sometimes end up with less kernel ticks when
 275             // reading /proc/self/stat a second time, timing issue between 
cpus?
 276             if (pticks->usedKernel < tmp.usedKernel) {
 277                 kdiff = 0;
 278             } else {
 279                 kdiff = pticks->usedKernel - tmp.usedKernel;
 280             }
 281             tdiff = pticks->total - tmp.total;
 282             udiff = pticks->used - tmp.used;
 283 
 284             if (tdiff == 0) {
 285                 user_load = 0;
 286             } else {
 287                 if (tdiff < (udiff + kdiff)) {
 288                     tdiff = udiff + kdiff;
 289                 }
 290                 *pkernelLoad = (kdiff / (double)tdiff);
 291                 // BUG9044876, normalize return values to sane values
 292                 *pkernelLoad = MAX(*pkernelLoad, 0.0);
 293                 *pkernelLoad = MIN(*pkernelLoad, 1.0);
 294 
 295                 user_load = (udiff / (double)tdiff);
 296                 user_load = MAX(user_load, 0.0);
 297                 user_load = MIN(user_load, 1.0);
 298             }
 299         }
 
Best regards,
Daniil

On 12/8/19, 8:49 PM, "David Holmes" <david.hol...@oracle.com> wrote:

    Hi Daniil,
    
    On 7/12/2019 11:41 am, Daniil Titov wrote:
    > Hi David, Mandy, and Bob,
    > 
    > Thank you for reviewing this fix.
    > 
    > Please review a new version of the fix [1] that includes the following 
changes comparing to the previous version of the webrev ( webrev.04)
    > 1. The changes in Javadoc made in the webrev.04 comparing to webrev.03 
and to CSR [3] were discarded.
    
    Okay.
    
    > 2.  The implementation of methods getFreeMemorySize, getTotalMemorySize, 
getFreeSwapSpaceSize and getTotalSwapSpaceSize
    >       was also reverted to webrev.03 version that return host's values if 
the metrics are unavailable or cannot be properly read.
    
    Okay.
    
    >       I would like to mention that  currently the native implementation 
of these methods de-facto may return -1 at some circumstances,
    >       but I agree that the changes proposed in the previous version of 
the webrev increase such probability.
    >       I filed the follow-up issue [4] as Mandy suggested.
    
    I added a comment to the bug. This is potentially a difficult problem to 
    resolve - it all depends on the likelihood of any errors and what they 
    really indicate.
    
    > 3.  The legacy methods were renamed as David suggested.
    
    Thanks!
    
    > 
    >> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c
    >> !     static int initialized=1;
    >>
    >>   Am I reading this right that the code currently fails to actually do 
the
    >> initialization because of this ???
    > 
    > Yes, currently the code fails to do the initialization but it was 
unnoticed since method
    > get_cpuload_internal(...) was never called for a specific CPU, the first 
parameter "which"
    > was always -1.
    
    So we never try to access the uninitialized counters.cpus array which is 
    good but we still return garbage for counters.jvmTicks and 
    counters.cpuTicks - surely that should have been noticeable?
    
    >>   test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java
    >>
    >> System.out.println(String.format(...)
    >>
    >> Why not simply
    >>
    >> System.out.printf(..)
    > 
    > As I tried explain it earlier it would make the tests unstable.
    > System.out.printf(...) just delegates the call to System.out.format(...) 
that doesn't emit the string atomically.
    > Instead it parses the format string into a list of FormatString objects 
and then iterates over the list.
    > As a result, the other traces occasionally got printed between these 
iterations  and break the pattern the test is expected to find
    > in the output.
    
    Sorry I missed the earlier explanation. I find it somewhat surprising 
    that format() works that way, but without unlimited buffering there will 
    always be a need to flush the outputstream at some point.
    
    Thanks,
    David
    -----
    
    > For example, here is the sample of a such output that has the trace 
message printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:"
    > and "1030762496".
    > 
    > <skipped>
    > [0.304s][trace][os,container] Memory Usage is: 42983424
    > OperatingSystemMXBean.getFreeMemorySize: 1030758400
    > [0.305s][trace][os,container] Path to /memory.usage_in_bytes is 
/sys/fs/cgroup/memory/memory.usage_in_bytes
    > [0.305s][trace][os,container] Memory Usage is: 42979328
    > [0.306s][trace][os,container] Path to /memory.usage_in_bytes is 
/sys/fs/cgroup/memory/memory.usage_in_bytes
    > OperatingSystemMXBean.getFreePhysicalMemorySize: 
[0.306s][trace][os,container] Memory Usage is: 42975232
    > 1030762496
    > OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176
    > 
    > <skipped>
    > java.lang.RuntimeException: 
'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from 
stdout/stderr
    > 
    >   at 
jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306)
    >   at 
TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151)
    >   at TestMemoryAwareness.main(TestMemoryAwareness.java:73)
    >   at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    >   at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    >   at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    >   at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    >   at 
com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
    >   at java.base/java.lang.Thread.run(Thread.java:832)
    > 
    > Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker 
tests passed. Tier4-tier6 tests are still running.
    > 
    > [1] Webrev:  http://cr.openjdk.java.net/~dtitov/8226575/webrev.05
    > [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575
    > [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428
    > [4] https://bugs.openjdk.java.net/browse/JDK-8235522
    > 
    > Thank you,
    > Daniil
    > 
    > On 12/6/19, 1:38 PM, "Mandy Chung" <mandy.ch...@oracle.com> wrote:
    > 
    >      
    >      
    >      On 12/6/19 5:59 AM, Bob Vandette wrote:
    >      >> On Dec 6, 2019, at 2:49 AM, David Holmes<david.hol...@oracle.com> 
 wrote:
    >      >>
    >      >>
    >      >> 
src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java
    >      >>
    >      >> The changes to allow for a return of -1 are somewhat more 
extensive than we have previously discussed. These methods previously were (per 
the spec) guaranteed to return some (assumably) meaningful value but now they 
are effectively allowed to fail by returning -1. No existing code is expecting 
to have to handle a return of -1 so I see this as a significant compatibility 
issue.
    >      
    >      I thought that the error case we are referring to is limit == 0 which
    >      indicates something unexpected goes wrong.  So the compatibility 
concern
    >      should be low.  This is very specific to Metrics implementation for
    >      cgroup v1 and let me know if I'm wrong.
    >      
    >      >> Surely there must always be some information available from the 
operating environment? I see from the impl file:
    >      >>
    >      >>     // the host data, value 0 indicates that something went wrong 
while the metric was read and
    >      >>    // in this case we return "information unavailable" code -1.
    >      >>
    >      >> I don't agree with this. If the container metrics are messed up 
somehow we should either fallback to the host value or else abort with some 
kind of exception. Returning -1 is not an option here IMO.
    >      > I agree with David on the compatibility concern.  I originally 
thought that -1 was already a specified return for all of these methods.
    >      > Since the 0 return failure from the Metrics API should only occur 
if one of the cgroup subsystems is not enabled while others
    >      > are, I’d suggest we keep Daniil’s original logic to fall back to 
the host value since a disabled subsystem is equivalent to no
    >      > limits.
    >      >
    >      
    >      It's important to consider carefully if the monitoring API indicates 
an
    >      error vs unavailable and an application should continue to run when 
the
    >      monitoring system fails to get the metrics.
    >      
    >      There are several choices to report "something goes wrong" scenarios
    >      (should unlikely happen???):
    >      1. fall back to a random positive value  (e.g. host value)
    >      2. return a negative value
    >      3. throw an exception
    >      
    >      #3 is not an option as the application is not expecting this.  For 
#2,
    >      the application can filter bad values if desirable.
    >      
    >      I'm okay if you want to file a JBS issue to follow up and thoroughly
    >      look at the cases that the metrics are unavailable and the cases when
    >      fails to obtain.
    >      
    >      >> ---
    >      >>
    >      >> 
test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java
    >      >>
    >      >> System.out.println(String.format(...)
    >      >>
    >      >> Why not simply
    >      >>
    >      >> System.out.printf(..)
    >      >>
    >      >> ?
    >      
    >      or simply (as I commented [1])
    >           System.out.format
    >      
    >      Mandy
    >      [1]
    >      
https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html
    >      
    >      
    > 
    > 
    


Reply via email to