Why did you not change the exception caught in SubSystem.java:getStringValue to PrivilegedActionException from IOException so it’s consistent with the other get functions?
Bob. > On Dec 6, 2019, at 8:41 PM, Daniil Titov <daniil.x.ti...@oracle.com> wrote: > > Hi David, Mandy, and Bob, > > Thank you for reviewing this fix. > > Please review a new version of the fix [1] that includes the following > changes comparing to the previous version of the webrev ( webrev.04) > 1. The changes in Javadoc made in the webrev.04 comparing to webrev.03 and to > CSR [3] were discarded. > 2. The implementation of methods getFreeMemorySize, getTotalMemorySize, > getFreeSwapSpaceSize and getTotalSwapSpaceSize > was also reverted to webrev.03 version that return host's values if the > metrics are unavailable or cannot be properly read. > I would like to mention that currently the native implementation of > these methods de-facto may return -1 at some circumstances, > but I agree that the changes proposed in the previous version of the > webrev increase such probability. > I filed the follow-up issue [4] as Mandy suggested. > 3. The legacy methods were renamed as David suggested. > > >> src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c >> ! static int initialized=1; >> >> Am I reading this right that the code currently fails to actually do the >> initialization because of this ??? > > Yes, currently the code fails to do the initialization but it was unnoticed > since method > get_cpuload_internal(...) was never called for a specific CPU, the first > parameter "which" > was always -1. > >> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java >> >> System.out.println(String.format(...) >> >> Why not simply >> >> System.out.printf(..) > > As I tried explain it earlier it would make the tests unstable. > System.out.printf(...) just delegates the call to System.out.format(...) that > doesn't emit the string atomically. > Instead it parses the format string into a list of FormatString objects and > then iterates over the list. > As a result, the other traces occasionally got printed between these > iterations and break the pattern the test is expected to find > in the output. > > For example, here is the sample of a such output that has the trace message > printed between " OperatingSystemMXBean.getFreePhysicalMemorySize:" > and "1030762496". > > <skipped> > [0.304s][trace][os,container] Memory Usage is: 42983424 > OperatingSystemMXBean.getFreeMemorySize: 1030758400 > [0.305s][trace][os,container] Path to /memory.usage_in_bytes is > /sys/fs/cgroup/memory/memory.usage_in_bytes > [0.305s][trace][os,container] Memory Usage is: 42979328 > [0.306s][trace][os,container] Path to /memory.usage_in_bytes is > /sys/fs/cgroup/memory/memory.usage_in_bytes > OperatingSystemMXBean.getFreePhysicalMemorySize: > [0.306s][trace][os,container] Memory Usage is: 42975232 > 1030762496 > OperatingSystemMXBean.getTotalSwapSpaceSize: 499122176 > > <skipped> > java.lang.RuntimeException: > 'OperatingSystemMXBean\\.getFreePhysicalMemorySize: [1-9][0-9]+' missing from > stdout/stderr > > at > jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:306) > at > TestMemoryAwareness.testOperatingSystemMXBeanAwareness(TestMemoryAwareness.java:151) > at TestMemoryAwareness.main(TestMemoryAwareness.java:73) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at > com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298) > at java.base/java.lang.Thread.run(Thread.java:832) > > Testing: Mach5 tier1-tier3 and open/test/hotspot/jtreg/containers/docker > tests passed. Tier4-tier6 tests are still running. > > [1] Webrev: http://cr.openjdk.java.net/~dtitov/8226575/webrev.05 > [2] Jira issue: https://bugs.openjdk.java.net/browse/JDK-8226575 > [3] CSR: https://bugs.openjdk.java.net/browse/JDK-8228428 > [4] https://bugs.openjdk.java.net/browse/JDK-8235522 > > Thank you, > Daniil > > On 12/6/19, 1:38 PM, "Mandy Chung" <mandy.ch...@oracle.com> wrote: > > > > On 12/6/19 5:59 AM, Bob Vandette wrote: >>> On Dec 6, 2019, at 2:49 AM, David Holmes<david.hol...@oracle.com> wrote: >>> >>> >>> src/jdk.management/share/classes/com/sun/management/OperatingSystemMXBean.java >>> >>> The changes to allow for a return of -1 are somewhat more extensive than we >>> have previously discussed. These methods previously were (per the spec) >>> guaranteed to return some (assumably) meaningful value but now they are >>> effectively allowed to fail by returning -1. No existing code is expecting >>> to have to handle a return of -1 so I see this as a significant >>> compatibility issue. > > I thought that the error case we are referring to is limit == 0 which > indicates something unexpected goes wrong. So the compatibility concern > should be low. This is very specific to Metrics implementation for > cgroup v1 and let me know if I'm wrong. > >>> Surely there must always be some information available from the operating >>> environment? I see from the impl file: >>> >>> // the host data, value 0 indicates that something went wrong while the >>> metric was read and >>> // in this case we return "information unavailable" code -1. >>> >>> I don't agree with this. If the container metrics are messed up somehow we >>> should either fallback to the host value or else abort with some kind of >>> exception. Returning -1 is not an option here IMO. >> I agree with David on the compatibility concern. I originally thought that >> -1 was already a specified return for all of these methods. >> Since the 0 return failure from the Metrics API should only occur if one of >> the cgroup subsystems is not enabled while others >> are, I’d suggest we keep Daniil’s original logic to fall back to the host >> value since a disabled subsystem is equivalent to no >> limits. >> > > It's important to consider carefully if the monitoring API indicates an > error vs unavailable and an application should continue to run when the > monitoring system fails to get the metrics. > > There are several choices to report "something goes wrong" scenarios > (should unlikely happen???): > 1. fall back to a random positive value (e.g. host value) > 2. return a negative value > 3. throw an exception > > #3 is not an option as the application is not expecting this. For #2, > the application can filter bad values if desirable. > > I'm okay if you want to file a JBS issue to follow up and thoroughly > look at the cases that the metrics are unavailable and the cases when > fails to obtain. > >>> --- >>> >>> test/hotspot/jtreg/containers/docker/CheckOperatingSystemMXBean.java >>> >>> System.out.println(String.format(...) >>> >>> Why not simply >>> >>> System.out.printf(..) >>> >>> ? > > or simply (as I commented [1]) > System.out.format > > Mandy > [1] > > https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-December/029930.html > > > >