Hello, here is a version following the latest proposal of JC . Unfortunately attached as patch, sorry for that - the uploads / pushes currently do not work from here . Best regards, Matthias
> -----Original Message----- > From: David Holmes <[email protected]> > Sent: Mittwoch, 31. Juli 2019 05:04 > To: Jean Christophe Beyler <[email protected]> > Cc: Baesken, Matthias <[email protected]>; hotspot- > [email protected]; serviceability-dev <serviceability- > [email protected]> > Subject: Re: RFR: [XS] 8228658: test GetTotalSafepointTime.java fails on fast > Linux machines with Total safepoint time 0 ms > > On 31/07/2019 9:08 am, Jean Christophe Beyler wrote: > > FWIW, I would have done something like what David was suggesting, just > > slightly tweaked: > > > > public static long executeThreadDumps() { > > long value; > > long initial_value = mbean.getTotalSafepointTime(); > > do { > > Thread.getAllStackTraces(); > > value = mbean.getTotalSafepointTime(); > > } while (value == initial_value); > > return value; > > } > > > > This ensures that the value is a new value as opposed to the current > > value and if something goes wrong, as David said, it will timeout; which > > is ok. > > Works for me. > > > But I come back to not really understanding why we are doing this at > > this point of relaxing (just get a new value of safepoint time). > > Because, if we accept timeouts now as a failure here, then really the > > whole test becomes: > > > > executeThreadDumps(); > > executeThreadDumps(); > > > > Since the first call will return when value > 0 and the second call will > > return when value2 > value (I still wonder why we want to ensure it > > works twice...). > > The test is trying to sanity check that we are actually recording the > time used by safepoints. So first check is that we can get a non-zero > value; second check is we get a greater non-zero value. It's just a > sanity test to try and catch if something gets unexpectedly broken in > the time tracking code. > > > So both failures and even testing for it is kind of redundant, once you > > have a do/while until a change? > > Yes - the problem with the tests that try to check internal VM behaviour > is that we have no specified way to do something, in this case execute > safepoints, that relates to internal VM behaviour, so we have to do > something we know will currently work even if not specified to do so - > e.g. dumping all thread stacks uses a global safepoint. The second > problem is that the timer granularity is so coarse that we then have to > guess how many times we need to do that something before seeing a > change. To make the test robust we can keep doing stuff until we see a > change and so the only way that will fail is if the overall timeout of > the test kicks in. Or we can try and second guess how long it should > take by introducing our own internal timeout - either directly or by > limiting the number of loops in this case. That has its own problems and > in general we have tried to reduce internal test timeouts (by removing > them) and let overall timeouts take charge. > > No ideal solution. And this has already consumed way too much of > everyone's time. > > Cheers, > David > > > Thanks, > > Jc > > > > > > On Tue, Jul 30, 2019 at 2:35 PM David Holmes <[email protected] > > <mailto:[email protected]>> wrote: > > > > On 30/07/2019 10:39 pm, Baesken, Matthias wrote: > > > Hi David, "put that whole code (the while loop) in a helper > > method." was JC's idea, and I like the idea . > > > > Regardless I think the way you are using NUM_THREAD_DUMPS is really > > confusing. As an all-caps static you'd expect it to be a constant. > > > > Thanks, > > David > > > > > Let's see what others think . > > > > > >> > > >> Overall tests like this are not very useful, yet very fragile. > > >> > > > > > > I am also fine with putting the test on the exclude list. > > > > > > Best regards, Matthias > > > > > > > > >> -----Original Message----- > > >> From: David Holmes <[email protected] > > <mailto:[email protected]>> > > >> Sent: Dienstag, 30. Juli 2019 14:12 > > >> To: Baesken, Matthias <[email protected] > > <mailto:[email protected]>>; Jean Christophe > > >> Beyler <[email protected] <mailto:[email protected]>> > > >> Cc: [email protected] > > <mailto:[email protected]>; serviceability-dev > > <serviceability- > > >> [email protected] <mailto:[email protected]>> > > >> Subject: Re: RFR: [XS] 8228658: test GetTotalSafepointTime.java > > fails on fast > > >> Linux machines with Total safepoint time 0 ms > > >> > > >> Hi Matthias, > > >> > > >> On 30/07/2019 9:25 pm, Baesken, Matthias wrote: > > >>> Hello JC / David, here is a second webrev : > > >>> > > >>> http://cr.openjdk.java.net/~mbaesken/webrevs/8228658.1/ > > >>> > > >>> It moves the thread dump execution into a method > > >>> executeThreadDumps(long) , and also adds while loops > > (but with a > > >>> limitation for the number of thread dumps, really don’t > > >>> want to cause timeouts etc.). I removed a check for > > >>> MAX_VALUE_FOR_PASS because we cannot go over > Long.MAX_VALUE . > > >> > > >> I don't think executeThreadDumps is worth factoring out like out. > > >> > > >> The handling of NUM_THREAD_DUMPS is a bit confusing. I'd rather it > > >> remains a constant 100, and then you set a simple loop iteration > > count > > >> limit. Further with the proposed code when you get here: > > >> > > >> 85 NUM_THREAD_DUMPS = NUM_THREAD_DUMPS * 2; > > >> > > >> you don't even know what value you may be starting with. > > >> > > >> But I was thinking of simply: > > >> > > >> long value = 0; > > >> do { > > >> Thread.getAllStackTraces(); > > >> value = mbean.getTotalSafepointTime(); > > >> } while (value == 0); > > >> > > >> We'd only hit a timeout if something is completely broken - > > which is fine. > > >> > > >> Overall tests like this are not very useful, yet very fragile. > > >> > > >> Thanks, > > >> David > > >> > > >>> Hope you like this version better. > > >>> > > >>> Best regards, Matthias > > >>> > > >>> *From:*Jean Christophe Beyler <[email protected] > > <mailto:[email protected]>> > > >>> *Sent:* Dienstag, 30. Juli 2019 05:39 > > >>> *To:* David Holmes <[email protected] > > <mailto:[email protected]>> > > >>> *Cc:* Baesken, Matthias <[email protected] > > <mailto:[email protected]>>; > > >>> [email protected] > > <mailto:[email protected]>; serviceability-dev > > >>> <[email protected] > > <mailto:[email protected]>> > > >>> *Subject:* Re: RFR: [XS] 8228658: test > > GetTotalSafepointTime.java fails > > >>> on fast Linux machines with Total safepoint time 0 ms > > >>> > > >>> Hi Matthias, > > >>> > > >>> I wonder if you should not do what David is suggesting and then > > put that > > >>> whole code (the while loop) in a helper method. Below you have a > > >>> calculation again using value2 (which I wonder what the added > > value of > > >>> it is though) but anyway, that value2 could also be 0 at some > > point, no? > > >>> > > >>> So would it not be best to just refactor the getAllStackTraces and > > >>> calculate safepoint time in a helper method for both value / value2 > > >>> variables? > > >>> > > >>> Thanks, > > >>> > > >>> Jc > > >>> > > >>> On Mon, Jul 29, 2019 at 7:50 PM David Holmes > > <[email protected] <mailto:[email protected]> > > >>> <mailto:[email protected] > > <mailto:[email protected]>>> wrote: > > >>> > > >>> Hi Matthias, > > >>> > > >>> On 29/07/2019 8:20 pm, Baesken, Matthias wrote: > > >>> > Hello , please review this small test fix . > > >>> > > > >>> > The test > > >>> > > >> > test/jdk/sun/management/HotspotRuntimeMBean/GetTotalSafepointTime. > > >> java > > >>> fails sometimes on fast Linux machines with this error > > message : > > >>> > > > >>> > java.lang.RuntimeException: Total safepoint time > > illegal value: 0 > > >>> ms (MIN = 1; MAX = 9223372036854775807) > > >>> > > > >>> > looks like the total safepoint time is too low > > currently on these > > >>> machines, it is < 1 ms. > > >>> > > > >>> > There might be several ways to handle this : > > >>> > > > >>> > * Change the test in a way that it might generate > > nigher > > >>> safepoint times > > >>> > * Allow safepoint time == 0 ms > > >>> > * Offer an additional interface that gives > > safepoint times > > >>> with finer granularity ( currently the HS has safepoint > > time values > > >>> in ns , see jdk/src/hotspot/share/runtime/safepoint.cpp > > >>> SafepointTracing::end > > >>> > > > >>> > But it is converted on ms in this code > > >>> > > > >>> > 114jlong RuntimeService::safepoint_time_ms() { > > >>> > 115 return UsePerfData ? > > >>> > 116 > > >>> > > Management::ticks_to_ms(_safepoint_time_ticks->get_value()) : -1; > > >>> > 117} > > >>> > > > >>> > 064jlong Management::ticks_to_ms(jlong ticks) { > > >>> > 2065 assert(os::elapsed_frequency() > 0, "Must be > > non-zero"); > > >>> > 2066 return (jlong)(((double)ticks / > > >>> (double)os::elapsed_frequency()) > > >>> > 2067 * (double)1000.0); > > >>> > 2068} > > >>> > > > >>> > > > >>> > > > >>> > Currently I go for the first attempt (and try to generate > > >>> higher safepoint times in my patch) . > > >>> > > >>> Yes that's probably best. Coarse-grained timing on very > > fast machines > > >>> was bound to eventually lead to problems. > > >>> > > >>> But perhaps a more future-proof approach is to just add a > > do-while loop > > >>> around the stack dumps and only exit when we have a non-zero > > >> safepoint > > >>> time? > > >>> > > >>> Thanks, > > >>> David > > >>> ----- > > >>> > > >>> > Bug/webrev : > > >>> > > > >>> > https://bugs.openjdk.java.net/browse/JDK-8228658 > > >>> > > > >>> > http://cr.openjdk.java.net/~mbaesken/webrevs/8228658.0/ > > >>> > > > >>> > > > >>> > Thanks, Matthias > > >>> > > > >>> > > >>> > > >>> -- > > >>> > > >>> Thanks, > > >>> > > >>> Jc > > >>> > > > > > > > > -- > > > > Thanks, > > Jc
8228658_total_safepoint_time2.changeset
Description: 8228658_total_safepoint_time2.changeset
