On 12/7/18 7:58 AM, Gary Adams wrote:
On 12/6/18, 7:52 PM, David Holmes wrote:
Hi Gary,

On 6/12/2018 11:27 pm, Gary Adams wrote:
On a local linux-x64-debug build this test consistently runs in less than 40 seconds. On the mach5 testing machines there was a large fluctuation in the time to complete this test.

Since the test runs jps with different combinations of arguments, the total test time is dependent on the number of processes running java programs.
Since the mach5 test infrastructure runs java programs I have seen a 3X
in the amount of output the test produces compared to local test
runs.

I've run the test several hundred times through mach5 on the slower platforms and then examined the test logs using a 3X setting from the default 120 second jtreg timeout. The slowest reported elapse time from the test logs showed
280 seconds to complete.

To improve the reliability of the test to complete, I'd like to increase the
timeout to 360 seconds.

   Issue: https://bugs.openjdk.java.net/browse/JDK-8210106

Proposed fix:

diff --git a/test/jdk/sun/tools/jps/TestJps.java b/test/jdk/sun/tools/jps/TestJps.java
--- a/test/jdk/sun/tools/jps/TestJps.java
+++ b/test/jdk/sun/tools/jps/TestJps.java
@@ -27,7 +27,7 @@
   * @modules jdk.jartool/sun.tools.jar
   * @build LingeredAppForJps
   * @build LingeredApp
- * @run main/othervm TestJps
+ * @run main/othervm/timeout=360 TestJps
   */

Doesn't that then get scaled by the timeout factor resulting in a much longer timeout than the 360 seconds you intended?

For other timeout adjustments the needed time has been divided by the timeout factor to get the actual intended timeout.

This bug was filed fairly recently in Aug 2018.
At that time the timeout and timeout factor were not sufficient
to avoid the test failing.

The mach5 timeout factors were adjusted recently, so this test may
no longer be an issue.

If that is true, then we could simply close this bug as "cannot reproduce".
An argument could be made that the change in timeout factor may be
responsible for fixing a lot more of the intermittent bugs and that they
should be closed in a similar manner.

Historically, we could say this particular bug should have had timeouts
reassessed when the infrastructure switched from jprt to mach5 testing
where there were more visible Java processes running.

Using a higher explicit timeout will not make the test run any longer
than it needs. It will simply allow the test to not be terminated sooner
in a hung test scenario.

What is your preference for this particular issue:
   - increase the explicit timeout
   - close as cannot reproduce attributed to the timeout factor adjustments

What would you recommend going forward for other similar issues:
   - determine a new explicit timeout
   - close if no timeout failures have been observed since the timeout factor
      was raised

General guidance that I've been giving folks is that you bump the
default timeout when you see timeouts with 'release' bits with
a timeout factor of '1'. The default timeout factor in Mach5 is 4
so we run both 'release' and 'fastdebug' bits with that timeout
factor. That means our total timeout value for 'release' bits is
probably a little long and our total timeout value for 'fastdebug'
bits is probably just right for a concurrency factor of 12.

On my personal servers I use the following timeout factors:

        release)
            TIMEOUT_FACTOR="4"
            ;;
        fastdebug)
            TIMEOUT_FACTOR="6"
            ;;
        slowdebug)
            TIMEOUT_FACTOR="12"
            ;;

However, I run with a concurrency factor of 16. I rarely see
timeout failures.

All that is well and good, but a 'jps' test is a different
beast since it is affected by factors outside of most tests.
Since the number of java processes running on the system can
affect the test, I'm okay with using a default timeout of '360'
for this test as a guard against increases in load or...

Gary saw a longest time value of 280 seconds in his testing with
fastdebug bits. The default timeout value is 120 seconds with a
default timeout factor of 4 in Mach5 gives a total timeout of
480 seconds (8 minutes). With Gary's new default timeout of
360 seconds, we'll have a total timeout value of 1440 seconds
(24 minutes).

Before the recent changes to Mach5, the default timeout factor
was 10 so we had a total timeout value of 1200 seconds (20
minutes). Gary's change only bumps the total timeout value
by 4 minutes from what we had back in August... when this bug
was filed... unfortunately, the log for that sighting is gone
so all we know is that the test timed out after 20 minutes

> Timeout refired 1200 times

We know that JTREG timeout handling kicks in at timeout expiration,
but without the log we don't know how much longer than 20 minutes
the test ran before JTREG killed it. 24 minutes might do it, but...

Dan





Cheers,
David




Reply via email to