Daniel,

We could not be sure that the test failed of timeout, that's why I tried to add more checks.

The check for Step 1: all thread traces were printed out only if deadlock was found, and the test failed immediately.
The check for Step 2:
1) all thread traces were printed out only if the tested thread was blocked, but the test did not fail because we were not sure if deadlock happened, but the info might be helpful;
   2) otherwise only the trace of the tested thread was printed out.

In case that the test gets interrupted again by the test harness, hope we can have some useful info from these 2 checks.

It must not be so heavy but still could impact the test, your suggestion to use test.timeout.factor is a good idea, I added the code to calculate the checking time based on it:
   http://cr.openjdk.java.net/~sjiang/JDK-8050115/03/

Thanks,
Shanliang

Daniel Fuchs wrote:
Hi Shanliang,

On 9/17/14 2:19 PM, shanliang wrote:
Daniel,

The test does 2 steps of verifications, the new check is useful for the
first step, and the trace in the bug showed that the test failed on the
first step.

Yes the check might not work for the second step, I added the new code
for the second step to check the tested thread state and hope to have
useful info if the test failed on the second step.

Here is the new version:
    http://cr.openjdk.java.net/~sjiang/JDK-8050115/02/

Thanks,
Shanliang

If I understand the issue correctly - the test fails in timeout
mostly on very slow machines/configurations (fastdebug with some
combinations of options).

I worry that printing a thread dump every seconds (1000ms) is going
to make things worse: the test will spend its time printing thread
dumps instead of doing what it is supposed to do - and will have
even less CPU cycles to execute its 'real' code.

I would have advised printing the thread dumps only at the end,
when it is detected that there might be a deadlock - except that
now we can't do that since the timeout is managed completely
by the harness (so we don't get the upper hand at the end in
case of timeout).

I think depending on the harness to set the appropriate timeout
rather than depending on an arbitrary timeout set in the test itself
is the right thing to do. It's been a pattern in many tests that
failed in timeout intermittently on some slow machines/configuration.

In any case - 1s seems really too frequent.
I suppose you could inspect the system properties set by the harness
(timeout + timeout factor) to devise an acceptable frequency for
your checks - if you really want to print this info.

From the log I see that the timeout factor passed to the harness
for the slow configuration that failed is
-Dtest.timeout.factor=8.0
There's no explicit timeout given - and jtreg -onlineHelp reveals
that in this case the default timeout is two minutes.

This means that the harness has allocated 2*8=16mins for the test to
execute.
I don't think you want to take the risk of printing a thread dump
every seconds during 16 minutes ;-)

Of course I'm over simplifying here. Before your changes - the test
was deciding after 46.893 seconds that there must be a deadlock.
47s is obviously way too short for a possibly slow machine running
the test in fastdebug mode.

Something like the following might be more reasonable:

// default timeout factor is 1.0
double factor =
    Double.parseDouble(
       System.getProperty("test.timeout.factor", "1.0"));
// default timeout is 2mins = 120s.
double timeout = Double.parseDouble(
       System.getProperty("test.timeout", "120"));

// total time is timeout * timeout factor * 1000 ms
long total = (long) factor * timeout * 1000;

// Don't print thread dumps too often.
// every 5s for a total timeout of 120s seems reasonable.
// 120s/5s = 24; we will lengthen the delay if the total
// timeout is greater than 120s, so we're taking the max between
// 5s and total/24
long delayBetweenThreadDumps = Math.max(5000, total/24);

Of course 5s and total/24 are just arbitrary...
But 24 full thread dumps in a log for a single test is enough data
to analyze I think ;-)

best regards,

-- daniel



Daniel Fuchs wrote:
On 9/17/14 10:55 AM, shanliang wrote:
David Holmes wrote:
On 17/09/2014 7:01 AM, shanliang wrote:
David Holmes wrote:
Hi Shanliang,

On 16/09/2014 7:12 PM, shanliang wrote:
Hi,

Please review the following fix:

I don't see any functional change. You seem to have replaced a
built-in timeout with the externally applied test harness timeout.
Yes no functional change here, we thought that the test needed more
time
to wait a change if a testing VM or machine was really slow, the test
harness timeout was the maximum time we could give the test.

Do we have confidence that the harness timeout is sufficient to handle
the intermittent failures?
Really a good question :)

Here is new version:
    http://cr.openjdk.java.net/~sjiang/JDK-8050115/01/

I added a deadlocked check in every 1 second, hope to get more info in
case of failure.

The following comment seems to imply that this check is not
very useful:

 112             // This won't show up as a deadlock in CTRL-\ or in
 113             // ThreadMXBean.findDeadlockedThreads(), because they
don't
 114             // see that thread A is waiting for thread B
(B.join()), and
 115             // thread B is waiting for a lock held by thread A

best regards,

-- daniel


I changed also the sleep time to 100ms, 10ms seems too short as Daniel
pointed out.

Thanks,
Shanliang

Thanks,
David



Style nit: add a space after 'while' -> while (cond) {
OK, I will do it before pushing.

Thanks,
Shanliang

David
-----

bug: https://bugs.openjdk.java.net/browse/JDK-8050115
webrev: http://cr.openjdk.java.net/~sjiang/JDK-8050115/00/

Thanks,
Shanliang






Reply via email to