Hi,

Please find below a fix for:

8249786: java/net/httpclient/websocket/PendingPingTextClose.java
         fails very infrequently
https://bugs.openjdk.java.net/browse/JDK-8249786

webrev:
http://cr.openjdk.java.net/~dfuchs/webrev_8249786/webrev.00/

This test has been observed failing on windows in the JDK 15 CI.

The most troublesome issue is that the test was producing so
much output that the actual reason for the failure was lost
in the output overflow.

After instrumenting the test to limit the output and
adding a few higher level traces, I was able to reproduce
once (out of 250 runs) and see the actual stack trace.
The test fails just after calling websocket.abort() when it tries
to verify that the cfPing CompletableFuture completed
exceptionally, and found that it actually successfully completed.

The logic of the test is to try to fill up the local send buffer
by sending ping messages, so that an attempt to write to the
socket will block.
For that it creates a server that will accept a websocket
connection, but will not read anything from the socket input.

The client side sends ping packets until the socket buffer
is full - which is detected by setting up a 10s timeout and
observing that the ping data could not be written during
this time. The assumption of the test is that a write call
that takes more than 10s is indicative that the buffers are
full, and will never succeed.

The problem occurs when the write succeeds after ~10s either
because the kernel was busy doing some other things, or because
the kernel suddenly decided to resize (increase) the buffers,
which causes the write call to unblock and succeed after 10s.

The test already had some provision and a workaround for that
issue - via a repeatable( ) operation - but the workaround
was only enabled for macOS where such behavior had first been
observed.

The fix extends that workaround for Windows - since the later
failure shows that something similar is happening there.
The fix also moves the websocket.abort() and following check
inside the repeatable loop for better reliability.

Since pushing test fixes during rampdown 2 is permitted,
and since the failure was observed in the JDK 15 CI, I'm
planning to push this test fix to the jdk15 repo,
unless I hear any objection.

best regards,

-- daniel

Reply via email to