On Wed, 23 Feb 2022 13:59:19 GMT, Kevin Walls <kev...@openjdk.org> wrote:
> Test fails occasionally due to a port clash.
> Presumably the port that was returned by Utils.getFreePort(), is no longer
> free.
> The test creates a ProcessBuilder with the parameters for JMX, including port
> number, and uses that to create a new Process.
> It should retry with a new port if we fail due to a port in use, for some
> limited number of attempts.
>
> main already has some retry logic, but not working:
> it checks for an InvocationTargetException to contain a BindException, but it
> simply gets a BindException, thrown by TestAppRun.start().
> TestAppRun.start() runs the new process and scans for errors, but on failure
> its predicate has only seen the first line of a failure, so a BindException
> is never recognised and thrown.
> Also main does not limit the retries, and handling the port retry in main()
> is duplicated, for each run of the test method.
>
> So...
>
> Make the error-scanning predicate in TestAppRun recognise a "port in use"
> message and throw a BindException. This is a notification to the caller that
> it failed, it's not the actual BindException as that was thrown in a
> different process.
> Make the testDefaultAgent method (the main part of the test) handle retrying
> with a new port, a limited number of times.

This is a reasonable approach to take and should ameliorate the BindException
failure condition. But because it uses the same port allocation strategy that
is currently in use, it will still be susceptible to intermittent
BindException failures. This is especially true because, as with the current
failures, there is a lot of concurrent network test execution, plus network
activity from the MACH5 test framework itself, e.g. logstash etc.

An alternative is to use the fixed port 1098, which was previously the default
for the RMI Activation daemon. That port is now available, as Activation has
been removed from the Java platform in JDK 17.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7589
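
For reference, the bounded retry discussed above could look roughly like the
sketch below. This is not the code in the PR: PortRetrySketch, PortTask,
findFreePort and runWithPortRetry are hypothetical names used only for
illustration, and findFreePort is a stand-in for Utils.getFreePort(). It also
shows why the race remains: the port is released before the task binds it.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

public class PortRetrySketch {

    /** Hypothetical stand-in for the test body that needs a port. */
    @FunctionalInterface
    interface PortTask {
        void run(int port) throws Exception;
    }

    /**
     * Asks the OS for an ephemeral port and releases it again.
     * Another process may grab the port before the caller binds it.
     */
    static int findFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    /**
     * Runs the task with a freshly chosen port, retrying only when the task
     * reports a BindException, and giving up after maxAttempts attempts.
     * Returns the port on which the task succeeded.
     */
    static int runWithPortRetry(PortTask task, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        BindException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int port = findFreePort();
            try {
                task.run(port);
                return port;
            } catch (BindException e) {
                // Port was taken between lookup and use: try again with a new one.
                last = e;
                System.err.println("Attempt " + attempt + " failed on port "
                        + port + ": " + e.getMessage());
            }
        }
        throw last; // every attempt hit a port clash
    }

    public static void main(String[] args) throws Exception {
        // Demo task: bind the chosen port briefly, then release it.
        int port = runWithPortRetry(p -> {
            try (ServerSocket s = new ServerSocket(p)) {
                // bound successfully
            }
        }, 3);
        System.out.println("Succeeded on port " + port);
    }
}
```

Any other exception from the task propagates immediately, so only port clashes
are retried, and the attempt limit keeps a persistently busy machine from
looping forever.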