On 12/11/14 7:09 AM, olivier.lagn...@oracle.com wrote:
On 11/12/2014 15:43, Dmitry Samersoff wrote:
You can set SO_LINGER to zero; in that case the socket will be closed
immediately without waiting in TIME_WAIT.
SO_LINGER did not help either in my case (see my previous mail to Jaroslav).
That ended up in using another hard-coded (supposedly free) port.
Note that this was before RMI tests used randomly allocated ports.

But there is no reliable way to predict whether you can take this port
or not after you close it.
This is what I observed in my case.

So the only valid solution is to try to connect to a random port and if
this attempt fails try another random port. Everything else will cause
more or less frequent intermittent failures.
IIRC, this is what is currently done in RMI tests.
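
For illustration, here is a minimal sketch of the retry approach described above, reading "connect" as "bind" for a server socket. The class name, port range and retry count are assumptions made for this example; none of this is taken from the RMI test library.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not part of any JDK test library.
class RetryingBinder {
    // Assumed values for illustration only.
    static final int MIN_PORT = 49152;
    static final int MAX_PORT = 65535;
    static final int MAX_ATTEMPTS = 20;

    // Picks a random port and tries to bind it; on BindException, tries another one.
    // The caller receives the bound socket itself, so the port is never released in between.
    static ServerSocket bindToRandomPort() throws IOException {
        BindException last = null;
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            int port = ThreadLocalRandom.current().nextInt(MIN_PORT, MAX_PORT + 1);
            try {
                return new ServerSocket(port);
            } catch (BindException e) {
                last = e; // port is in use; fall through and try a different one
            }
        }
        throw new IOException("no free port after " + MAX_ATTEMPTS + " attempts", last);
    }
}
```

The point of the sketch is that the retry happens at the bind itself, rather than around a separate "find a free port" step.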

The RMI tests are still suffering from this problem, unfortunately.

The RMI test library gets a "random" port with "new ServerSocket(0)", gets the port number, closes the socket, then returns the port to the caller. The caller then assumes that it can use that port as it wishes. That's when the BindException can occur. There are about 10 RMI test bugs in the database that all seem to have this as their root cause.
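
To make the race concrete, the pattern looks roughly like the sketch below. This is not the actual test library code; the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical stand-in for the open/close/return pattern described above.
class RacyPortPicker {
    static int getUnusedPort() throws IOException {
        try (ServerSocket probe = new ServerSocket(0)) {
            return probe.getLocalPort(); // port chosen by the kernel
        } // probe is closed here; nothing reserves the port any longer
    }

    public static void main(String[] args) throws IOException {
        int port = getUnusedPort();
        // Race window: the port is unreserved between the close above and the bind below.
        try (ServerSocket server = new ServerSocket(port)) {
            System.out.println("Rebound port " + server.getLocalPort());
        } catch (BindException e) {
            System.out.println("Port " + port + " could not be rebound: " + e.getMessage());
        }
    }
}
```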

There is some retry logic in RMI's test library, but that's to avoid the so-called "reserved ports" that specific RMI tests use, or if "new ServerSocket(0)" fails. It doesn't have anything to do with the BindException that occurs when the caller attempts to reuse the port with another socket.

My observation is also that setting SO_REUSEADDR has no effect. I haven't tried SO_LINGER. My hunch is that it won't have any effect, since the sockets in question aren't actually going into TIME_WAIT state. But I suppose it's worth a try.

I don't have any solution for this; we're still discussing the issue. I think the best approach would be to refactor the code so that the eventual user of the socket opens it up on an ephemeral port in the first place. That avoids the open/close/reopen business. Unfortunately that doesn't help the case where you want to tell another JVM to run a service on a specific port. We don't have a solution for that case yet.
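
As a rough sketch of that refactoring (the class below is hypothetical and only illustrates the idea), the service binds port 0 itself and reports the port it was given, so there is no close/reopen step:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical service wrapper; the eventual user of the socket owns it from the start.
class EphemeralService implements AutoCloseable {
    private final ServerSocket socket;

    EphemeralService() throws IOException {
        socket = new ServerSocket(0); // kernel assigns an ephemeral port
    }

    // The port stays bound for as long as this service is open.
    int port() {
        return socket.getLocalPort();
    }

    @Override
    public void close() throws IOException {
        socket.close();
    }
}
```

A test would then ask the service for its port after starting it, instead of choosing a port beforehand. As noted, this does not cover the case where another JVM has to be told which port to use.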

The second-best approach (not really a solution) is to open/close a ServerSocket to get the port, sleep for a little bit, then return the port number to the caller. This might give the kernel a chance to clean up the socket after the close. Of course, this still has a race condition, but it might reduce the incidence of problems to an acceptable level.
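
Something along these lines, as a sketch only. The names are hypothetical, and the delay is a parameter because I don't know what value, if any, would be long enough.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical variant of the open/close pattern with a pause before returning.
class DelayedPortPicker {
    static int getUnusedPort(long settleMillis) throws IOException, InterruptedException {
        int port;
        try (ServerSocket probe = new ServerSocket(0)) {
            port = probe.getLocalPort();
        }
        // Give the kernel some time after the close; this narrows the race but does not remove it.
        Thread.sleep(settleMillis);
        return port;
    }
}
```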

I'll let you know if we come up with anything better.

s'marks
