RFR: 8006534 CLONE - TestLibrary.getUnusedRandomPort() fails intermittently-doesn't retry enough times
darryl.mocek at oracle.com
Fri Jan 18 23:11:29 UTC 2013
I think this change looks fine. I think it works for our purposes
for now and is a quick and painless change.
The problem as I understand it is that when getUnusedRandomPort is
called and the port returned is FIXED_PORT_MIN (which is rejected), then
the code tries again and returns FIXED_PORT_MIN + 1, then +2, etc.
Since the reserved range holds at least numTries ports (that is,
FIXED_PORT_MAX - FIXED_PORT_MIN + 1 >= numTries), every attempt is
rejected and the call fails. The fix increases the number of tries.
This means that if the reserved range ever grows to the new numTries
or beyond, we'll need to increase the number of tries again. This
probably won't happen for a
while as we're trying to decrease the use of fixed ports, not increase
them. However, there are a bunch of tests which currently use fixed
ports which aren't using TestLibrary, but should be because we can
probably convert most or all of them to use a random port. Those we
cannot convert should have their fixed ports reserved in TestLibrary.
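
To make the failure mode concrete, here is a minimal sketch of the retry
loop as I understand it. It is not the TestLibrary code: the class name,
the port values, and the IntSupplier standing in for the underlying
socket code are all assumptions made for illustration.

```java
import java.util.function.IntSupplier;

public class RetrySketch {
    // Hypothetical reserved range of 10 ports; the real constants live in TestLibrary.
    static final int FIXED_PORT_MIN = 64001;
    static final int FIXED_PORT_MAX = 64010;

    static boolean isReserved(int port) {
        return port >= FIXED_PORT_MIN && port <= FIXED_PORT_MAX;
    }

    /** Returns the first non-reserved port from the source, or -1 after numTries attempts. */
    static int getUnusedPort(IntSupplier source, int numTries) {
        for (int i = 0; i < numTries; i++) {
            int port = source.getAsInt();
            if (!isReserved(port)) {
                return port;
            }
        }
        return -1;
    }

    /** Simulates socket code that hands out consecutive ports starting at 'start'. */
    static IntSupplier sequentialFrom(int start) {
        int[] next = { start };
        return () -> next[0]++;
    }

    public static void main(String[] args) {
        // 10 tries walk through all 10 reserved ports and give up.
        System.out.println(getUnusedPort(sequentialFrom(FIXED_PORT_MIN), 10)); // prints -1
        // 20 tries get past the end of the reserved range.
        System.out.println(getUnusedPort(sequentialFrom(FIXED_PORT_MIN), 20)); // prints 64011
    }
}
```

With a 10-port range, any numTries greater than 10 is enough in this
simulation; doubling it just leaves some headroom.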
After some thought, I think we can improve "getting an unused random
port" by providing a custom SocketImplFactory which does not use
ServerSocket and which only looks for ports which are not in use and
which are within a range we want (read: outside the FIXED_PORT_* range
and other ranges like < 1024). Currently we open the ServerSocket, get
the port number, close the ServerSocket, and return the port number.
BTW, it is unlikely but possible that this port will be grabbed by some
other process between the time the ServerSocket is closed and the time
the socket is opened by the caller of getUnusedRandomPort, but this may
happen with a custom SocketImplFactory anyway. I think using a
SocketImplFactory will improve the performance and reliability of the
tests.
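
As a rough sketch of the selection policy such a factory might apply
(this is not a SocketImplFactory, only the range filter it could use),
something like the following would do. The class name, the port values,
and the inUse predicate are hypothetical; how "in use" would actually be
determined is left open here.

```java
import java.util.Random;
import java.util.function.IntPredicate;

public class PortPolicySketch {
    // Hypothetical reserved range; the real constants live in TestLibrary.
    static final int FIXED_PORT_MIN = 64001;
    static final int FIXED_PORT_MAX = 64010;
    // Skip the privileged ports below 1024.
    static final int MIN_PORT = 1024;
    static final int MAX_PORT = 65535;

    /** True if the port is outside the privileged and reserved ranges. */
    static boolean isAllowed(int port) {
        return port >= MIN_PORT && port <= MAX_PORT
                && (port < FIXED_PORT_MIN || port > FIXED_PORT_MAX);
    }

    /**
     * Picks a random allowed port that the (assumed) inUse check does not
     * report as taken, or returns -1 after maxTries candidates.
     */
    static int pickPort(Random rnd, IntPredicate inUse, int maxTries) {
        for (int i = 0; i < maxTries; i++) {
            int candidate = MIN_PORT + rnd.nextInt(MAX_PORT - MIN_PORT + 1);
            if (isAllowed(candidate) && !inUse.test(candidate)) {
                return candidate;
            }
        }
        return -1;
    }
}
```

A caller would use something like pickPort(new Random(), p -> false, 100)
and would still have to cope with the port being taken before it binds,
as noted above.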
On 01/17/2013 12:18 PM, Jim Gish wrote:
> Please review
> TestLibrary.getUnusedPort() attempts to retry getting an ephemeral
> port 10 times, which is the current number of ports in the reserved
> port range. Debugging code added a few weeks back has revealed that
> intermittent failures in tests using this method are caused when the
> underlying socket code first returns the first number in the reserved
> port range. Subsequent retries result in incrementing the port number
> by one until reaching the max reserved port number, which just happens
> to correspond to the 10th attempt, and the code then quits trying.
> The proposed fix (one of many possible alternatives, of course) simply
> retries for twice the number of ports in the range.