[PATCH] JDK-8228580 DnsClient TCP socket timeout
pavel.rappo at oracle.com
Tue Sep 3 14:33:15 UTC 2019
There is no reason to wait until you can post your own RFR, this needs to be addressed separately and I've taken some steps to fix it. Here's your latest patch (please correct me if I'm wrong) as a webrev:
Let's discuss it.
As I said before I'm not yet concerned with the issues that are already present in the UDP case (integer overflow, use of currentTimeMillis, negative timeouts, etc.) so I'd suggest we focus on the test first.
1. I've run the test a couple of hundred times and observed some infrequent failures (the test timed out). Here's an excerpt from the associated thread dump:
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(java.base at 14-internal/Native Method)
- waiting on <0x00000000e086d3a8> (a java.lang.Thread)
at java.lang.Thread.join(java.base at 14-internal/Thread.java:1303)
- locked <0x00000000e086d3a8> (a java.lang.Thread)
at java.lang.Thread.join(java.base at 14-internal/Thread.java:1371)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base at 14-internal/Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base at 14-internal/NativeMethodAccessorImpl.java:62)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base at 14-internal/DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(java.base at 14-internal/Method.java:564)
at java.lang.Thread.run(java.base at 14-internal/Thread.java:830)
I'm not sure yet how that thread ended up being blocked in this place for so long. One possibility would be if `serverSocket` did not react on `close()` (L106). Which is highly unlikely. What is more likely though is that `serverSocket` was found (L105) to be null and thus was never closed. That makes sense given that this field is accessed from more than one thread without sufficient synchronization. I don't know whether that was indeed the case, but we'd better address that issue to rule out that possibility in the future.
2. The way ServerSocket is created on L91 is troublesome for two reasons. The first one is that we don't know which interface this server socket will be bound to. I'd better use
serverSocket = new ServerSocket(port, 0, InetAddress.getLoopbackAddress());
which is what UDP DNSServer does, albeit implicitly.
Secondly, even though the UDP server got its port from the ephemeral port range, we can't really guarantee that this same port will be simultaneously available for TCP. There are a couple of solutions for this issue. One would be to make the test run inapplicable by throwing jtreg.SkippedException.
3. L34, typo: UDH DNS server; L57, typo: the reponse is
> On 2 Sep 2019, at 18:58, Milan Mimica <milan.mimica at gmail.com> wrote:
> On Mon, 2 Sep 2019 at 17:45, Pavel Rappo <pavel.rappo at oracle.com> wrote:
>>> On 21 Aug 2019, at 09:27, Milan Mimica <milan.mimica at gmail.com> wrote:
>>> I don't know why main/timeout=20 does not work for me.
>> What do you mean exactly by "does not work for me"? Are there any associated traces/logs from jtreg? Can you post them?
> I would expect the test to fail after 20 seconds (without the fix). It
> seems to fail after 80 seconds or so, I just didn't have the patience
> previously to wait for that long. Here is the log  if you want to
> take I look. However it's probably not a big deal, IMO.
>  https://pastebin.com/uUbeXfUf
> Milan Mimica
More information about the core-libs-dev