RFR: runtime/8176717/TestInheritFD.java fails with java.lang.RuntimeException: could not match: VM RESULT => RETAINS FD
david.holmes at oracle.com
Wed Jun 6 05:40:06 UTC 2018
First, please include the bug id in the RFR email subject line - thanks.
On 31/05/2018 11:56 PM, Leo Korinth wrote:
> I am uppdating TestInheritFD.java to fix intermittent test
> failures. The reason is that
> /sys/fs/cgroup/memory/memory.limit_in_bytes sometimes is open.
> The new test does not count the number of open file descriptors
> but instead looks to see if the log file is still open; this is
> less fragile. This relies on a /proc file system, but that seems
> to be available on all non Windows ports except Mac. Mac testing is
> disabled. The Windows solution does not need to change.
Let me back up a step to understand the original problem and the test.
Problem: if a VM was started with a log file, the fd for the log file
would be inherited by an exec'd process (ie another VM).
Detection: On Windows the original log file could not be deleted due to
the child process's open fd.
Fix: use close-on-exec for the log file
- On Windows try to rename the log file - should succeed.
- On non-windows get the count of open fds in the parent process and
child process and check child-count < parent-count as child should not
have the log file.
The flaw: On Linux, due to the container detection logic the child can
still have an open /sys/fs/cgroup/memory/memory.limit_in_bytes and so
the count is not smaller than the parent.
Question: why does the parent not also have the open
The basic problem for the test is: how it can determine that one process
has an open fd for the same file as another process? The Windows
situation answers that accurately. Counting fd's doesn't. But the other
platforms don't have the rename issue that windows does. And there's no
file API to ask this kind of question as far as I can see.
So the proposed fix is to use the /proc filesystem to look in the open
fd's for the file. This seemingly works on all non-windows platforms
except OS X. Do we ever see the intermittent failures on OS X? If not
can it keep using the fd counting logic?
Thanks for bearing with me as I walked through that :)
> This test bug does cause intermittent test failures in tier1, so I would
> be grateful for reviews and also someone to sponsor the final hg export.
So generally the approach seems good. Took quite an effort to translate
all the stream/lambda processing though :)
+ return new File("Error: could not read symbolic link (last
link can not be read as it points to the /proc/self/fd directory that
has been closed): " + e.toString());
That is a very long line and it seems quite bizarre to "return" an error
as a File object. Took me a while to realize this is not really an error
but the way we detect the last entry which is unreadable. I suppose this
is one way to get all the information printed out.
+ System.out.println("Open file descriptors:\n" +
+ .map(f -> "###" + f.getAbsolutePath() + " -> " +
I found this really hard to parse and understand. It would help a little
if the " -> " string was written as " maps to " so it didn't look like
the -> from a lambda expression (or even just =>). It would also help to
unwrap one level and assign to a String which is then passed to println.
+ return stream(dir.listFiles())
+ .filter(f -> f.getName().endsWith(LOG_SUFFIX))
Three minor nits:
133 + " ON as files in /proc and /sys is opened by the JVM");
Typo: is opened -> are opened
Spaces around ? and : operators
> GNU/Linux, Solaris, Windows and Mac (Mac disabled).
More information about the hotspot-runtime-dev