Runtime.exec : vfork() concerns and a fix proposal
thomas.stuefe at gmail.com
Wed Sep 12 07:04:04 UTC 2018
On Tue, Sep 11, 2018 at 8:29 PM, David Lloyd <david.lloyd at redhat.com> wrote:
> I think this is a cool idea.
Thanks. I did not come up with it, though; I think the
technique was already known.
> Do you have any performance numbers?
Small program, just spawning /bin/true 1000 times, measured on
my T450s running Ubuntu 16.04:
Number of open files:   <none>    1000    100000
openjdk8:               305ms     1.5s    115s
sapjvm8:                721ms     2.3s    142s
factor:                 2.4       1.53    1.23
So, it starts off with a factor of 2.4, but the penalty diminishes with
the number of open files. This comparison is imprecise, however, since we
compare different JVMs with completely different Runtime.exec()
implementations. We do more checks in our JVM, which may mean more
syscalls per fork().
> On Tue, Sep 11, 2018 at 12:52 PM Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
>> Hi all,
>> I wanted to gauge opinions on the following issue:
>> Runtime.exec, on Linux, uses vfork(2) by default. It gives us better
>> performance compared with fork() and robustness in constrained memory
>> situations.
>> But as we know vfork() can be dangerous if used incorrectly. In the
>> child process before exec'ing, we live in the memory of the parent
>> process. If we are not very careful we can influence or crash the
>> parent process.
>> According to POSIX pretty much the only thing the child process is
>> allowed to do after vfork(2) is to exec(3) immediately; if that fails,
>> you must call _exit(2).
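As a point of reference, the strictly conforming pattern described above can be sketched in C as follows. This is a minimal illustration, not the OpenJDK code; the function name spawn_minimal and the exit code 127 for a failed exec are assumptions chosen for the example.

```c
#define _DEFAULT_SOURCE            /* expose vfork() under strict -std modes */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Spawn argv[0] with vfork() and wait for it.
 * Returns the child's exit status, or -1 on error. */
int spawn_minimal(char *const argv[]) {
    pid_t pid = vfork();
    if (pid < 0) {
        return -1;                 /* vfork itself failed */
    }
    if (pid == 0) {
        /* Child: still running in the parent's address space.
         * Do nothing here except exec, and _exit if that fails. */
        execv(argv[0], argv);
        _exit(127);                /* never exit() or return from here */
    }
    /* Parent: resumes once the child has exec'ed or exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called with an argv of { "/bin/true", NULL } this should return 0; with a path that does not exist, the child falls through to _exit(127).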
>> However, in the openjdk we do a number of things beyond that:
>> - stdin, stdout and stderr pipe handling
>> - closing all file descriptors
>> - we change the working directory
>> - we may actually modify errno manually
>> - in case exec fails, we communicate the error back to the parent using a pipe.
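The last item in the list, reporting an exec failure back over a pipe, is commonly done with a close-on-exec pipe: a successful exec closes the write end, so the parent reads EOF, while a failed exec leaves it open for the child to write its errno. The sketch below illustrates that idea only. It deliberately uses fork() rather than vfork() to keep the example safe, and the name spawn_report_errno is hypothetical; it is not the JDK's implementation.

```c
#define _GNU_SOURCE                /* for pipe2() */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] and wait for it. Returns 0 if exec succeeded, the child's
 * errno value if exec failed, and -1 on setup errors. */
int spawn_report_errno(char *const argv[]) {
    int fds[2];
    if (pipe2(fds, O_CLOEXEC) < 0) {
        return -1;
    }
    pid_t pid = fork();            /* fork(), not vfork(), in this sketch */
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        execv(argv[0], argv);
        int err = errno;           /* exec failed: tell the parent why */
        if (write(fds[1], &err, sizeof err) < 0) {
            /* nothing more we can do in the child */
        }
        _exit(127);
    }
    close(fds[1]);
    int err = 0;
    ssize_t n;
    do {
        n = read(fds[0], &err, sizeof err);
    } while (n < 0 && errno == EINTR);
    close(fds[0]);
    int status;
    while (waitpid(pid, &status, 0) < 0 && errno == EINTR) {
        /* retry if interrupted by a signal */
    }
    if (n == 0) {
        return 0;                  /* EOF: exec closed the write end */
    }
    return n == (ssize_t)sizeof err ? err : -1;
}
```

For a missing binary this should return ENOENT. Doing the same after vfork() means calling write() and reading errno in the parent's address space, which is exactly the kind of extra work in the child that the list above describes.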
>> This involves calling a number of libc functions beyond exec(), namely
>> read, close, dup2, opendir/readdir, write, chdir... It also needs a
>> bit of stack, since we assemble path names.
>> I was curious whether there were any real issues, so I tested (on
>> Ubuntu 16.04) and found:
>> 1) A crash - any crash - in the child process before exec() will kill
>> the parent JVM. Weirdly enough, we do not even enter our error
>> handling, but seem to die instantly with the default "Segmentation
>> fault".
>> 2) Signals received by the child process before exec() influence the
>> parent process. For example:
>> - SIGINT sent to the child ends both parent and child, immediately
>> - SIGABRT aborts both child and parent
>> - any error signal sent to the child leads to the behavior described at (1)
>> 3) A stack overflow in the child before exec() also kills the parent.
>> Unsurprising, since guard page hit -> segfault -> see (1).
>> 4) More amusing: setting errno in the child before exec() changes
>> errno in the parent process as well.
>> But since errno is thread local and the thread in the parent process
>> is waiting in vfork() and will, upon return, not look at errno (after
>> all, vfork succeeded) this causes no trouble.
>> There may be more issues, but these were the ones I tested.
>> In all cases I counter-tested with fork() instead of vfork() and, as
>> expected, with fork() the parent process stays unaffected, as it should.
>> Whether you think these issues are worth solving is an open question.
>> All these cases may happen in the wild (well, apart from
>> crash-by-programming-error if one assumes the program to be really bug
>> free) albeit with a very small probability. But once these bugs occur,
>> they can be very difficult to analyse. So fixing this may be
>> worthwhile.
>> At SAP, we opted for robustness, so we changed the Runtime.exec()
>> implementation to deal with vfork() issues. Basically, we employ the
>> exec-twice technique:
>> - in the child, after the vfork(), we immediately exec() into a little
>> bootstrap binary ("forkhelper").
>> - Now we are safe in the sense that we do not share memory with the
>> parent process anymore
>> - Then the parent process communicates with the child via pipes and gives
>> it all information needed to do the "real" exec: environ, current dir,
>> arguments...
>> - Now the child exec's a second time, this time into the real target binary.
>> The point of this technique is that we minimize the window in the
>> child between vfork and the first exec. In fact, we are now fully
>> POSIX compliant. This solves the described pathological cases.
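To make the exec-twice steps concrete, here is a rough C sketch under stated assumptions. It is not the SAP JVM code. The wire format ("cwd\0arg0\0arg1\0..." until EOF), the size limits, the convention of passing the pipe's fd number as the helper's argv[1], and the names send_request, helper_run and spawn_via_helper are all invented for illustration. Environment passing and error reporting are left out.

```c
#define _DEFAULT_SOURCE            /* expose vfork() under strict -std modes */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { REQ_MAX = 4096, MAX_ARGS = 64 };   /* arbitrary limits for this sketch */

/* Parent side: write "cwd\0arg0\0arg1\0..." to fd. Returns 0 or -1. */
int send_request(int fd, const char *cwd, char *const argv[]) {
    if (write(fd, cwd, strlen(cwd) + 1) < 0) return -1;
    for (int i = 0; argv[i] != NULL; i++) {
        if (write(fd, argv[i], strlen(argv[i]) + 1) < 0) return -1;
    }
    return 0;
}

/* Helper side: what a hypothetical forkhelper main() would call with the
 * fd number from its argv[1]. Reads the request until EOF, changes
 * directory and does the second, "real" exec. Returns only on failure. */
int helper_run(int fd) {
    static char buf[REQ_MAX];
    size_t len = 0;
    ssize_t n;
    while (len < sizeof buf &&
           (n = read(fd, buf + len, sizeof buf - len)) > 0) {
        len += (size_t)n;
    }
    close(fd);
    /* Reject empty, oversized or unterminated requests. */
    if (len == 0 || len == sizeof buf || buf[len - 1] != '\0') return -1;

    char *argv[MAX_ARGS + 1];
    int argc = 0;
    const char *cwd = buf;         /* first string is the working directory */
    for (char *p = buf + strlen(buf) + 1; p < buf + len; p += strlen(p) + 1) {
        if (argc == MAX_ARGS) return -1;
        argv[argc++] = p;
    }
    argv[argc] = NULL;
    if (argc == 0) return -1;
    if (cwd[0] != '\0' && chdir(cwd) != 0) return -1;
    execv(argv[0], argv);          /* second exec: the real target */
    return -1;
}

/* Parent side: vfork, exec the helper at once, then send the request.
 * Returns the child's pid, or -1 if vfork or pipe failed. */
pid_t spawn_via_helper(const char *helper_path, const char *cwd,
                       char *const argv[]) {
    int fds[2];
    if (pipe(fds) < 0) return -1;
    /* The helper must not inherit the write end, or it never sees EOF. */
    fcntl(fds[1], F_SETFD, FD_CLOEXEC);
    char fdarg[16];
    snprintf(fdarg, sizeof fdarg, "%d", fds[0]);
    char *helper_argv[] = { (char *)helper_path, fdarg, NULL };

    pid_t pid = vfork();
    if (pid == 0) {
        execv(helper_path, helper_argv);   /* first exec, nothing else */
        _exit(127);
    }
    /* The child no longer shares our memory. Keep the read end open while
     * writing so a dead helper cannot cause SIGPIPE here. */
    if (pid > 0) (void)send_request(fds[1], cwd, argv);
    close(fds[1]);
    close(fds[0]);
    return pid;
}
```

Everything that used to happen between vfork() and exec() (chdir, argument handling, and in a fuller version fd closing and error reporting) moves into helper_run, which runs in its own address space. The request here is assumed to be small enough to fit in the pipe buffer, so the parent does not block while writing.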
>> It has some other advantages too, e.g. allowing for better error
>> handling and tracing in the Runtime.exec() area. Performance-wise it
>> depends: we exec twice, so we pay twice. However, since the parent
>> continues execution after the first exec, it spends less time waiting
>> on the child process, which can make a difference if there are many
>> file descriptors open.
>> Checking opinions here. Do you think we are okay with our current
>> implementation or would a change as described above be welcome in the
>> OpenJDK too?
>> Thanks, and Best Regards, Thomas
> - DML
More information about the core-libs-dev