RFR: JDK-8274320: os::fork_and_exec() should be using posix_spawn
stuefe at openjdk.java.net
Thu Oct 7 08:05:08 UTC 2021
On Sat, 25 Sep 2021 16:16:29 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
> Hi, may I have reviews for this small patch please?
> `os::fork_and_exec()`, used in the hotspot to spawn child programs (scripts etc) in error situations, should be using `posix_spawn()`.
> ATM it uses either `fork()` or `vfork()`. `vfork()` got deprecated on MacOS and we get build errors (JDK-8274293) - even though in this case it would be completely fine to use. This leaves us with `fork()` for MacOS, which has the known problems with large-footprint-parents. This matters here especially since we also use os::fork_and_exec to implement `-XX:OnError` for OOM situations.
> We already use posix_spawn() as default for Runtime.exec() since JDK 15, and it is available on all our Unices. We also should use it here.
> I kept the name of the function (fork_and_exec) since people know it, even though it's more incorrect now than before.
> - manual tests using -XX:OnError with various scripts, including checking that env variables are passed correctly
> - manually ran runtime/ErrorHandling tests
> - GHAs
> _Mailing list message from [David Holmes](mailto:david.holmes at oracle.com) on [hotspot-runtime-dev](mailto:hotspot-runtime-dev at mail.openjdk.java.net):_
> Hi Thomas,
> On 7/10/2021 4:25 pm, Thomas Stuefe wrote:
> > On Wed, 6 Oct 2021 12:48:40 GMT, David Holmes <david.holmes at oracle.com> wrote:
> > > > Therefore I think we don't lose anything by moving to posix_spawn(). But we gain reliability in high-footprint scenarios.
> > >
> > >
> > > Sorry but I have to disagree. fork() is async-signal-safe, but if an at-fork handler is not, then all bets are off - that is fine, it is the best we can do. But posix_spawn makes no claim to any kind of async-signal safety so we very much do lose something by switching to it IMO.
> > > David -----
> > Hi David,
> > I looked a bit closer, since I wanted to figure out whether the async-unsafeness of calls in error reporting actually matters. Because the problem is that these functions are not re-entrant, right? But we already have an intricate mechanism in place to guard against re-entrance errors, with our secondary signal handling.
> It's not the reentrancy problem I'm concerned with but simply the ability to "safely" call fork() from within a signal-handling context because it is marked as async-signal-safe (not withstanding that an at-fork handler may not be).
> My worry is that we may hit cases where posix_spawn just causes the VM to hang.
> I know we already take many risks in the error reporting code with regard to the ability to actually execute stuff from a signal handler, but I still feel it necessary to raise this every time something new in error reporting is proposed. This is stuff that only goes wrong live in the field after we have shipped a release.
> I understand the desire to have a more reliable forking mechanism with respect to memory management, but that has to be weighed against other reliability factors.
> I'm not aware of our existing use of fork() being flagged as causing major problems in that regard. So in my mind this change increases the risk of a hang, whilst "fixing" a problem that hasn't AFAIK really been raised as a problem.
> I'm happy to hear other opinions on this.
> Cheers, David -----
I see what you mean. Okay, hanging would be bad. Lets wait for other opinions. @fweimer do you have an opinion about whether posix_spawn is more async-unsafe than fork, at least for the glibc?
More information about the hotspot-runtime-dev