Process.exec with the linux posix_spawn mode has a bug
thomas.stuefe at gmail.com
Mon May 13 13:42:24 UTC 2019
On Mon, May 13, 2019 at 2:08 PM Martin Buchholz <martinrb at google.com> wrote:
>> I am happy this is resolved and the intermittent behavior explained. Yes,
>> we could improve exception messages, especially since analyzing fork
>> scenarios is cumbersome.
> I tried hard back in 2005 to provide pretty good java-level diagnostics
> when subprocess starting failed somehow (see WhyCantJohnnyExec). At least
> the errno did get reported.
I know your code. For many years I wondered who Johnny is :)
We have a very similar solution in our port: we have our own error codes
(plus errno mixed in where it makes sense) for the many things that can go
wrong in the forkhelper. Maybe we can improve upon your solution a bit.
And/or add tracing for environment etc.
But here is one thing that I still do not understand about Remi's problem:
The theory is that the first exec(), starting jspawnhelper, went wrong with
The man page for posix_spawn() states:
Upon successful completion, posix_spawn() and posix_spawnp() place
the PID of the child process in pid, and return 0. If there is an
error before or during the fork(2), then no child is created, the
contents of *pid are unspecified, and these functions return an error
number as described below.
Even when these functions return a success status, the child process
may still fail for a plethora of reasons related to its pre-exec()
initialization. In addition, the exec(3) may fail. In all of these
cases, the child process will exit with the exit value of 127.
To me this looks as if what should have happened is: posix_spawn() should
have returned success, since the fork() went through. Then the child
process (still inside posix_spawn()) attempts the exec and gets an EACCES.
The child process should then have ended with exit code 127. Your fail pipe
would never deliver an error code, since we never entered the main function
of jspawnhelper. To the Java caller it should have looked like a very
short-lived process with exit code 127.
Obviously this is not what happened, since Remi reported an IOException
with an errno. So, where is my understanding wrong?
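
A quick way to check which of the two behaviours a given libc actually
shows is to spawn a file that has no execute bit and see where the failure
surfaces. The following is only a minimal sketch, not the JDK code path: it
uses Python's os.posix_spawn as a thin wrapper around the C function, and
the helper name spawn_non_executable and the temporary file standing in for
jspawnhelper are made up for illustration.

```python
import errno
import os
import stat
import tempfile


def spawn_non_executable():
    """Spawn a file without execute permission; report how the failure shows up.

    Returns ("spawn_error", errno) if posix_spawn itself reports the failure,
    or ("child_exit", code) if the spawn succeeds and the child then exits.
    The function name and the temp file are hypothetical, for illustration.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"#!/bin/sh\nexit 0\n")
        os.close(fd)
        # rw------- : readable and writable, but no execute bit for anyone
        os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
        try:
            pid = os.posix_spawn(path, [path], dict(os.environ))
        except OSError as e:
            # The exec failure was reported by posix_spawn() itself
            return ("spawn_error", e.errno)
        # The spawn "succeeded"; the failure only shows in the exit status
        _, status = os.waitpid(pid, 0)
        code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
        return ("child_exit", code)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(spawn_non_executable())
```

If this prints a "spawn_error" tuple, the exec failure came back through
the return value of posix_spawn(); if it prints ("child_exit", 127), the
behaviour matches the man page text quoted above.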
> I've had this little script around for ages:
> # -v: Print unabbreviated versions of environment, etc
> exec /usr/bin/strace -f -v -s 256 -e signal=none -e trace=process "$@"
We had all this as part of spawn traces. But this is a nice and neat idea.
Does it print the current directory?