ProcessReaper: single thread reaper
David M. Lloyd
david.lloyd at redhat.com
Thu Apr 17 22:19:43 UTC 2014
On 04/17/2014 02:58 PM, Peter Levart wrote:
> On 04/17/2014 05:15 PM, David M. Lloyd wrote:
>> On 04/17/2014 09:43 AM, Peter Levart wrote:
>>> On 04/17/2014 09:07 AM, Martin Buchholz wrote:
>>>> Many possible solutions eventually fail because whatever we do cannot
>>>> take ownership of any global resource. Calling waitid on all child
>>>> processes, even with NOWAIT and NOHANG, changes global state (what if
>>>> another subprocess library in the same process is trying to do the
>>>> same thing?)
>>> waitid(P_ALL, ..., NOWAIT | NOHANG) does not reap the child. It can be
>>> repeated multiple times. It can be used as a precursor to a real
>>> waitid/waitpid call that reaps the child, but only if it is "ours". The
>>> problem with this approach is what to do in the following scenario: the
>>> precursor waitid(P_ALL, ..., NOWAIT | NOHANG) returns a child that is
>>> not "ours", so we don't reap it. The "owner" of that child (a JNI library)
>>> does not reap its children promptly. We loop, repeatedly
>>> getting the same child as a result, not seeing any other children that
>>> have exited in the meantime...
>> Maybe it would be a good idea to create a process group for
>> JDK-managed subprocesses? Otherwise, it seems that the only other
>> choice is to take over all child process management.
> This was the first idea discussed in the thread. But it's not foolproof
> either. The parent can set the process group of a child at its creation
> (after fork() but before execv(), or as the very act of posix_spawn()),
> but the child is free to change its group at any time after that. Such a
> child "escapes" the group, and waiting on the group id:
> waitpid(-pgid, ...)
> ...will never reap this child even after it exits. The escaping act is
> never reported to the parent. This usually does not happen in practice.
> Daemon processes are one of the two kinds of processes I know of that
> change their group id, but they are grandchildren of the JVM, not
> children. The other kind are processes spawned by a shell or another
> program that puts its children in process groups to manage them, so
> they are JVM grandchildren too. So this is not a problem in practice, I
> think. But it's not a foolproof scheme.
Yeah, definitely not. I guess I am indeed going in circles at this
point. I wonder, though, if you'll indulge me a bit longer and verify my
understanding of the requirements being requested here:
1) The process API must reap all child processes it produces that
terminate during the lifetime of the JVM, leaving no zombies (including
processes which have changed process group and/or session)
2) The process API must allow for child processes which are not managed
by it (by not attempting to reap them except as allowed by #3)
3) The process API must somehow be able to "adopt" other child processes
produced by means other than the Process API
4) The process reaper should keep resource consumption to a minimum
(preferably no more than one thread, preferably no more than one extra
FD per process)
5) The process API must provide an explicitly graceful terminate method
in addition to the existing forcible and "unspecified" destroy methods
6) The process API must provide safeguards to prevent the wrong process
from being signaled (i.e. would be required to synchronize process
reaping with termination/signaling (PID reuse probabilities
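Requirements #5 and #6 interact. A terminate or destroy call must never race with reaping, because once a child is reaped its PID can be reused. Below is a minimal sketch of that synchronization under several assumptions: Linux (it relies on WNOWAIT), Python 3.9+, a single reaper thread per handle, and SIGCHLD not set to SIG_IGN (which would make exited children vanish instead of staying as zombies). `ChildHandle` and its methods are hypothetical names, not a proposed API.

```python
import os
import signal
import threading
from typing import Optional


class ChildHandle:
    """Hypothetical handle illustrating requirements #5 and #6.

    Signaling and reaping are serialized by one lock. Until the child is
    reaped, its PID cannot be reused (an exited child stays a zombie), so a
    signal sent while _exit_code is None always reaches the right process.
    """

    def __init__(self, pid: int) -> None:
        self.pid = pid
        self._lock = threading.Lock()
        self._exit_code: Optional[int] = None  # set once, under the lock

    def _send(self, sig: int) -> bool:
        with self._lock:
            if self._exit_code is not None:
                return False  # already reaped: the PID may now belong to someone else
            os.kill(self.pid, sig)  # running child or zombie: either way, still ours
            return True

    def terminate(self) -> bool:
        """Graceful request (#5): SIGTERM can be caught so the child can clean up."""
        return self._send(signal.SIGTERM)

    def destroy_forcibly(self) -> bool:
        """Forcible kill: SIGKILL cannot be caught or ignored."""
        return self._send(signal.SIGKILL)

    def reap(self) -> int:
        """Called by the one reaper thread: wait for exit, then reap under the lock."""
        # WNOWAIT blocks until the child exits but leaves the zombie in place,
        # so the PID stays reserved while we acquire the lock.
        os.waitid(os.P_PID, self.pid, os.WEXITED | os.WNOWAIT)
        with self._lock:
            _, status = os.waitpid(self.pid, 0)  # returns at once: already exited
            self._exit_code = os.waitstatus_to_exitcode(status)
            return self._exit_code
```

The key design choice is waiting with WNOWAIT first. The zombie keeps the PID reserved until the final `waitpid()` runs under the same lock that guards `kill()`.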
I've deliberately left out any mention of direct management of
grandchild processes. I believe Peter Levart established pretty well
that a child is solely the responsibility of its parent, and Martin
Buchholz has doubts about managing grandchildren as well, though I think
Roger Riggs had some disagreement that was never addressed. For what it's
worth, I agree with Peter on this point, because I think managing
grandchildren and further descendants makes #6 difficult or impossible
to satisfy. But the topic, AFAIK, remains open.
Also I haven't brought up anything from JEP 102 that I haven't already
seen on this thread.
These requirements seem to exclude some of the techniques brought up so far:
- waitid(P_ALL,...)/waitpid(-1,...) (which violates #2, either directly,
or by simply failing in the WNOWAIT|WNOHANG + unmanaged child process
case previously outlined by Peter Levart).
- setpgid() to an all-child process group + waitid(P_PID,...) (which
allows badly behaved processes to cause us to violate #1, and also
prevents automatic propagation of e.g. SIGTERM/SIGINT)
- setpgid() to a per-child process group (same problems; also, I saw no
workable reaping solution for it)
- SIGCHLD + siginfo (very unlikely to work consistently or correctly)
- anything relying on WNOWAIT on Mac OS X and maybe others
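The failure mode behind the first item can be shown in a few lines. This is a minimal sketch assuming Linux (where Python exposes `os.waitid` with `WNOWAIT`) and Python 3.9+. `peek_exited_child` and `reap_if_ours` are hypothetical helpers, and the `ours` set stands in for whatever bookkeeping identifies managed children. When the peeked zombie belongs to someone else, the pass reaps nothing, and nothing obliges the next peek to report a different child.

```python
import os
from typing import Optional, Set


def peek_exited_child() -> Optional[int]:
    """Report one exited child without reaping it (it stays a zombie)."""
    info = os.waitid(os.P_ALL, 0, os.WEXITED | os.WNOWAIT | os.WNOHANG)
    return None if info is None else info.si_pid


def reap_if_ours(ours: Set[int]) -> Optional[int]:
    """One pass of the peek-then-reap scheme.

    Returns the exit code of a reaped child, or None when nothing was reaped:
    either no child has exited yet, or the peeked child is not ours. In the
    latter case the next peek can report the same foreign zombie again, so
    exits of our own children may go unnoticed until its owner reaps it.
    """
    pid = peek_exited_child()
    if pid is None or pid not in ours:
        return None
    _, status = os.waitpid(pid, 0)  # really reap; does not block, it has exited
    ours.discard(pid)
    return os.waitstatus_to_exitcode(status)
```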
I think everyone liked the idea of pluggable implementations.
I didn't see this mentioned on this thread, but it seems to me that we
can have a simple, 100% correct implementation on UNIX-like systems by
keeping one thread per child process (today each such thread has a 32k
stack; maybe it could be even smaller?). Much like the default polling
SelectorProvider for NIO, this could act as a simple fallback
implementation that always works and is correct.
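A minimal sketch of that fallback, with the pluggability expressed as an abstract provider. It uses Python for brevity and assumes Python 3.9+. `ReaperProvider`, `ThreadPerChildReaper` and `ExitWatch` are hypothetical names, not JDK classes. Each watched PID gets its own thread blocked in `waitpid(pid, 0)`, so the reaper never waits on a PID it was not given.

```python
import os
import threading
from abc import ABC, abstractmethod
from typing import Optional


class ExitWatch:
    """Completion handle: exit_code is filled in once the child has been reaped."""

    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.exit_code: Optional[int] = None
        self._done = threading.Event()

    def complete(self, exit_code: Optional[int]) -> None:
        self.exit_code = exit_code
        self._done.set()

    def wait(self, timeout: Optional[float] = None) -> Optional[int]:
        self._done.wait(timeout)
        return self.exit_code


class ReaperProvider(ABC):
    """Hypothetical pluggable reaper strategy, in the spirit of NIO's SelectorProvider."""

    @abstractmethod
    def watch(self, pid: int) -> ExitWatch:
        """Take responsibility for reaping pid (a child of this process)."""


class ThreadPerChildReaper(ReaperProvider):
    """Fallback strategy: one blocking waitpid() per child, each on its own thread.

    It only waits on PIDs it was given, so it never reaps anyone else's
    children, and it needs no signal handler and no extra FDs.
    """

    def watch(self, pid: int) -> ExitWatch:
        handle = ExitWatch(pid)

        def run() -> None:
            code: Optional[int] = None
            try:
                _, status = os.waitpid(pid, 0)  # blocks until this one child exits
                code = os.waitstatus_to_exitcode(status)
            finally:
                handle.complete(code)  # wake waiters even if waitpid() failed

        threading.Thread(target=run, name=f"reaper-{pid}", daemon=True).start()
        return handle
```

In the JDK each such thread would get a deliberately small stack (32k, per the message). Python's `threading.stack_size()` is a comparable knob, left at its default here.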
On systems with /proc, using poll() or similar on the corresponding /proc
files seems like a possible alternative implementation, requiring one
additional FD per child process and only one reaper thread. It seems able
to meet all six requirements above, though the lack of standardization
might add risk.
Using a single thread to iterate over all child PIDs (with WNOHANG) each
time a SIGCHLD is received would work with one thread and zero FDs in
total. However, it scales poorly with very large numbers of child
processes, and using SIGCHLD at all might be considered a violation of
#2. Maybe these ideas could be implemented as alternatives, contingent on
-Xrs, or contingent on the previous handler being SIG_IGN, similar to the
suggestion by Martin Buchholz.
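A minimal sketch of the SIGCHLD-driven variant, assuming Linux and Python 3.9+. `sweep` and `start_sigchld_reaper` are hypothetical names. One thread sits in `sigwait()` and, on every wake-up, polls each known PID with WNOHANG, so the cost per signal grows with the number of children. Blocking SIGCHLD process-wide to make this work also shows why it may conflict with #2: any other library relying on SIGCHLD stops seeing it.

```python
import os
import signal
import threading
from typing import Callable, Dict, Set


def sweep(known: Set[int]) -> Dict[int, int]:
    """Reap every exited child among `known` without blocking: O(len(known)) per call."""
    reaped: Dict[int, int] = {}
    for pid in list(known):
        done, status = os.waitpid(pid, os.WNOHANG)  # only ever touches our own PIDs
        if done == pid:
            reaped[pid] = os.waitstatus_to_exitcode(status)
    return reaped


def start_sigchld_reaper(known: Set[int], lock: threading.Lock,
                         on_exit: Callable[[int, int], None]) -> threading.Thread:
    """Start the single reaper thread.

    Must be called from the main thread before any other thread exists, so
    every thread inherits SIGCHLD blocked and only sigwait() below consumes it.
    Spawn children and add their PIDs to `known` while holding `lock`, so a
    sweep can never run between a child's exit and its registration. Spawned
    children inherit the blocked mask too; a real implementation would reset it.
    """
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})

    def run() -> None:
        while True:
            # Standard signals coalesce: one wake-up may stand for many exits,
            # which is why every wake-up re-checks every known PID.
            signal.sigwait({signal.SIGCHLD})
            with lock:
                for pid, code in sweep(known).items():
                    known.discard(pid)
                    on_exit(pid, code)

    thread = threading.Thread(target=run, name="sigchld-reaper", daemon=True)
    thread.start()
    return thread
```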
I didn't see any other workable implementation alternatives.
As for the API, I had suggested that "adopted" processes support a strict
subset of the operations of "managed" processes, and thus could be a
supertype of Process. Martin indicated that managing grandchildren
should have a different API altogether. Peter seems to lean towards
exposing the OS capabilities a bit more directly, through child process
ID enumeration (presumably including managed and unmanaged processes in
the same bucket) and an API which operates on any child process by ID,
regardless of its disposition (though I don't know of any portable API
to enumerate child processes; on Linux I believe you have to use /proc).
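On Linux, one way to enumerate children is to scan `/proc/<pid>/stat` and compare the parent-PID field. Below is a minimal sketch assuming Linux's documented `stat` layout. `list_children` is a hypothetical name. It puts managed and unmanaged children in one bucket, includes exited-but-unreaped children (state `Z`), and returns only a snapshot.

```python
import os
from typing import List


def list_children(parent: int) -> List[int]:
    """Return PIDs whose parent is `parent` by scanning /proc (Linux only).

    The result is a snapshot: children may exit, or new ones appear, while
    the scan runs. Zombies are included until they are reaped.
    """
    children = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:  # the process vanished during the scan
            continue
        # Field 2 (comm) is parenthesised and may contain spaces or ')', so
        # split after the last ')': what follows is "state ppid pgrp ...".
        fields = stat[stat.rindex(")") + 2:].split()
        if int(fields[1]) == parent:
            children.append(int(entry))
    return sorted(children)
```

Splitting after the last `)` matters because the command name can itself contain spaces and parentheses.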
Peter also suggested that a process reaper be a primary internal API.
Did I miss anything?
More information about the core-libs-dev