ProcessReaper: single thread reaper
roger.riggs at oracle.com
Mon Apr 14 14:37:10 UTC 2014
Jtreg, for example, needs a reliable way to clean up after tests.
We've had a variety of problems with stray processes left over because
there is neither visibility into them nor a reliable way to identify and kill them.
On 4/14/2014 10:31 AM, David M. Lloyd wrote:
> Where does the requirement to manage grandchild processes actually
> come from? I'd hate to see the ability to "nicely" terminate
> immediate child processes lost just because it was difficult to
> implement some grander scheme.
> On 04/14/2014 08:49 AM, roger riggs wrote:
>> Hi Martin,
>> A new API is needed, overloading the current Process API is not a good
>> Even within Process, a new method will be needed to destroy the
>> subprocess and all of its children, in order to maintain backward
>> compatibility.
>> Are there specific OS features that need to be exposed to applications?
>> Is the destroy-process-and-all-children abstraction too coarse?
>> On 4/11/2014 7:37 PM, Martin Buchholz wrote:
>>> Let's step back again and try to check our goals...
>>> We could try to optimize the one-reaper-thread-per-subprocess thing.
>>> But that is risky, and the cost of what we're doing today is not that
>>> We could try to implement the feature of killing off an entire
>>> subprocess tree. But historically, any kind of behavior change like
>>> that has been vetoed. I have tried and failed to make less
>>> incompatible changes. We would have to add a new API.
>>> The reality is that Java does not give you real access to the
>>> underlying OS, and unless there's a seriously heterodox attempt to
>>> provide OS-specific extensions, people will have to continue to either
>>> write native code or delegate to an OS-savvy subprocess like a perl
>>> On Fri, Apr 11, 2014 at 7:52 AM, Peter Levart <peter.levart at gmail.com> wrote:
>>> On 04/09/2014 07:02 PM, Martin Buchholz wrote:
>>>> On Tue, Apr 8, 2014 at 11:08 PM, Peter Levart
>>>> <peter.levart at gmail.com> wrote:
>>>> Hi Martin,
>>>> As you might have seen in my later reply to Roger, there's
>>>> still hope on that front: setpgid() + wait(-pgid, ...) might
>>>> be the answer. I'm exploring in that direction. Shells are
>>>> doing it, so why can't JDK?
>>>> It's a little trickier for Process API, since I imagine that
>>>> shells form a group of processes from a pipeline which is
>>>> known in-advance while Process API will have to add processes
>>>> to the live group dynamically. So some races will have to be
>>>> resolved, but I think it's doable.
>>>> This is a clever idea, and it's arguably better to design
>>>> subprocesses so they live in separate process groups (emacs does
>>>> that), but:
>>>> Every time you create a process group, you change the effect of a
>>>> user signal like Ctrl-C, since it's sent to only one group.
>>>> Maybe propagate signals to the subprocess group? It's starting
>>>> to get complicated...
>>> Hi Martin,
>>> Yes, shells send Ctrl-C (SIGINT) and other signals initiated by
>>> terminal to a (foreground) process group. A process group is
>>> formed from a pipeline of interconnected processes. Each pipeline
>>> is considered to be a separate "job", hence shells call this
>>> feature "job-control". Child processes by default inherit process
>>> group from their parent, so children born with Process API (and
>>> their children) inherit the process group from the JVM process.
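The group inheritance described above can be checked with a small POSIX C sketch. It is only an illustration, not code from the JDK: the function name `child_shares_parent_group` and its `new_group` parameter are hypothetical. The child reports through its exit status whether it is still in the parent's process group, optionally after moving itself into a fresh group with setpgid(0, 0).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child and report whether it ends up in the
 * parent's process group.
 *   new_group == 0: the child leaves its (inherited) group untouched.
 *   new_group != 0: the child first calls setpgid(0, 0) to lead a new group.
 * Returns 1 if the child shared the parent's group, 0 if it did not,
 * and -1 on any error.
 */
int child_shares_parent_group(int new_group)
{
    pid_t parent_pgid = getpgrp();
    pid_t pid = fork();
    int status;
    int code;

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: optionally escape into a group of its own. */
        if (new_group && setpgid(0, 0) != 0)
            _exit(2);
        /* Encode the comparison result in the exit status. */
        _exit(getpgrp() == parent_pgid ? 1 : 0);
    }

    /* Parent: collect the child's verdict. */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}
```

Calling `child_shares_parent_group(0)` exercises the default inheritance, while `child_shares_parent_group(1)` exercises the setpgid() case that a per-subprocess group design would rely on.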
>>> Considering the intentions of shell job-control, is propagating
>>> SIGTERM/SIGINT/SIGTSTP/SIGCONT signals to children spawned by
>>> Process API desirable? If so, then yes, handling those signals in
>>> JVM and propagating them to current process group that contains
>>> all children spawned by Process API and their descendants would
>>> have to be performed by JVM. That problem would certainly have to
>>> be addressed. But let's first see what I found out about
>>> sigaction(SIGCHLD, ...), setpgid(pid, pgid), waitpid(-pgid, ...),
>>> waitpid(-pgid, ...) alone seems to not be enough for our task.
>>> Mainly because a process can re-assign its group and join some
>>> other group. I don't know if this is a situation that occurs in
>>> real world, but imagine if we have one live child process in a
>>> process group pgid1 and no unwaited exited children. If we issue:
>>> waitpid(-pgid1, &status, 0);
>>> Then this call blocks, because at the time it was given, there
>>> were >0 child processes in the pgid1 group and none of them has
>>> exited yet. Now if this one child process changes its process
>>> group with:
>>> setpgid(0, pgid2);
>>> Then the waitpid call in the parent does not return (maybe this is
>>> a bug in Linux?) although there are no more live child processes
>>> in the pgid1 group any more. Even when this child exits, the call
>>> to waitpid does not return, since this child is not in the group
>>> we are waiting for when it exits. If all our children "escape" the
>>> group in such a way, the thread doing the waiting will never unblock. To
>>> solve this, we can employ signal handlers. In a signal handler for
>>> SIGCHLD signal we can invoke:
>>> waitpid(-pgid1, &status, WNOHANG); // non-blocking call
>>> ...in a loop until it either returns (0), which means that there are
>>> no more unwaited exited children in the group at the moment, or (-1)
>>> with errno == ECHILD, which means that there are no more children
>>> in the queried group - the group does not exist any more.
>>> Since the signal handler is invoked with SIGCHLD masked and
>>> there is one bit of pending signal state in the kernel, no child
>>> exit can be "skipped" this way. Unless the child "escapes" by
>>> changing its group. I don't know of a plausible reason for a
>>> program to change its process group. If a program executing as
>>> JVM child wants to become a background daemon it usually behaves
>>> as follows:
>>> - fork()s a grand-child and then exit()s (so we get notified via
>>> signal and waitpid(-pgid, ...) successfully for its exit status)
>>> - the grand-child then changes its session and group (becomes
>>> session and group leader), closes file descriptors, etc. The
>>> responsibility for waiting on the grand-child daemon is
>>> transferred to the init process (pid=1) since the grand-child
>>> becomes an orphan (has no parent).
>>> Ignoring this still unsolved problem of possible ill-behaved child
>>> program that changes its process group, I started constructing a
>>> proof-of-concept prototype. What I will do in the prototype is
>>> start throwing IllegalStateException from the methods of the
>>> Process API that pertain to such children. I think this is
>>> Stay tuned,