ProcessReaper: single thread reaper

Roger Riggs Roger.Riggs at
Tue Mar 25 16:25:12 UTC 2014

Hi Peter,

Thanks for the comments, I'll fix the races conditions, I intended
to go back to the dedicated thread but wanted to check for scalability
in the case of a large number of threads and non-trivial termination 
where the streams had to be drained.

Thanks, Roger

On 3/25/14 7:05 AM, Peter Levart wrote:
> On 03/24/2014 10:05 PM, roger riggs wrote:
>> Hi Rob, Martin,
>> I've prototyped (for 9) a thread reaper[1] that uses a single thread 
>> to wait for exiting
>> processes and calling back to the process with the exit status.
>> the interesting part is getting the exit status back to the Process 
>> that needs it
>> It needs more testing and hardening.
>> I had not considered using a signal handler for SIGCHLD but that's an 
>> option,
>> though we need to be very careful about thread usage.
>> Roger
>> p.s. comments on the single thread reaper appreciated (in a new thread)
>> [1]
> Hi Roger,
> I think I found a little race. Suppose reaper task is still alive and 
> that all consumers have been serviced (consumerCount is 0). The reaper 
> task waits for 500 millis for next consumer to be registered, but 
> times out. Before calling "reaperThread.release()", new consumer comes 
> around and registers itself, also calling runReaper(), but since 
> reaperThread.release() has not yet been called by old reaper task, new 
> reaper task is not submitted to commonPool. The old reaper task 
> finishes, leaving one consumer on the waitingList with no reaper task 
> to service it. If no new consumers get registered, the waiting 
> consumer will never be notified...
> The simplest solution for this race, I think, would be to have a 
> dedicated long-running thread. It could be spawned lazily, but then it 
> would never finish.
> Otherwise a nice solution with two lists (exitList/waitList) and 
> avoidance of race with reversed orders between
> - consumer registration: register on waitList 1st then check exitList, 
> and
> - exit event dispatch: register on exitList 1st then check waitList
> ...but the check you use with consumeCount local variable to detect 
> processes spawned by other means (for purposes of logging only) has a 
> race:
> thread1: Suppose a new consumer is being registered with 
> onExitCall(...), is added on the waitList, but before checking 
> exitList().size() and iterating the exitList, ...
> thread2: the reaper task detects that the very same process has 
> finished (gets it's pid from waitpid) and adds it's pid to exitList. 
> Then before iterating waitList, ...
> thread1: iterates the exitList, finds a match and consumes the pid, 
> removing the matching entries from both exitList and waitList. Then ...
> thread2: the reaper task iterates waitList, doesn't find a matching 
> entry for exitPid, doesn't increment consumeCount and voila: debug 
> log("Unexpected process exit for pid:...").
> That's enough races for today.
> Regards, Peter
>> On 3/24/2014 12:38 AM, Rob McKenna wrote:
>>> Hi folks,
>>> Roger Riggs (cc'd) may want to chip in here as he's looking at the 
>>> reaper thread arrangement in 9 at the moment.
>>> On another note, I too support the merging of those files. I didn't 
>>> think there was much appetite for it at the time so I must admit 
>>> this fell down my todo list. Looking at this bug did remind me that 
>>> its something worth trying though. As per Alan's mail, I'm going to 
>>> tackle it separately if you folks don't mind. I'll have a look at 
>>> Peter's changes (thanks Peter!) as soon as I can and see about 
>>> getting them in.
>>>     -Rob
>>> On 23/03/14 22:30, Martin Buchholz wrote:
>>>> On Sun, Mar 23, 2014 at 2:34 AM, Martin Buchholz 
>>>> <martinrb at <mailto:martinrb at>> wrote:
>>>>     We have also thought about whether having reaper threads is
>>>>     necessary.  The Unix rule is that child processes should be
>>>>     waited for, and some thread needs to do that.  There's no way
>>>>     to wait for a set of child pids, or to specify a "completion
>>>>     handler".  Well, you might be able to get the newish waitid()
>>>>     to do what you want, but it looks like it's not sufficient when
>>>>     java is running inside a process that might do independent
>>>>     subprocess creation outside of the JVM.
>>>> Actually, I take it back.  With sufficient work, it looks like you 
>>>> can get SIGCHLD to give you pid information in siginfo_t si_pid, 
>>>> and that can be used to trigger the reaping.  It looks like waitpid 
>>>> is "async-signal-safe", so we can call it from our signal handler.
>>>> While we're at it we can fix SIGCHLD handling to do signal 
>>>> chaining, as with other signals.

More information about the core-libs-dev mailing list