RFR 9: 8077350 Process API Updates Implementation Review
Roger.Riggs at Oracle.com
Fri Apr 17 17:05:32 UTC 2015
On 4/16/2015 3:01 PM, Thomas Stüfe wrote:
> Hi Roger,
> thank you for your answer!
> The reason I take an interest is not just theoretical. We (SAP) use
> our JVM for our test infrastructure and we had exactly the problem
> allChildren() is designed to solve: killing a process tree related to
> a specific test (similar to jtreg tests) in case of errors or hangs.
> We have test machines running large workloads of tests in parallel and
> we reach pid wraparound - depending on the OS - quite fast.
> We solved this by adding process groups to Process.java and we are
> very happy with this solution. We are able to quickly kill a whole
> process tree, cleanly and completely, without ambiguity or risk to
> other tests. Of course we had to add this support as a "sideways hack"
> in order to not change the official Process.java interface. Therefore
> I was hoping that with JEP 102, we would get official support for
> process groups. Unfortunately, it seems the decision has already been
> made and we are too late in the discussion :(
It would be interesting to see a description of what you added to/around
The reason to avoid them was one of simplicity and non-interference with
spawned by native libraries. If that complexity can be understood
could fulfill a need in a scalable system.
At this point, I'd like to deal with it as a separate request for
> see my other comments inline.
> On Sat, Apr 11, 2015 at 8:55 PM, Roger Riggs <Roger.Riggs at oracle.com
> <mailto:Roger.Riggs at oracle.com>> wrote:
> Hi Thomas,
> Thanks for the comments.
> On 4/11/2015 8:31 AM, Thomas Stüfe wrote:
>> Hi Roger,
>> I have a question about getChildren() and getAllChildren().
>> I assume the point of those functions is to implement point 4 of
>> JEP 102 ("The ability to deal with process trees, in particular
>> some means to destroy a process tree."), by returning a
>> collection of PIDs which are the children of the process and then
>> killing them?
> Earlier versions included a killProcess tree method but it was
> recommended to leave
> the exact algorithm to kill processes to the caller.
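For illustration, a caller-side version of such an algorithm might look like the sketch below. It assumes the ProcessHandle API as it later shipped in Java 9, where the method discussed in this thread as allChildren() is named descendants(). The class and method names (TreeKill, snapshotDescendants, destroyAll) are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of the JDK.
final class TreeKill {
    // Capture the current descendants of root. The result is only a snapshot
    // and may already be stale by the time the caller acts on it.
    static List<ProcessHandle> snapshotDescendants(ProcessHandle root) {
        return root.descendants().collect(Collectors.toList());
    }

    // Request normal termination of each handle and return how many requests
    // were accepted. Ordering, retries and forcible kills are left to the caller.
    static long destroyAll(List<ProcessHandle> handles) {
        return handles.stream().filter(ProcessHandle::destroy).count();
    }
}
```

Nothing in this sketch guards against a pid being recycled between the snapshot and the destroy() call. Note also that destroy() throws IllegalStateException when invoked on the current process, so root should not be an ancestor of the calling JVM.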
>> However, I am not sure that this can be implemented in a safe
>> way, at least on UNIX, because - as Martin already pointed out -
>> of PID recycling. I do not see how you can prevent allChildren()
>> from returning PIDs which may be already reaped and recycled when
>> you use them later. How do you prevent that?
> Unless there is an extended time between getting the children and
> destroying them the pids will still be valid.
> Why? Child process may be getting reaped the instant you are done
> reading it from /proc, and pid may have been recycled by the OS right
> away and already pointing to another process when allChildren()
> returns. If a process lives about as long as it takes the system to
> reach a pid wraparound to the same pid value, its pid could be
> recycled right after it is reaped, right? Sure, the longer you wait, the
> higher the chance of this to happen, but it may happen right away.
> As Martin said, we have had those races in the kill() code for a long
> time, but children()/allChildren() could make those errors more
> probable, because now more processes are involved. Especially if you
> use allChildren to kill a deep process tree. And there is nothing in
> the javadoc warning the user about this scenario. You would just
> happen from time to time to kill an unrelated process. Those problems
> are hard to debug.
> The technique of caching the start time can prevent that case;
> though it has AFAIK not been a problem.
> How would that work? User should, before issuing the kill, compare
> start time of process to kill with cached start time?
See Peter's email, he described it more thoroughly than I have in
>> Note even if your coding is bulletproof, that allChildren() will
>> also return PIDs of sub processes which are completely unrelated
>> to you and Process.java - they could have been forked by some
>> third party native code which just happens to run in parallel in
>> the same process. There, you have no control about when it gets
>> reaped. It might already have been reaped by the time
>> allChildren() returns, and now the same PID got recycled as
>> another, unrelated process.
> Of course, the best case is for an application to spawn and manage
> its own processes
> and handle their proper termination.
> The use cases for children/allChildren are focused on
> supervisory/executive functions
> that monitor a running system and can cleanup even in the case of
> unexpected failures.
> All management of processes is subject to OS limitations; if the
> PID were from a completely
> different process tree, the ordinary destroy/info functions would
> not be available
> unless the process was running as a privileged OS user (same as
> any other native application).
> Could you explain this please? If both trees run under the same user,
> why should I not be able to kill a process from a different tree?
I was considering the case of a different user; only the OS access
so if it was the same user the processes could be controlled.
The PH API does not provide more or less access than the OS.
>> If I am right, it would not be sufficient to state "There is no
>> guarantee that a process is alive." - it may be alive but by now
>> be a different process altogether. This makes "allChildren()"
>> useless for many cases, because the returned information may
>> already be obsolete the moment the function returns.
> The caching of startTime can remove the ambiguity.
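A minimal sketch of the start-time comparison, assuming the ProcessHandle API as it later shipped in Java 9 (ProcessHandle.of, info().startInstant()). The class and method names (StartTimeGuard, Identity, stillSameProcess, destroyIfSame) are hypothetical, and treating an unavailable start time as "cannot verify" is an assumption of this sketch.

```java
import java.time.Instant;
import java.util.Optional;

// Hypothetical helper for illustration; not part of the JDK.
final class StartTimeGuard {
    // A pid together with the start time observed when the snapshot was taken.
    static final class Identity {
        final long pid;
        final Optional<Instant> start;

        Identity(ProcessHandle handle) {
            this(handle.pid(), handle.info().startInstant());
        }

        Identity(long pid, Optional<Instant> start) {
            this.pid = pid;
            this.start = start;
        }
    }

    // True only if a process with the cached pid exists now and reports the
    // same start time. If the start time was unavailable, identity cannot be
    // verified and the answer is false.
    static boolean stillSameProcess(Identity cached) {
        Optional<ProcessHandle> now = ProcessHandle.of(cached.pid);
        if (!now.isPresent() || !cached.start.isPresent()) {
            return false;
        }
        return cached.start.equals(now.get().info().startInstant());
    }

    // Request termination only when the cached identity still matches.
    static boolean destroyIfSame(Identity cached) {
        return stillSameProcess(cached)
                && ProcessHandle.of(cached.pid).map(ProcessHandle::destroy).orElse(false);
    }
}
```

A small window remains between the comparison and the destroy() call, so this narrows the race rather than closing it.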
>> Of course, I may be missing something here?
>> But if I got all that right and the sole purpose of allChildren()
>> is to be able to kill them (or otherwise signal them), why not
>> use process groups? Process groups would be the traditional way
>> on POSIX platforms to handle process trees, and they are also
>> available on Windows in the form of Job Objects.
>> Using process groups to signal sub process trees would be safe,
>> would not rely on PID identity, and would be more efficient. Also
>> way less coding. Also, it would be an old, established pattern -
>> process groups have been around for a long time. Also, using
>> process groups it is possible to break away from a group, so a
>> program below you which wants to run as a daemon can do so by
>> removing itself from the process group and thus escaping your kill.
>> On Windows we have Job objects, and I think there are enough
>> similarities to POSIX process groups to abstract them into
>> something platform independent.
> Earlier discussions of process termination and exit value reaping
> considered using process groups, but it became evident that the Java
> runtime needed to be very careful not to interfere with processes that
> might be spawned and controlled by native libraries, and that process
> groups would only add to the complexity and the interactions.
> Thanks, Roger
> Thanks! Thomas