RFR(S): 8008088: SA can hang the VM
rickard.backman at oracle.com
Wed Feb 13 05:57:20 PST 2013
as usual, thanks for the review.
We have reproduced the problem with thread-directed signals. However, it is very unlikely to happen. The thread has to get into a state where both SIG<whatever> and SIGSTOP are pending at the same time, while SA is waiting in waitpid(). The window in which someone types jmap, and SA does the ptrace(ATTACH), at pretty much the same time as the thread SEGVs seems pretty small.
Hope that makes sense.
On Feb 13, 2013, at 2:07 PM, David Holmes wrote:
> Hi Rickard,
> This looks good. Reviewed.
> I am a little confused about the problem though. I would expect only a process-directed signal to cause this problem (as opposed to thread-directed) - is that the case? The ptrace info doesn't make a distinction so perhaps not. Though in that case I would have expected to see this problem long ago, if for example SEGV from a null reference access could cause a stop.
> On 13/02/2013 10:28 PM, Rickard Bäckman wrote:
>> Hi all,
>> can I please have a couple of reviews of this change.
>> The problem discovered was that on Linux and BSD SA uses the ptrace() method to stop the threads before inspecting the memory of the process. However, since SA doesn't check which signal was the stopping one, it is possible that we stop on another signal. When that happens, that signal is lost and the SIGSTOP remains as a pending signal for the process. When SA detaches, the SIGSTOP will be delivered to the process and the process will go to the Sleeping state (not executing any more code).
>> This change introduces a check on the signal, such that if the signal wasn't a SIGSTOP we restart the thread and wait for a new signal.
>> Webrev: http://cr.openjdk.java.net/~rbackman/8008088/
>> Bug: http://bugs.sun.com/view_bug.do?bug_id=8008088
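For readers of the archive, the check described above can be sketched roughly as follows. This is a simplified illustration for Linux only, not the code from the webrev: the names classify_stop, wait_for_sigstop and the stop_action enum are made up for this example, and it assumes the tracee has already been attached with ptrace(PTRACE_ATTACH, ...). Handing the intercepted signal back through PTRACE_CONT is a choice made in this sketch so that the signal is not lost; waiting on non-leader threads (which on Linux may need the __WALL flag) is left out.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Hypothetical helper type: what to do with a status returned by waitpid(). */
enum stop_action {
    STOP_DONE,    /* stopped on SIGSTOP: safe to start inspecting */
    STOP_RESUME,  /* stopped on another signal: resume and wait again */
    STOP_FAILED   /* not stopped at all (exited or was killed) */
};

/* Decide what a wait status means. For STOP_RESUME, *pending_sig is set
 * to the signal that caused the stop; otherwise it is set to 0. */
enum stop_action classify_stop(int status, int *pending_sig) {
    assert(pending_sig != NULL);
    *pending_sig = 0;
    if (!WIFSTOPPED(status)) {
        return STOP_FAILED;
    }
    if (WSTOPSIG(status) == SIGSTOP) {
        return STOP_DONE;
    }
    *pending_sig = WSTOPSIG(status);
    return STOP_RESUME;
}

/* Wait until the already-attached tracee `pid` stops on SIGSTOP.
 * If it stops on any other signal first, restart it (passing that
 * signal on) and wait again, so the SIGSTOP is consumed here rather
 * than left pending until detach. */
bool wait_for_sigstop(pid_t pid) {
    for (;;) {
        int status = 0;
        int sig = 0;
        if (waitpid(pid, &status, 0) < 0) {
            if (errno == EINTR) {
                continue;  /* interrupted by a signal to us: retry */
            }
            return false;
        }
        switch (classify_stop(status, &sig)) {
        case STOP_DONE:
            return true;
        case STOP_FAILED:
            return false;
        case STOP_RESUME:
            /* Linux-specific request; the BSD equivalent differs. */
            if (ptrace(PTRACE_CONT, pid, NULL, (void *)(intptr_t)sig) < 0) {
                return false;
            }
            break;
        }
    }
}
```

A caller would attach with ptrace(PTRACE_ATTACH, pid, NULL, NULL) and then call wait_for_sigstop(pid) before reading the process memory.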