RFR (S) 8181143: Introduce diagnostic flag to abort VM on too long VM operations
rkennke at redhat.com
Tue Dec 11 15:18:40 UTC 2018
this looks good to me.
> Getting back to this. Replying to all three reviews in the same mail, because they share the
> answers! New patch:
> Testing: x86_64 build, hotspot tier1, new test
> On 11/20/18 8:07 AM, Thomas Stüfe wrote:
>> I don't know whether I need 10ms resolution though - if we limit the
>> goal to catching just hanging VMOps - which would still pretty useful
>> - 1s would even be enough.
> Right. New patch implements variable delay, depending on what actual delay had been requested. New
> test makes sure it works reliably.
>> I think we can do with just two flags, since VMOperationTimeout and
>> VMOperationTimeoutDelay are redundant. Just go with the delay, make -1
>> the default that does nothing. Same could go for SafepointTimeout vs
> Redid like this:
> diagnostic(bool, AbortVMOnVMOperationTimeout, false, \
> "Abort upon failure to complete VM operation promptly") \
> diagnostic(intx, AbortVMOnVMOperationTimeoutDelay, 1000, \
> "Delay in milliseconds for option AbortVMOnVMOperationTimeout") \
> range(0, max_intx) \
>> I also agree with others that it would make more sense were we to kill
>> the VMThread instead (e.g. just with pthread_kill(SIGILL) or similar)
>> - we do something like that in error handling (see ErrorLogTimeout).
> I looked into that, and it seems too much hassle without significant benefit. For example, it would
> require drilling new platform-specific holes to support it, and it also likely to be supported on
> Linux (pthreads) only anyway? Seems to me piggybacking on periodic tasks is much simpler. In
> development use, you'd have a core file to see what is going on in VM. In "production" use you'd
> care that circuit breaker that did its job of terminating the VM.
> On 11/19/18 9:35 AM, Robbin Ehn wrote:
>> You patch seems not to be against jdk/jdk (jdk12).
> Really? The new patch is definitely against jdk/jdk.
>> Without the actual core file, I don't see the hs file being very useful
>> containing the watcherthread stacktrace. It would be good if the 'killer'
>> signaled the VM thread and we do the error reporting from the signal handler
>> instead in VM thread context.
> Right. Looked into that too, decided it is too complicated without significant benefit. See above.
> On 11/19/18 7:13 AM, David Holmes wrote:
>> You're not just introducing one diagnostic flag, your introducing the entire VM operation
>> timeout mechanism, including two product flags and one diagnostic. So the CR needs to reflect
>> that clearly and you will need a CSR request to add the two product flags. And they will need to
>> be documented.
> This was "solved" by doing two diagnostic flags, see above. I think the symmetry against other
> AbortVM* flags is important, and having the boolean flag for AbortVM* is good.
>> It's not safe to access vm_safepoint_description() outside the VMThread as the _cur_vm_operation
>> could be deleted while you're trying to access its name.
> Noted, removed that part.
>> And as we don't have a general timer mechanism this has to use polling so you pay for a 10ms
>> task wakeup regardless of how long the timeout is.
> Redone that part, see above.
>> Given the limitations of the global timeout I'm not sure I see a use for the non-aborting form.
>> This could just reduce down to:
>> otherwise I don't really think this carries its weight. Of course that's just my opinion.
>> Interested to hear others.
> I prefer not to introduce another shape for AbortVM* options. Two diagnostic options seems to have
> much less weight on the whole thing.
More information about the hotspot-dev