RFR(s) (JDK10): 8166944: Hanging Error Reporting steps may lead to torn error logs.

David Holmes david.holmes at oracle.com
Fri Feb 3 02:01:18 UTC 2017

Hi Thomas,

Thanks for picking this back up. Overall this looks okay. I have a few 
specific comments/suggestions below to add to Chris's.



Typo: canceleation



As previously discussed Atomic::load/store for jlong do not depend on 
supports_cx8. The commentary in atomic.hpp should really refer to 
'atomic read-modify-write operations on jlongs' - as every platform must 
support atomic 64-bit load/store to support Java volatile long/double. 
So set_to_now and get_timestamp can be simplified, or even removed 

Does the timestamp really need to be the system time in millis or do you 
just want to record relative elapsed time? If the latter then I suggest 
using javaTimeNanos()/100000 to avoid potential glitches with the system 
time changing.

382   // TestUnresponsiveErrorHandler: three times to trigger first a 
step timeout, then
383   // the global error reporting timeout.

I'm a little confused as to why three steps are needed to test two things??

1217     reporting_started();
1218     set_to_now(&reporting_start_time);

I think the setting of reporting_start_time should be internalized in 

Nit: Lets -> Let's  (it is a contraction of 'let us')

1272         st->print_cr("------ Timout during error reporting after " 
INT64_FORMAT "ms. ------",

Typo: timout

Also I gather we are potentially racing with the WatcherThread here, so 
no guarantee these final print statements will be seen?

1502       VMError::interrupt_reporting_thread();

Nit: you're inside a VMError function so shouldn't need the VMError prefix.



Typo: dependend

+   // Called by WatcherThread to check if the currently running error 
reporting did timeout.
+   //  Returns true if reporting did hit the global ErrorLogTimeout.

This doesn't quite read right to me. How about:

// Called by the WatcherThread to check if error reporting has timed-out.
// Returns true if error reporting has not completed within the 
ErrorLogTimeout limit



I think using @requires and runtime Platform checks is redundant. I 
prefer @requires for both debug build testing and not-windows.

On 2/02/2017 9:59 PM, Thomas Stüfe wrote:
> Dear all,
> please review (or re-review) this enhancement. It was already reviewed
> extensively for JDK9 in Oct 16, but pushed back to JDK10. Now that JDK10
> repo is open, I would like to close this issue.
> In short, this enhancement lets the hotspot cancel hanging error
> reporting steps (when writing the hs-err file), in order to allow
> subsequent error reporting steps to run. In our JVM variant, this has
> been active for a long time and has proven to make error reporting more
> stable in the face of deadlocks or file system stalls.
> Issue:
> https://bugs.openjdk.java.net/browse/JDK-8166944
> Webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/jdk10-webrev.02/webrev/
> Old Diskussion for JDK9:
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-October/021582.html
> The webrev is unchanged from version 2 discussed in october, just got
> rebased atop of JDK10/hs.
> Thank you,
> Kind Regards, Thomas

More information about the hotspot-runtime-dev mailing list