[14] RFR (XS): 8235305: Corrupted oops embedded in nmethods due to parallel modification during optional evacuation

Stefan Johansson stefan.johansson at oracle.com
Fri Jan 17 09:06:50 UTC 2020


Hi Thomas,

On 2020-01-16 14:20, Thomas Schatzl wrote:
> Hi all,
> 
>    can I get reviews for this change that fixes a bug in the abortable 
> mixed gc algorithm where G1 might corrupt oops embedded in nmethods due 
> to parallel modification during an optional evacuation phase?
> 
> G1 currently collects embedded oops in nmethods twice: once in the 
> optional roots list, and once as nmethods in the strong code roots list 
> for a particular region.
> 
> Now it can happen that this oop embedded in in the code stream is 
> unaligned, so if that oop is modified during relocation word tearing may 
> occur, causing follow-up crashes.
> 
> The fix is to not collect oops from nmethods in the optional code root 
> list as the strong code root list for a particular region already always 
> contains it anyway.
> 
> Thanks go to stefank, eriko and sjohanss for helping with analyzing, 
> testing and the discussion around it.
> 
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8235305
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8235305/webrev/

Fix looks good. Just some things around the naming of the template 
parameter and enum after adding this. I don't have a much better idea 
but I don't think "barrier" is exactly what this is. I do think it would 
make sense to call the new value G1BarrierNMethod to be more inline with 
the other names. I also think it would make sense to move the comment 
about why this is needed to where we use it in g1OopClosures.inline.hpp. 
Me and StefanK talked a bit about this and if we move the comment and do 
the check for the barrier as a separate if-statement, it should be more 
obvious when this is needed.

Thanks,
Stefan

> Testing:
> multiple runs of hs-tier1-5, multiple runs of the crashing application 
> (24h kitchensink) with and without a VM modification and also with some 
> G1 settings that caused crashes within 1-2 hours that reproduced the 
> issue within 5 minutes.
> Currently starting perf test runs with and without this change: however 
> since this change strictly reduces the work done at all times I am not 
> expecting any regressions (and hence I am asking for review in advance).
> 
> Thanks,
>    Thomas


More information about the hotspot-gc-dev mailing list