RFR(S): 8170409: CMS: Crash in CardTableModRefBSForCTRS::process_chunk_boundaries
volker.simonis at gmail.com
Tue Nov 29 13:49:25 UTC 2016
thanks a lot for the quick review!
although it may be self-evident, I would nevertheless prefer to leave the
comment in, just to increase awareness of non-TSO architectures.
But as I'd also need a sponsor (and kindly ask you hereby to be one :)
I'm fine to leave the ultimate decision to him :)
Notice that the fix was contributed by gunter.haug at sap.com so please
leave the "Contributed-by:" field in place.
On Tue, Nov 29, 2016 at 11:43 AM, Thomas Schatzl
<thomas.schatzl at oracle.com> wrote:
> On Mon, 2016-11-28 at 19:33 +0100, Volker Simonis wrote:
>> can I please have a review and sponsor for the following fix
>> by gunter.haug at sap.com:
>> We've observed a crash (see bug report for a stack trace) in
>> CardTableModRefBSForCTRS::process_chunk_boundaries() from time to
>> time for several years now, but only on non-TSO platforms:
>> - It only happens in opt builds.
>> - Analysis of the assembly code revealed the actual crash site to be
>> an array store to a pointer (_lowest_non_clean) which is an argument
>> to process_chunk_boundaries()
>> - The pointer is actually calculated in
>> CardTableModRefBS::get_LNC_array_for_space() and passed as argument
>> - CardTableModRefBS::get_LNC_array_for_space() doesn't enforce TSO on
>> _last_LNC_resizing_collection[i] so the pointer to an uninitialized
>> structure (i.e. _lowest_non_clean) could become visible to other
>> threads before the value of _last_LNC_resizing_collection[i].
>> Use OrderAccess::load_acquire and OrderAccess::release_store for
>> accessing _last_LNC_resizing_collection[i] in
> looks good to me. Nice catch.
> During my review I did look at other uses of
> _last_LNC_resizing_collection and friends, apparently no issues there.
> I also looked at what parallel gc does, and while it implements the
> same idea of slicing the card table in chunks for parallel card table
> scan, it does not need or use the LNC tables (of course). That
> implementation also does not seem to use any similar shared data
> structure, so it should be good (in
> One comment: maybe the "Therefore use acquire/release to guarantee this
> in non TSO architectures." sentence in the added comment could be
> Adding appropriate code so that collectors work on supported
> architectures seems to be an obvious thing to do at this point, and
> should not need to be argued for.
> However you can leave that sentence in if you really want. I do not
> need a re-review if you removed it before pushing either.
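
For readers less familiar with the problem, the publication pattern
discussed in this thread can be sketched roughly as follows. This is only
an illustration, not the HotSpot code: the fix itself uses
OrderAccess::load_acquire and OrderAccess::release_store, whereas the
sketch substitutes std::atomic with acquire/release ordering. The names
LncSlot, publish, try_get and run_publication_demo are hypothetical, and
the exact check performed in get_LNC_array_for_space() is assumed to be
simplified here to a comparison against the current collection number.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for one entry of the LNC bookkeeping.
// Field names are illustrative only, not HotSpot's.
struct LncSlot {
  int* lowest_non_clean = nullptr;                // plain data, published via the stamp
  std::atomic<int> last_resizing_collection{-1};  // the "stamp" readers check
};

// Writer: initialize the array and the pointer first, then publish the
// stamp with a release store so the earlier writes cannot be reordered
// after it.
void publish(LncSlot& slot, std::vector<int>& storage, int collection) {
  assert(!storage.empty());
  for (int& entry : storage) entry = 0;           // initialize the structure
  slot.lowest_non_clean = storage.data();
  slot.last_resizing_collection.store(collection, std::memory_order_release);
}

// Reader: acquire-load the stamp. Only if it matches the current
// collection is the pointer (and the data behind it) safe to use.
int* try_get(LncSlot& slot, int collection) {
  if (slot.last_resizing_collection.load(std::memory_order_acquire) == collection) {
    return slot.lowest_non_clean;
  }
  return nullptr;
}

// Small two-thread demo: the reader spins until the stamp is visible and
// then checks that it sees a fully initialized array.
bool run_publication_demo() {
  LncSlot slot;
  std::vector<int> storage(64, -1);
  bool ok = true;
  std::thread reader([&] {
    int* p = nullptr;
    while ((p = try_get(slot, 1)) == nullptr) std::this_thread::yield();
    for (std::size_t i = 0; i < storage.size(); ++i) {
      if (p[i] != 0) ok = false;
    }
  });
  std::thread writer([&] { publish(slot, storage, 1); });
  writer.join();
  reader.join();
  return ok;
}
```

With relaxed (or plain) accesses to the stamp, a weakly ordered CPU may
make the stamp visible before the pointer or the array contents, which
matches the failure mode described in the quoted report. On TSO hardware
the stores become visible in program order, which would explain why the
crash was only seen on non-TSO platforms.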
More information about the hotspot-gc-dev