RFR: 8209971: TestOptionsWithRanges.java crashes in CDS mode with G1UpdateBufferSize=1

Ioi Lam ioi.lam at oracle.com
Thu Sep 6 07:04:18 UTC 2018

On 9/5/18 10:10 PM, Jiangli Zhou wrote:
> Hi Ioi,
> Thanks for the quick review.
> On 9/5/18 8:46 PM, Ioi Lam wrote:
>> Hi Jiangli,
>> Thanks for fixing this! I like the simple approach that minimizes 
>> code change.
>> The following comment is misleading. It suggest that mirror objects 
>> cannot be restored at this stage, but doesn't explain why.
>> Also, I don't think we should single out constant pools (the crash 
>> call stack had nothing to do with constant pool.)
>> 2039     // No mirror object is access/restored during
>> 2040     // 
>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...).
>> 2041     // Mirrors are restored after java.lang.Class is loaded.
>> 2042     //
>> 2043     // Object_klass()->constants()->restore_unshareable_info() 
>> restores the
>> 2044     // constant pool resolved_references array for Object klass. 
>> It must be
>> 2045     // done after MetaspaceShared::fixup_mapped_heap_regions().
> All constant pool resolved_references arrays for shared classes are 
> archived (like the mirrors). Restoration of the resolved_references 
> (of the Object klass) array object is done as part of the 
> Object_klass()->constants()->restore_unshareable_info(). Although it 
> wasn't shown in the backtrace of this specific crash, it can be an 
> issue as long it is done before the mapped archive regions are fixed 
> up. That's why MetaspaceShared::fixup_mapped_heap_regions() needs to 
> be called before 
> Object_klass()->constants()->restore_unshareable_info(). Any operation 
> to the archived java objects can be potentially unsafe if it is done 
> before MetaspaceShared::fixup_mapped_heap_regions(). So we want to 
> place the MetaspaceShared::fixup_mapped_heap_regions() call as early 
> as possible but after SystemDictionary::Object_klass() is set up. 
> Please let me know if adding some of these explanations would help 
> make the comments more clear.
We should have concise comments that clearly describes the issues, and 
avoid repeating things that's already in the code. So sentences like 
"Fixup the mapped archive heap regions after resolving Object_klass" 
doesn't seem that useful.

How about this:

     resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass), scan, 
     // It's unsafe to access the archived heap regions before they
     // are fixed up, so we must do the fixup as early as possible before
     // the archived java objects are accessedby function such as
     // java_lang_Class::restore_archived_mirror
     // and ConstantPool::restore_unshareable_info.
     // However, MetaspaceShared::fixup_mapped_heap_regions() fills the
     // leading empty spaces in the archived regions and may use
     // SystemDictionary::Object_klass(), so we can do this only after
     // Object_klass is resolved.
     // Note: no mirror objects are accessed/restored in the above call.
     // Mirrors are restored after java.lang.Class is loaded.

BTW, while you're fixing this area, I think we need an assert here:

void ConstantPool::restore_unshareable_info(TRAPS) {
   if (SystemDictionary::Object_klass_loaded()) {
   } else {
+  assert(pool_holder()->name() == vmSymbols::java_lang_Object(), "must 

That's because you have special handling for the Object klass, but no 
other klasses. However, the check in 
ConstantPool::restore_unshareable_info currently doesn't explicitly 
match this.

     // Initialize the constant pool for the Object_class

> Just to give some background/history information to help understand. 
> The mapping/fixing-up of the archive heap region was designed and 
> implemented as part of the shared String support. No archived String 
> objects were accessed before the region fixup operation. So this was 
> not an issue. Later on, open archive heap region was introduced and 
> archiving supports for resolved_reference arrays and mirrors were 
> implemented. That introduced the hidden issue because some of the 
> archived mirrors and resolved_references arrays were accessed early 
> (before region fixup). The issue was undiscovered because the mapped 
> region start usually was the same as the start of the GC region that 
> contains the mapped data. There was no gap at the top of the mapped GC 
> regions in normal cases. We recently started to stress testing this 
> area heavily with relocations for the default CDS archive and the 
> issue then surfaced.
>> The real reason we have to do this is:
>> + Before MetaspaceShared::fixup_mapped_heap_regions() is called, the 
>> heap is in an inconsistent state. At some G1 regions, the bottom part 
>> contains all zeros. However, some GC operations (debug-build 
>> verifications only?) would need to walk G1 regions starting from the 
>> bottom, and will run into asserts when we get an uninitialized block.
> Just want to make it clear, only the mapped archived heap regions may 
> have this issue when the mapped region start is different from the 
> containing GC region's start. None of the regular GC regions have such 
> issue.
>> + One of such operation is the barrier operations, as shown in the 
>> crash call stack in the bug report. (Maybe there are others??)
> Any operation to the archived java objects could have issue if it is 
> done before archive region fixup. That's why we want to do the archive 
> region fixup as early as possible.
>> For safety, we should put in asserts such that -- before 
>> MetaspaceShared::fixup_mapped_heap_regions() is done, we must not do 
>> anything that would trigger barrier operations. For example, we must 
>> not call oopDesc::obj_field_put, which triggered the barrier 
>> operations in the crash stack.
>> The crash is timing related and happened only when G1UpdateBufferSize 
>> was set to one. I think we need asserts so we can reliably catch any 
>> future problems. For example, if someone adds new code to modify obj 
>> fields in the archived heap before fixup_mapped_heap_regions() is 
>> called, we should get an assert immediately, instead of waiting for 
>> some future point that may or may not happen.
>> I would suggest working with the GC folks for a suitable assert.
> Not sure if there is a central place within GC code can be used for 
> the assertion, as any operation could be an issue. We can check with 
> the GC experts.
> If we want to be extra cautious, we could map the archive heap regions 
> without access permission (PROT_NONE) initially. The archive region 
> fixup code can then reset memory protection and makes it 
> readable/writable. With that, any operations to the archived java 
> objects before fixup can trigger signal. This probably should be done 
> for debugging only build and should be handled separately.

Making the memory read-only initially seems a very good idea. I don't 
think we should put it off in a future RFE. This bug happened because we 
don't have a safeguard mechanism, so the bug fix should include such a 

I spend 2 days trying to debug this issue. I would hate to have someone 
go through this again in the future just because we have filed an RFE 
but never get to doing it.

- Ioi

> Thanks,
> Jiangli
>> Thanks
>> - Ioi
>> On 9/5/18 7:58 PM, Jiangli Zhou wrote:
>>> Please review the following fix for JDK-8209971, 
>>> TestOptionsWithRanges.java crashes in CDS mode with 
>>> G1UpdateBufferSize=1.
>>>     webrev: http://cr.openjdk.java.net/~jiangli/8209971/webrev.00/
>>>     bug: https://bugs.openjdk.java.net/browse/JDK-8209971
>>> This is a pre-existing issue that's exposed when archived heap 
>>> object relocation is enabled. At CDS runtime, the archived heap 
>>> regions are mapped back to the runtime java heap. As the mapped heap 
>>> data may not occupy the entire runtime GC region(s), the empty 
>>> spaces within the regions are filled with dummy objects, which is 
>>> done by a fixup process after mapping the data. The fixup was done 
>>> after SystemDictionary::initialize(), because fixup needs to access 
>>> SystemDictionary::Object_klass() (in some cases). However that is 
>>> too late as archived java objects are accessed when 
>>> SystemDictionary::initialize() resolves well-known classes.
>>> The solution is to call MetaspaceShared::fixup_mapped_heap_regions() 
>>> at an earlier time. In the webre, the call is moved into 
>>> SystemDictionary::resolve_preloaded_classes(). It's called after 
>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...), but 
>>> before any of the archived java objects is accessed. No mirror 
>>> object is access/restored during 
>>> resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass)...). 
>>> Archived mirrors are restored after java.lang.Class is loaded.
>>> Tested TestOptionsWithRanges.java locally in CDS mode. Tier1 - tier5 
>>> normal runs and tier1 - tier4 default CDS runs are in progress.
>>> Thanks,
>>> Jiangli

More information about the hotspot-runtime-dev mailing list