RFR(L) 8244778 Archive full module graph in CDS

Ioi Lam ioi.lam at oracle.com
Mon Aug 31 15:31:36 UTC 2020

Hi Lois,

Thanks for the review. Your comments has led me to discover a couple of 
pretty serious issues, which hopefully I have fixed. Also the code is 
cleaner now, with much fewer control-flow changes than the last webrev.


Please see my comments in-line.

On 8/25/20 7:58 AM, Lois Foltan wrote:
> Hi Ioi,
> Changes looks really good.  Comments interspersed below.
> Thanks,
> Lois
> On 8/12/2020 6:06 PM, Ioi Lam wrote:
>> Hi Lois,
>> Thanks for the comments. I have an updated webrev
>> http://cr.openjdk.java.net/~iklam/jdk16/8244778-archive-full-module-graph.v02/ 
>> http://cr.openjdk.java.net/~iklam/jdk16/8244778-archive-full-module-graph.v02.delta/ 
>> Here are the general notes on the changes. Replies to your questions 
>> are in-line below.
>> (1) Integrated updates in the Java code from Alan Bateman. Here are 
>> Alan's
>>     notes:
>>     The archive of the boot layer is as before, except that archiving is
>>     skipped if there are split packages or incubator modules. Incubating
>>     modules aren't resolved by default so we shouldn't have them in the
>>     boot layer by default anyway.
>>     I've dropped the module finders from the boot layer archive. I've
>>     left the IllegalAccessLogger.Builder in the acrhive for now (even
>>     though it is not the boot layer). We should be able to remove that
>>     once the JEP to disallow illegal access by default is in.
>>     Related is that I don't like the archive for the module graph
>>     (ArchivedModuleGraph, pre-dates this RFE) including the set of
>>     packages to export/open for illegal access as they aren't part
>>     of the module graph. I've left it for now but we can also remove
>>     that once we disallow illegal access by default (as those sets
>>     will be empty).
>>     The archive of built-in class loaders is now in one object
>>     jdk.internal.loader.ArchivedClassLoaders which I think will be a
>>     bit more maintainable. I've also drop the ucp field from the
>>     AppClassLoader as the changes to BuiltinClassLoader means is no
>>     longer needs to duplicated.
>>     There's one remaining issue in ModuleBootstrap.boot where it has fix
>>     an app class loader value (ModuleLayer.CLV). Ideally the 
>> initialization
>>     of the built-in class loaders would do this but we are kinda forced
>>     to separate the archiving of the built-in class loaders from the 
>> boot
>>     layer. I might look at this again some time.
>> (2) Moved code from classLoaderData.cpp -> classLoaderSharedData.cpp
>> (3) Reverted unnecessary changes in JavaClasses::compute_offset
>> (4) Minor clean up to use QuickSort::sort() instead of qsort()
>> (5) Moved the C-side of module initialization for platform/system
>>     loaders to inside java.lang.System::initPhase2(), so this happens
>>     at the same time as without full module archiving.
>> (6) Moved the use of Module_lock to so all modules in a class loader
>>     are restored atomically. See ArchivedClassLoaderData::restore()
>>     This fixed a bug where test/jdk/com/sun/jdi/ModulesTest.java
>>     would fail as it sees a partially restored set of modules.
>> On 8/7/20 12:06 PM, Lois Foltan wrote:
>>> Hi Ioi,
>>> Overall looks promising.  I have some review comments below, but not 
>>> a complete review.  I concentrated on the JVM side primarily how the 
>>> archived module graph is read in, how the ModuleEntry and 
>>> PackageEntry tables are created from the archive for the 3 builtin 
>>> loaders and potential timing issues during start up.
>>> On 7/22/2020 3:36 PM, Ioi Lam wrote:
>>>> https://bugs.openjdk.java.net/browse/JDK-8244778
>>>> http://cr.openjdk.java.net/~iklam/jdk16/8244778-archive-full-module-graph.v01/ 
>>>> Please review this patch that stores the full module graph in the CDS
>>>> archive heap. This reduces the initialization time of the basic JVM by
>>>> about 22%:
>>>> $ perf stat -r 100 bin/java -version
>>>> before: 98,219,329 instructions 0.03971 secs elapsed (+- 0.44%)
>>>> after:  55,835,815 instructions 0.03109 secs elapsed (+- 0.65%)
>>>> [1] Start with ModuleBootstrap.java. The current implementation is
>>>>     quite restrictive: the archived module graph is used only when no
>>>>     module options are specified.
>>>>     See ModuleBootstrap.mayUseArchivedBootLayer().
>>>>     We can probably support options such as main module and module 
>>>> path
>>>>     in a future RFE.
>>> Yes, I noticed the restrictions.  The JBS issue discusses the 
>>> possibility of using the dumped module graph in the event that the 
>>> value of the options are the same.  I think for this implementation 
>>> to be viable and have a positive impact you should consider 
>>> comparing at least the options --add-modules, --add-exports and 
>>> --add-reads.
>> I agree. I want to keep the changes minimal in this RFE, and will add 
>> more support for other use cases in follow-on RFEs. Instead of 
>> requiring these options to be unspecified, we can relax the 
>> restriction such that these options must be the same between archive 
>> dump time and run time.
> Sounds good!
>>>> [2] In the current JDK implementation, there is no single object
>>>>     that represents "the module graph". Most of the information
>>>>     is stored in the archive bootLayer object, but a few additional
>>>>     restoration operations need to be performed:
>>>>     + See ModuleBootstrap.getArchivedBootLayer()
>>>>     + Some static fields need to be archived/restored in
>>>>       Module.java, BuiltinClassLoader.java, ClassLoaders.java
>>>>       and BootLoader.java
>>>> [3] I ran into a complication with two loader instances of
>>>>     PlatformClassLoader and AppClassLoader. They are stored in
>>>>     multiple tables inside the module graph (e.g.,
>>>>     BuiltinClassLoader$LoadedModule) so I cannot easily recreate
>>>>     them at runtime.
>>>>     However, these two loaders contain information specific to the
>>>>     dump time VM lifecycle (such as the classes that were loaded
>>>>     during CDS dumping) that need to be scrubbed. I couldn't find an
>>>>     elegant way of doing this, so I added a private 
>>>> "resetArchivedStates"
>>>>     method to a few classes. They are called inside
>>>>     HeapShared::reset_archived_object_states().
>>>> [4] Related native data structures (PackageEntry and ModuleEntry)
>>>>     are also archived. Start with classLoaderData.cpp
>>>> Passes mach5 tiers 1-4. I will test with additional tiers.
>>>> Thanks
>>>> - Ioi
>>> classfile/classLoader.cpp
>>> - line #1644 pulling the javabase_moduleEntry() check outside of the 
>>> Module_lock maybe problematic if after you check it another
>>>   thread initializes in the time between the check and the obtaining 
>>> of the Module_lock on line #1645.
>> Fixed.
> That looks good.  I think it is fine that you are checking if 
> java.base is defined via a call to javabase_moduleEntry() because you 
> are just trying to determine if a ModuleEntry should be created or 
> not.  In most cases though using ModuleEntryTable::javabase_defined() 
> is the correct way to ensure that both the ModuleEntry for java.base 
> has been created and that the ModuleEntry has been injected in the 
> corresponding java.lang.Module oop.
>>> classfile/classLoader.hpp
>>> - somewhere in ArchivedClassLoaderData there should be an assert to 
>>> make sure that the CLD, whose package entries and module entries are 
>>> being archived is not a "_has_class_mirror_holder" CLD for a weakly 
>>> defined hidden class.  Those dedicated CLDs should never have 
>>> package entries or module entries.
>> I added ArchivedClassLoaderData::assert_valid()
>>> classfile/moduleEntry.cpp
>>> - line #400, typo "conver" --> "convert"
>>> - line #500, maybe sort if n is greater than 1 instead of 0 (same 
>>> comment for packageEntry.cpp line #270)
>> Fixed
>>> classfile/systemDictionary.cpp
>>> - It looks like the PackageEntry and ModuleEntry tables for the 
>>> system and platform class loaders are  added within 
>>> SystemDictionary::compute_java_loaders() which is called from 
>>> Thread::create_vm() but after the call in Thread::create_vm() to 
>>> call_initPhase2 which is when Universe::_module_initialized is set 
>>> to true.  Did I read this correctly?  If yes, then I think you have 
>>> to be sure that Universe::_module_initialized is not set until the 
>>> module graph for the 3 builtin loaders is loaded from the archive.
>> I moved the code to be called by ModuleBootstrap::boot() so it should 
>> now happen inside call_initPhase2.
> Yes, that looks good.
>>> memory/filemap.cpp
>>> - line #237 typo "modue" --> "module"
>> Fixed
>>> - Since the CDS support for capturing the module graph does not 
>>> archive unnamed modules, I assume 
>>> Modules::set_bootloader_unnamed_module() is still called.  Is this 
>>> true or does the unnamed module for the boot loader get archived?
>> The unnamed module for the boot loader is not archived.
>> Modules::set_bootloader_unnamed_module()  wasn't called in my last 
>> webrev. Thanks for catching it.
>> I added a call to BootLoader.getUnnamedModule() in 
>> ModuleBootstrap::boot()  to trigger <clinit> of BootLoader, which 
>> will call into the VM for Modules::set_bootloader_unnamed_module().
> Looks good.
>>> Clarification/timing questions:
>> Here's an overall problem I am facing:
>> The module graph is archived after the module system has fully 
>> started up. This means that for the boot loader, we only know the 
>> full set of modules/packages, but we don't know which part is the 
>> subset that was initialized during early VM bootstraping (e.g., when 
>> ModuleEntryTable::javabase_defined() == false).
>> So the behavior is a bit different during the early bootstrap phase 
>> (all the way before we reach ModuleBootstrap::boot()): 
>> ClassLoaderData::the_null_class_loader_data()->modules() and 
>> ClassLoaderData::the_null_class_loader_data()->packages() are already 
>> populated before a single class is loaded.
> If this is the case then, at the point when a ModuleEntry is created 
> for java.base using the archived module graph, there should be a 
> assertion that java_lang_Class::_fixup_module_field_list is NULL. To 
> confirm no class has been loaded before java.base is defined. Maybe in 
> ClassLoaderDataShared::serialize where you restore the boot loader's 
> archived modules?

Thanks for pointing this out. It turned out that in the last webrev 
(v02), I had a bug where the module of the primitive classes were not 
initialized. Now I have changed the initialization code to behave the 
same whether archived full module graph is used or not. The only 
differences are:

[1] ClassLoader::create_javabase(): 
ModuleEntryTable::javabase_moduleEntry() may be inited by CDS.

[2] When archived full module graph is used, 
ModuleEntryTable::patch_javabase_entries is called from 
(java_lang_Class::_fixup_module_field_list is used within this call.)

I also added a new test case: cds/PrimitiveClassMirrors.java

>> This difference doesn't seem to make a practical difference. E.g., 
>> none of our code seems to assume that "before any classes in the 
>> java.util package is loaded, 
>> ClassLoaderData::the_null_class_loader_data()->packages() must not 
>> contain an entry for java.util".
>> I think we have two choices when the archived module graph is used:
>> [1] We require that the state of the module system must, at every 
>> step of VM initialization, be identical to that of a VM that doesn't 
>> use an archived module graph.
>> [2] We make sure that the VM/JDK bootstrap code can tolerate the 
>> existence of module/packages that are added earlier than a VM that 
>> doesn't use an archived module graph.
>> I tried doing a version of [1] and found that to be too difficult. 
>> [2] seems much simpler and is the approach I am using now.
> I think [2] is reasonable.
>>> oops/instanceKlass.cpp
>>> - line #2545, comment "TO DO  -- point it to the archived 
>>> PackageEntry (JDK-8249262)"
>>> are you thinking that since the module graph is read in ahead of 
>>> time that it can be set when an InstanceKlass is created? There is a 
>>> point before java.base is defined that InstanceKlass::set_package 
>>> takes into account that could be a timing issue.
>> I think it will work as if another class in the same package has 
>> already been defined.
>>> - There are some checks in modules.cpp that are valuable to have 
>>> during bootstrapping.  For example, packages that are encountered 
>>> during bootstrapping before java.base is defined are put into the 
>>> java.base module but then verified after java.base is defined via 
>>> verify_javabase_packages.  So at what point in the bootstrapping 
>>> process does the boot loader's archived module's become known? Could 
>>> classes have been defined to the VM before this point? Then their 
>>> packages will have to be verified to be sure that they are indeed 
>>> packages within java.base.  Consider looking at other checks that 
>>> occur within classfile/modules.cpp as well.
>> As mentioned above, calling verify_javabase_packages() at run time 
>> will fail, as we have loaded all packages for the boot loader, not 
>> just those for java.base.
> I think not calling verify_javabase_packages works because as you 
> stated above no classes have been loaded before java.base is defined 
> which is not the same situation as running without the archived module 
> graph.
> A couple of additional comments:
> - ModuleEntry's reads list and PackageEntry's exports list.  We had 
> hoped eventually to change these lists from being a c heap allocated 
> GrowableArray over to a ResourceHashTable for faster lookup.  Doing 
> that as a separate RFE might help CDS archiving not to have to archive 
> those lists as an Array?

CDS cannot archive ResourceHashTable either. We have CompactHashtable 
which can be archived, but it cannot be modified at run time. I think 
the export list can be modified at run time with 
java.lang.Module::addExports(), so we probably need to do it like:

- if a Module was archived, check its CompactHashtable first
- if not found, check the ResourceHashTable

This would make the start-up a little faster (no more copying from the 
array lists into the hashtable, but would make the code more 
complicated. I need to investigate to see if it's worthwhile.

> - moduleEntry.cpp and packageEntry.cpp: Both methods 
> "compare_module_by_name" and "compare_package_by_name" should have an 
> assert if the names are equal.  No ModuleEntryTable or 
> PackageEntryTable should ever have 2 same named modules or packages.

I tried adding:

static int compare_package_by_name(PackageEntry* a, PackageEntry* b) {
   assert(a != b && a->name() != b->name(), "array should not contain 
duplicated entries");
   return a->name()->fast_compare(b->name());

but this caused an assert, because our QuickSort implementation would 
sometimes compare the same element!

#7  0x00007ffff659238b in QuickSort::partition<true, PackageEntry*, int 
(*)(void const*, void const*)> (array=0x80043fcb0,
     pivot=248, length=496, comparator=0x7ffff6590a5a 
<compare_package_by_name(PackageEntry*, PackageEntry*)>)
     at /jdk2/gil/open/src/hotspot/share/utilities/quickSort.hpp:76
76          for ( ; comparator(array[left_index], pivot_val) < 0; 
++left_index) {
(gdb) p array[left_index]
$1 = (PackageEntry *) 0x7ffff0439d00
(gdb) p pivot_val
$2 = (PackageEntry *) 0x7ffff0439d00
(gdb) p pivot
$3 = 248
(gdb) p left_index
$4 = 248

So I ended up with:

static int compare_package_by_name(PackageEntry* a, PackageEntry* b) {
   assert(a == b || a->name() != b->name(), "no duplicated names");
   return a->name()->fast_compare(b->name());

> - Another timing clarification question for me.  I assume that the 
> module graph is dumped post module initialization when Java has 
> completely defined the module graph to the JVM, is this correct?


> My concern is that there could be a window post module initialization 
> and pre archiving the module graph where another thread could define a 
> new module or add Module readability or add Package exportability.  So 
> that the module graph you are dumping is not the same module graph at 
> the end of module initialization.  Is this a concern?

We don't allow arbitrary code to be executed during -Xshare:dump, so the 
module graph shouldn't be changed after module initialization has finished.

I've added Modules::check_cds_restrictions() to check for this.

I also added asserts to make sure that none of the classes used by the 
archived module graph can be modified by JVMTI. All these classes are 
loaded in the "early" phase of JVMTI, and we would disable CDS if a 
JVMTI agent is registered to modify classes in this phase, so we should 
be safe, but the asserts will ensure that. I updated the 
ReplaceCriticalClassesForSubgraphs.java test and added a new test 

- Ioi

> Thanks,
> Lois
>> Well, verify_javabase_packages() was called once and it succeeded, 
>> but that was during CDS dump time. So we know at least we have 
>> verified this once :-)
>> Thanks
>> - Ioi
>>> I may have more review comments as I continue to look at this!
>>> Thanks,
>>> Lois

More information about the core-libs-dev mailing list