RFR(L) 8244778 Archive full module graph in CDS

Lois Foltan lois.foltan at oracle.com
Tue Aug 25 14:58:50 UTC 2020

Hi Ioi,

Changes looks really good.  Comments interspersed below.


On 8/12/2020 6:06 PM, Ioi Lam wrote:
> Hi Lois,
> Thanks for the comments. I have an updated webrev
> http://cr.openjdk.java.net/~iklam/jdk16/8244778-archive-full-module-graph.v02/ 
> http://cr.openjdk.java.net/~iklam/jdk16/8244778-archive-full-module-graph.v02.delta/ 
> Here are the general notes on the changes. Replies to your questions 
> are in-line below.
> (1) Integrated updates in the Java code from Alan Bateman. Here are 
> Alan's
>     notes:
>     The archive of the boot layer is as before, except that archiving is
>     skipped if there are split packages or incubator modules. Incubating
>     modules aren't resolved by default so we shouldn't have them in the
>     boot layer by default anyway.
>     I've dropped the module finders from the boot layer archive. I've
>     left the IllegalAccessLogger.Builder in the acrhive for now (even
>     though it is not the boot layer). We should be able to remove that
>     once the JEP to disallow illegal access by default is in.
>     Related is that I don't like the archive for the module graph
>     (ArchivedModuleGraph, pre-dates this RFE) including the set of
>     packages to export/open for illegal access as they aren't part
>     of the module graph. I've left it for now but we can also remove
>     that once we disallow illegal access by default (as those sets
>     will be empty).
>     The archive of built-in class loaders is now in one object
>     jdk.internal.loader.ArchivedClassLoaders which I think will be a
>     bit more maintainable. I've also drop the ucp field from the
>     AppClassLoader as the changes to BuiltinClassLoader means is no
>     longer needs to duplicated.
>     There's one remaining issue in ModuleBootstrap.boot where it has fix
>     an app class loader value (ModuleLayer.CLV). Ideally the 
> initialization
>     of the built-in class loaders would do this but we are kinda forced
>     to separate the archiving of the built-in class loaders from the boot
>     layer. I might look at this again some time.
> (2) Moved code from classLoaderData.cpp -> classLoaderSharedData.cpp
> (3) Reverted unnecessary changes in JavaClasses::compute_offset
> (4) Minor clean up to use QuickSort::sort() instead of qsort()
> (5) Moved the C-side of module initialization for platform/system
>     loaders to inside java.lang.System::initPhase2(), so this happens
>     at the same time as without full module archiving.
> (6) Moved the use of Module_lock to so all modules in a class loader
>     are restored atomically. See ArchivedClassLoaderData::restore()
>     This fixed a bug where test/jdk/com/sun/jdi/ModulesTest.java
>     would fail as it sees a partially restored set of modules.
> On 8/7/20 12:06 PM, Lois Foltan wrote:
>> Hi Ioi,
>> Overall looks promising.  I have some review comments below, but not 
>> a complete review.  I concentrated on the JVM side primarily how the 
>> archived module graph is read in, how the ModuleEntry and 
>> PackageEntry tables are created from the archive for the 3 builtin 
>> loaders and potential timing issues during start up.
>> On 7/22/2020 3:36 PM, Ioi Lam wrote:
>>> https://bugs.openjdk.java.net/browse/JDK-8244778
>>> http://cr.openjdk.java.net/~iklam/jdk16/8244778-archive-full-module-graph.v01/ 
>>> Please review this patch that stores the full module graph in the CDS
>>> archive heap. This reduces the initialization time of the basic JVM by
>>> about 22%:
>>> $ perf stat -r 100 bin/java -version
>>> before: 98,219,329 instructions 0.03971 secs elapsed (+- 0.44%)
>>> after:  55,835,815 instructions 0.03109 secs elapsed (+- 0.65%)
>>> [1] Start with ModuleBootstrap.java. The current implementation is
>>>     quite restrictive: the archived module graph is used only when no
>>>     module options are specified.
>>>     See ModuleBootstrap.mayUseArchivedBootLayer().
>>>     We can probably support options such as main module and module path
>>>     in a future RFE.
>> Yes, I noticed the restrictions.  The JBS issue discusses the 
>> possibility of using the dumped module graph in the event that the 
>> value of the options are the same.  I think for this implementation 
>> to be viable and have a positive impact you should consider comparing 
>> at least the options --add-modules, --add-exports and --add-reads.
> I agree. I want to keep the changes minimal in this RFE, and will add 
> more support for other use cases in follow-on RFEs. Instead of 
> requiring these options to be unspecified, we can relax the 
> restriction such that these options must be the same between archive 
> dump time and run time.

Sounds good!

>>> [2] In the current JDK implementation, there is no single object
>>>     that represents "the module graph". Most of the information
>>>     is stored in the archive bootLayer object, but a few additional
>>>     restoration operations need to be performed:
>>>     + See ModuleBootstrap.getArchivedBootLayer()
>>>     + Some static fields need to be archived/restored in
>>>       Module.java, BuiltinClassLoader.java, ClassLoaders.java
>>>       and BootLoader.java
>>> [3] I ran into a complication with two loader instances of
>>>     PlatformClassLoader and AppClassLoader. They are stored in
>>>     multiple tables inside the module graph (e.g.,
>>>     BuiltinClassLoader$LoadedModule) so I cannot easily recreate
>>>     them at runtime.
>>>     However, these two loaders contain information specific to the
>>>     dump time VM lifecycle (such as the classes that were loaded
>>>     during CDS dumping) that need to be scrubbed. I couldn't find an
>>>     elegant way of doing this, so I added a private 
>>> "resetArchivedStates"
>>>     method to a few classes. They are called inside
>>>     HeapShared::reset_archived_object_states().
>>> [4] Related native data structures (PackageEntry and ModuleEntry)
>>>     are also archived. Start with classLoaderData.cpp
>>> Passes mach5 tiers 1-4. I will test with additional tiers.
>>> Thanks
>>> - Ioi
>> classfile/classLoader.cpp
>> - line #1644 pulling the javabase_moduleEntry() check outside of the 
>> Module_lock maybe problematic if after you check it another
>>   thread initializes in the time between the check and the obtaining 
>> of the Module_lock on line #1645.
> Fixed.

That looks good.  I think it is fine that you are checking if java.base 
is defined via a call to javabase_moduleEntry() because you are just 
trying to determine if a ModuleEntry should be created or not.  In most 
cases though using ModuleEntryTable::javabase_defined() is the correct 
way to ensure that both the ModuleEntry for java.base has been created 
and that the ModuleEntry has been injected in the corresponding 
java.lang.Module oop.

>> classfile/classLoader.hpp
>> - somewhere in ArchivedClassLoaderData there should be an assert to 
>> make sure that the CLD, whose package entries and module entries are 
>> being archived is not a "_has_class_mirror_holder" CLD for a weakly 
>> defined hidden class.  Those dedicated CLDs should never have package 
>> entries or module entries.
> I added ArchivedClassLoaderData::assert_valid()
>> classfile/moduleEntry.cpp
>> - line #400, typo "conver" --> "convert"
>> - line #500, maybe sort if n is greater than 1 instead of 0 (same 
>> comment for packageEntry.cpp line #270)
> Fixed
>> classfile/systemDictionary.cpp
>> - It looks like the PackageEntry and ModuleEntry tables for the 
>> system and platform class loaders are  added within 
>> SystemDictionary::compute_java_loaders() which is called from 
>> Thread::create_vm() but after the call in Thread::create_vm() to 
>> call_initPhase2 which is when Universe::_module_initialized is set to 
>> true.  Did I read this correctly?  If yes, then I think you have to 
>> be sure that Universe::_module_initialized is not set until the 
>> module graph for the 3 builtin loaders is loaded from the archive.
> I moved the code to be called by ModuleBootstrap::boot() so it should 
> now happen inside call_initPhase2.

Yes, that looks good.

>> memory/filemap.cpp
>> - line #237 typo "modue" --> "module"
> Fixed
>> - Since the CDS support for capturing the module graph does not 
>> archive unnamed modules, I assume 
>> Modules::set_bootloader_unnamed_module() is still called.  Is this 
>> true or does the unnamed module for the boot loader get archived?
> The unnamed module for the boot loader is not archived.
> Modules::set_bootloader_unnamed_module()  wasn't called in my last 
> webrev. Thanks for catching it.
> I added a call to BootLoader.getUnnamedModule() in 
> ModuleBootstrap::boot()  to trigger <clinit> of BootLoader, which will 
> call into the VM for Modules::set_bootloader_unnamed_module().

Looks good.

>> Clarification/timing questions:
> Here's an overall problem I am facing:
> The module graph is archived after the module system has fully started 
> up. This means that for the boot loader, we only know the full set of 
> modules/packages, but we don't know which part is the subset that was 
> initialized during early VM bootstraping (e.g., when 
> ModuleEntryTable::javabase_defined() == false).
> So the behavior is a bit different during the early bootstrap phase 
> (all the way before we reach ModuleBootstrap::boot()): 
> ClassLoaderData::the_null_class_loader_data()->modules() and 
> ClassLoaderData::the_null_class_loader_data()->packages() are already 
> populated before a single class is loaded.

If this is the case then, at the point when a ModuleEntry is created for 
java.base using the archived module graph, there should be a assertion 
that java_lang_Class::_fixup_module_field_list is NULL. To confirm no 
class has been loaded before java.base is defined. Maybe in 
ClassLoaderDataShared::serialize where you restore the boot loader's 
archived modules?

> This difference doesn't seem to make a practical difference. E.g., 
> none of our code seems to assume that "before any classes in the 
> java.util package is loaded, 
> ClassLoaderData::the_null_class_loader_data()->packages() must not 
> contain an entry for java.util".
> I think we have two choices when the archived module graph is used:
> [1] We require that the state of the module system must, at every step 
> of VM initialization, be identical to that of a VM that doesn't use an 
> archived module graph.
> [2] We make sure that the VM/JDK bootstrap code can tolerate the 
> existence of module/packages that are added earlier than a VM that 
> doesn't use an archived module graph.
> I tried doing a version of [1] and found that to be too difficult. [2] 
> seems much simpler and is the approach I am using now.

I think [2] is reasonable.

>> oops/instanceKlass.cpp
>> - line #2545, comment "TO DO  -- point it to the archived 
>> PackageEntry (JDK-8249262)"
>> are you thinking that since the module graph is read in ahead of time 
>> that it can be set when an InstanceKlass is created?  There is a 
>> point before java.base is defined that InstanceKlass::set_package 
>> takes into account that could be a timing issue.
> I think it will work as if another class in the same package has 
> already been defined.
>> - There are some checks in modules.cpp that are valuable to have 
>> during bootstrapping.  For example, packages that are encountered 
>> during bootstrapping before java.base is defined are put into the 
>> java.base module but then verified after java.base is defined via 
>> verify_javabase_packages.  So at what point in the bootstrapping 
>> process does the boot loader's archived module's become known? Could 
>> classes have been defined to the VM before this point? Then their 
>> packages will have to be verified to be sure that they are indeed 
>> packages within java.base.  Consider looking at other checks that 
>> occur within classfile/modules.cpp as well.
> As mentioned above, calling verify_javabase_packages() at run time 
> will fail, as we have loaded all packages for the boot loader, not 
> just those for java.base.

I think not calling verify_javabase_packages works because as you stated 
above no classes have been loaded before java.base is defined which is 
not the same situation as running without the archived module graph.

A couple of additional comments:

- ModuleEntry's reads list and PackageEntry's exports list.  We had 
hoped eventually to change these lists from being a c heap allocated 
GrowableArray over to a ResourceHashTable for faster lookup.  Doing that 
as a separate RFE might help CDS archiving not to have to archive those 
lists as an Array?

- moduleEntry.cpp and packageEntry.cpp: Both methods 
"compare_module_by_name" and "compare_package_by_name" should have an 
assert if the names are equal.  No ModuleEntryTable or PackageEntryTable 
should ever have 2 same named modules or packages.

- Another timing clarification question for me.  I assume that the 
module graph is dumped post module initialization when Java has 
completely defined the module graph to the JVM, is this correct?  My 
concern is that there could be a window post module initialization and 
pre archiving the module graph where another thread could define a new 
module or add Module readability or add Package exportability.  So that 
the module graph you are dumping is not the same module graph at the end 
of module initialization.  Is this a concern?


> Well, verify_javabase_packages() was called once and it succeeded, but 
> that was during CDS dump time. So we know at least we have verified 
> this once :-)
> Thanks
> - Ioi
>> I may have more review comments as I continue to look at this!
>> Thanks,
>> Lois

More information about the core-libs-dev mailing list