RFR (L) 7133093: Improve system dictionary performance

coleen.phillimore at oracle.com coleen.phillimore at oracle.com
Tue Jul 18 14:56:35 UTC 2017

Karen,  Thank you for reviewing the code and for the discussions and 
consulting along the way.

On 7/17/17 4:10 PM, Karen Kinnear wrote:
> Feedback on the initial web rev:
> Thank you for using web rev - that way I can do searches of the 
> overall patch!
> First - many thanks for doing this exercise so carefully, for 
> following the specification in continuing
> to record the initiating loader and for splitting this effort into 
> multiple phases to allow further exploration
> of the current expectations/behavior relative to class unloading.
> And thank you for all the testing.
> Just wanted to ensure that I understand some of the assumptions:
> 1) lock free reads of dictionaries - yes
> 2) lock free adds of entries to dictionaries - yes
No, adds still take out the SystemDictionary_lock.
> 3) removal of entries or entire dictionaries - at safe point only
> 4) rehashing - none today and none planned
> 5) resizing of dictionaries - none today, some planned
> 6) use of SystemDictionary_lock: essentially unchanged
> 7) atomicity guarantees relative to checking loader constraints and 
> updates to the dictionary: unchanged
> I did not review the SA/vmStructs - sounds like you had Jini George do 
> that - great.
> Minor questions/comments:
> SystemDictionary::verify: "Verify of system dictionary failed" ->
> looks like "Verify of placeholders failed"

Fixed.  Good find.  It's an odd assert:

   guarantee(placeholders()->number_of_entries() >= 0,
             "Verify of placeholders failed");

> thank you for asserts dictionary != NULL
> Dictionary::print
>   is it worth an assert that loader_data != NULL?

Added.  It's already asserted in Dictionary::verify()

> Can you please sanity check if there are other cases in which we need
> to use dictionary_or_null() rather than dictionary()?
>  - are there cases in which we could be using a dictionary from an 
> anonymous class?
>   e.g. there are a lot of places in systemDictionary.cpp where we
>   call class_loader_data(class_loader); loader_data->dictionary();
>      - was the lazily allocated dictionary allocated yet (see next 
> question)

There's an assert in dictionary() that the CLD isn't anonymous.  In 
these calls we get class_loader_data from the class_loader which is 
always the host class loader for anonymous classes, so this doesn't 
break with the current design.

> CLD.cpp
>   Would it be less complex to eagerly create the dictionary for a CLD
>   since it is extremely rare to not need one for non-anonymous CLDs?
>   You could create before publishing the CLD

Yes, I agree it would be better this way.  I was looking at some data 
and it does seem rare to have a CLD with no classes.
> Terminology:
>   How are you still using System Dictionary term?
>   It still shows up in some comments and in some printed strings

Bad habit, we've been calling it that for years. :)  I'll update the 
comments where it makes sense.
I tried to refer to the ClassLoaderDataGraph dictionaries collectively 
as the "System Dictionary", as well as the loader constraints and 
placeholder tables, and the other things that SystemDictionary contains.

I guess it gets confusing, but it's less awkward than "class loader data 
dictionaries", and more meaningful than simply "dictionaries". So that's 
why I didn't change a lot of the comments.

> biasedLocking.cpp, line 59
> jvmtiGetLoadedClasses: line 298
> dictionary.hpp lines 113,114, 120
> dictionary.cpp 262-263, 478
> dictionary.cpp 491
>   fix comments e.g. "Iterate the system dictionary" ...
>   - probably worth checking for other comments in the sources too
>   unless these all appear to match new usage model

Fixed.  I've found a couple more and some other comments that are incorrect.
> hashTable.hpp
> line 268: what does this comment mean?
>   "Don't create and use freelist"?
>   I assume this means that this is a direct malloc, not using any block
>   allocation, e.g. metaspace chunks, and this requires explicit freeing?

   // Don't create and use freelist of HashtableEntry.

The default Hashtable<> allocates entries from a block of entries and 
uses a freelist for freed entries.

> dictionary.hpp line 161
>   could you possibly clean up comment while you are in here?
>   I presume "a protection" is "a protection domain"?


> dictionary.cpp
>   why change from Hashtable::free_entry to explicit
> unlink_entry/FREE_C_HEAP_ARRAY?

I used the same code as the ModuleTable.  The default hashtable doesn't 
delete the whole thing and left some memory around, if I remember 
correctly.  I'll recheck it.
> Future optimization:
> I think that none of the built-in loaders is ever recorded as an 
> initiating
> loader for any class in a loader which permits unloading.
> Note: we need to ensure that built-in includes boot, platform and
> application class loader (and clean up any confusion with system
> class loader - which can be set by the user to a custom class loader)
I think AppClassLoader can initiate loading and is a builtin loader.

Dictionary for class loader class loader 0x00007f6abc6e3210 a 
Java system dictionary (table_size=107, classes=1)
^ indicates that initiating loader is different from defining loader
  106: ^java.util.ArrayList, loader NULL class_loader

> Any chance of changing DictionaryEntry* probe->klass() to be
> probe->instanceKlass()?

Okay.  Changed to instance_klass() as per coding standard.
> Thank you for assertion _is_at_safepoint for try_get_next_class
> CLD.cpp
>   If the names of the classes_do vs. all_entries_do were modified to be
>   defining_loader_classes_do vs. all_entries_classes_do or some way
>   to make the distinction in the name that would be helpful.
>   Otherwise, if you could please add comments to the 
> CLDG::dictionary_classes_do
>   etc. to distinguish these.

I added comments to distinguish these.
> CLD.cpp
>   lines 1200-1201
>   Can you please fix the comment?
>   I think you need separate comments for the ModuleEntry 
> reads/PackageEntry's
> exports which are looking for dead modules.
>   IIUC, the do_unloading is actually looking for classes for which this
>   is the initating loader, but their defining loader is dead.
       // Remove entries in the dictionary of live class loader that have
       // initiated loading classes in a dead class loader.

Moved the module comment to before the module code and added this for 
dictionary->do_unloading() call.
> systemDictionary.cpp: 445-447
>   THANK you for the comments about the java_mirror handle keeping the 
> class
>   and class loader alive
>   Could you clarify that the class_loader handle is not for the class
>   we are checking, but rather for the initiating, i.e. requesting 
> class loader
>   (is that accurate?) So we are keeping both the requestor and the
>   requested class alive

Yes, that is true.   I added

   // The class_loader handle passed in is the initiating loader.

> You removed the comment that compute_hash could lead to a safepoint/GC
>   Is that not possible?
>   comment in ObjectSynchronizer::FastHashCode says that with 
> UseBiasedLocking
>     we can revoke biases and must be able to handle a safepoint.

I made it not possible with this change.   The hashcode is now computed 
by the Symbol* of the Klass name, which is random (os::random plus some 
bits from the address and first two characters, which we did a study of 
for bug https://bugs.openjdk.java.net/browse/JDK-8181450).

Also, I changed ClassLoaderData::identity_hash() to use the address >> 3 
because the addresses of class loader datas are random enough for the 
one use of this function.   Previously, the loader_data identity hash 
used to go to the class_loader oop and get to 
ObjectSynchronizer::FastHashCode (which was also os::random but now 

The dictionary hashing doesn't need to add in the hash for the 
loader_data now.
> systemDictionary.cpp:
>    NoSafepointVerifier
>      If we add resizing logic later, and that happens at a safepoint,
>      then the number of buckets change and the d_index is no longer valid
>      so we would need to add back the NSpV?
>     - this is a more general problem if compute_hash can lead to a 
> safepoint/GC

Yes this is true, except that compute_hash doesn't safepoint. Computing 
the hash and index should be done together with the dictionary 
operations.  The function resolve_instance_class_or_null computes the 
hash and index once at the beginning and reuses it for several lookups, 
which may or may not affect performance.  So we need to measure this in 
isolation before changing this.
> systemDictionary.cpp: parse_stream
>    why did you remove the record_dependency?

I realized it's trivially unnecessary.   The class_loader is reached by 
the ClassLoaderData::_class_loader field, so we don't have to record a 
dependency from host_klass->class_loader in the anonymous class loader 
data.  (I just checked this again because it's confusing).

> Who calls Dictionary::~Dictionary?

The ClassLoaderData::~ClassLoaderData() calls delete _dictionary, which 
calls the destructor.
> Performance:
>   Do you have performance runs of our existing benchmarks?

Yes, I ran our internal benchmarks and added links to the internal wiki 
page.   I'll rerun them again before checkin.

>   Initial key requirement is that this not slow down throughput through
>   slowing down lookups.
>   I know you are very aware of longer-term goals for reducing 
> safepoint pause
>   times and not increasing footprint significantly.
> hashtable sizes: future rfe?
>   given that we now have a lot more dictionaries, and they perhaps need
>   individual sizing
>    - would it be helpful to add statistics information about
>    - e.g. lookup_count, lookup_length, _lookup_warning
>    - i.e. number of entries, max lookup depth, etc. to use later for 
> tuning?

All the previous statistics didn't make very much sense to us, so they 
were removed.  There is now logging for all the hashtables:

busaa027.us.oracle.com% java -Xlog:hashtables -version
java version "10-internal"
Java(TM) SE Runtime Environment (fastdebug build 
Java HotSpot(TM) 64-Bit Server VM (fastdebug build 
10-internal+0-2017-07-17-1456035.cphillim.10system-dictionary2, mixed mode)
[0.284s][info][hashtables] System Dictionary for 
jdk/internal/loader/ClassLoaders$AppClassLoader max bucket size 1 bucket 
91 element count 1 table size 107
[0.285s][info][hashtables] System Dictionary for <bootloader> max bucket 
size 4 bucket 263 element count 468 table size 1009
[0.285s][info][hashtables] Placeholder Table max bucket size 0 bucket 0 
element count 0 table size 1009
[0.285s][info][hashtables] Protection Domain Table max bucket size 0 
bucket 0 element count 0 table size 1009

>    - might make sense to make this JFR data?

I'll file an RFE to add tracing data with our tools.  I'll need some 
help with this part but the data is there.
>    Might think about a follow-up with more tuning parameters than
>    a single global PredictedLoadedClassCount.

I filed RFE to work on resizing, as this work was almost complete, but 
didn't show performance improvements at the time.  It would be nice to 
have customers not need tuning parameters, but I guess these things are 
sometimes necessary.

Thank you for the thorough review and all of your help with this! I've 
made these changes and will rerun testing and send out a new webrev and 
an incremental webrev.


> thanks,
> Karen
>> On Jul 17, 2017, at 2:37 PM, coleen.phillimore at oracle.com 
>> <mailto:coleen.phillimore at oracle.com> wrote:
>> Hi, I have made changes for Ioi's comments below.  The largest change 
>> since .01 is that PredictedLoadedClassCount is used to size 
>> dictionaries for class loader data, and should be used to specify the 
>> average number of classes loaded by each class loader.  Since this is 
>> an option available via UnlockExperimentalVMOptions, I'm not sure 
>> what the process is for changing the meaning of this option. We will 
>> work with our customers who have used this option for specifying the 
>> total number of classes loaded.   We are also working on the change 
>> to allow automatic resizing of the dictionaries during safepoints.
>> See:
>> open webrev at http://cr.openjdk.java.net/~coleenp/7133093.02/webrev 
>> <http://cr.openjdk.java.net/%7Ecoleenp/7133093.02/webrev>
>> Tested with the full nightly test set (jdk, nsk, hotspot jtreg tests) 
>> on linux x64.
>> Thanks,
>> Coleen
>> On 6/30/17 4:45 PM, coleen.phillimore at oracle.com 
>> <mailto:coleen.phillimore at oracle.com> wrote:
>>> Ioi,  Thank you for looking at this change.
>>> On 6/30/17 2:39 PM, Ioi Lam wrote:
>>>> Hi Coleen,
>>>> Maybe the bug should be renamed to "Use one Dictionary per class 
>>>> loader instance"? That way it's more obvious what it is when you 
>>>> look at the repo history.
>>> I can do that.  I made it One Dictionary per ClassLoaderData
>>>> 1.
>>>> Why is assert_locked_or_safepoint(SystemDictionary_lock) necessary 
>>>> in SystemDictionary::find_class (line 1826), but not necessary 
>>>> SystemDictionary::find (line 951)? Since you removed 
>>>> NoSafepointVerifier nosafepoint in the latter, maybe this means 
>>>> it's safe to remove the assert_locked_or_safepoint in the former?
>>> The call to SystemDictionary::find() is the (I believe) usual lock 
>>> free lookup and the SystemDictionary::find_class() is used to verify 
>>> that a class we're about to add or want to add hasn't been added by 
>>> another thread.   Or certain cases where we already have a lock to 
>>> do something else, like add loader constraints.  I took out the 
>>> NoSafepointVerifier because it assumes that system dictionary 
>>> entries would be moved by GC, which they won't.  The old hash 
>>> function could safepoint when getting the hash for the class_loader 
>>> object.
>>>> 2.
>>>> 455 static ClassLoaderData* _current_loader_data = NULL;
>>>> 456 static Klass*           _current_class_entry = NULL;
>>>> 457
>>>> 458 InstanceKlass* ClassLoaderDataGraph::try_get_next_class() {
>>>> How about moving the static fields into an iterator object. That 
>>>> way you don't need to keep track of the globals
>>>> ClassLoaderDataGraphIterator {
>>>>   ClassLoaderData* _current_loader_data
>>>>   Klass*           _current_class_entry;
>>>>   InstanceKlass* try_get_next_class() { ....}
>>>> };
>>> Ok, there's a different iterator that iterates over all of the 
>>> classes for GC.  I will adapt that for this use.   That would be better.
>>>> 3. Double check locking in ClassLoaderData::dictionary() -- someone 
>>>> else should look at this :-)
>>> I copied code that David Holmes added for modules.
>>>> 4. We may need a better strategy for deciding the size of each 
>>>> dictionary.
>>>> 565 const int  _primelist[10] = {1, 107, 1009};
>>>> 571 Dictionary* ClassLoaderData::dictionary() {
>>>> 579     if ((dictionary = _dictionary) == NULL) {
>>>> 580       int size;
>>>> 581       if (this == the_null_class_loader_data() || 
>>>> is_system_class_loader_data()) {
>>>> 582         size = _primelist[2];
>>>> 583       } else if 
>>>> (class_loader()->is_a(SystemDictionary::reflect_DelegatingClassLoader_klass())) 
>>>> {
>>>> 584         size = _primelist[0];
>>>> 585       } else {
>>>> 586         size = _primelist[1];
>>>> 587       }
>>>> 588       dictionary = new Dictionary(this, size);
>>>> I'll do some investigation on this issue and get back to you.
>>> How about if someone uses PredictedLoadedClassCount, then we use 
>>> that to size all but the reflection and boot class loader?  Then if 
>>> there's an application that has a class loader with a huge amount of 
>>> classes loaded in it, that would help this?  It might cost some 
>>> footprint but an oversized table would simply be a bigger array of 
>>> pointers, so it might not be that bad to oversize.
>>> I think the long term solution is being able to resize these entries 
>>> and make the application provide arguments.   Please let me know 
>>> what you find in your investigation and if that would work.
>>>> The rest of the changes look good to me.
>>> Thank you!
>>> Coleen
>>>> Thanks
>>>> - Ioi
>>>> On 6/23/17 4:42 PM, coleen.phillimore at oracle.com 
>>>> <mailto:coleen.phillimore at oracle.com> wrote:
>>>>> Summary: Implement one dictionary per ClassLoaderData for faster 
>>>>> lookup and removal during class unloading
>>>>> See RFE for more details.
>>>>> open webrev at 
>>>>> http://cr.openjdk.java.net/~coleenp/7133093.01/webrev 
>>>>> <http://cr.openjdk.java.net/%7Ecoleenp/7133093.01/webrev>
>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-7133093
>>>>> Tested with full "nightly" run in rbt, plus locally class loading 
>>>>> and unloading tests:
>>>>> jtreg hotspot/test/runtime/ClassUnload
>>>>> jtreg hotspot/test/runtime/modules
>>>>> jtreg hotspot/test/gc/class_unloading
>>>>> make test-hotspot-closed-tonga FILTER=quick TEST_JOBS=4 
>>>>> TEST=vm.parallel_class_loading
>>>>> csh ~/testing/run_jck9  (vm/lang/java_lang)
>>>>> runThese -jck - uses class loader isolation to run each jck test 
>>>>> and unloads tests when done (at -gc:5 intervals)
>>>>> Thanks,
>>>>> Coleen

More information about the hotspot-dev mailing list