RFR (L) 7133093: Improve system dictionary performance

coleen.phillimore at oracle.com coleen.phillimore at oracle.com
Mon Jul 24 23:37:59 UTC 2017

Thanks, Ioi.

On 7/21/17 4:37 PM, Ioi Lam wrote:
> Hi Coleen,
> The new changes look good to me. Thanks!
> Ioi
>> On Jul 21, 2017, at 11:28 AM, coleen.phillimore at oracle.com wrote:
>>  From internal discussions, the only sizing of the dictionary for the java.system.class.loader will follow PredictedLoadedClassCount until dictionary resizing is done.  The latest full webrev is here:
>> open webrev at http://cr.openjdk.java.net/~coleenp/7133093.04/webrev
>> The only change from .03 is
>> http://cr.openjdk.java.net/~coleenp/7133093.04/webrev/src/share/vm/classfile/classLoaderData.cpp.frames.html
>> lines 595-630.   Please review!
>> Thanks,
>> Coleen
>>> On 7/19/17 10:52 PM, coleen.phillimore at oracle.com wrote:
>>> Hi, I've made the changes in the replies below and retested with both our internal NSK tests, jtreg tests and jck tests.  Also I've rerun the performance tests, including SPECjbb2015.
>>> An incremental webrev can be found at:
>>> http://cr.openjdk.java.net/~coleenp/7133093.03.incremental/webrev/
>>> Latest version at:
>>> http://cr.openjdk.java.net/~coleenp/7133093.03/webrev/
>>> Thanks,
>>> Coleen
>>>> On 7/18/17 10:56 AM, coleen.phillimore at oracle.com wrote:
>>>> Karen,  Thank you for reviewing the code and for the discussions and consulting along the way.
>>>>> On 7/17/17 4:10 PM, Karen Kinnear wrote:
>>>>> Feedback on the initial web rev:
>>>>> Thank you for using web rev - that way I can do searches of the overall patch!
>>>>> First - many thanks for doing this exercise so carefully, for following the specification in continuing
>>>>> to record the initiating loader and for splitting this effort into multiple phases to allow further exploration
>>>>> of the current expectations/behavior relative to class unloading.
>>>>> And thank you for all the testing.
>>>>> Just wanted to ensure that I understand some of the assumptions:
>>>>> 1) lock free reads of dictionaries - yes
>>>> Yes.
>>>>> 2) lock free adds of entries to dictionaries - yes
>>>> No, adds still take out the SystemDictionary_lock.
>>>>> 3) removal of entries or entire dictionaries - at safe point only
>>>> Yes.
>>>>> 4) rehashing - none today and none planned
>>>> Yes.
>>>>> 5) resizing of dictionaries - none today, some planned
>>>> Yes.
>>>>> 6) use of SystemDictionary_lock: essentially unchanged
>>>> Yes.
>>>>> 7) atomicity guarantees relative to checking loader constraints and updates to the dictionary: unchanged
>>>> Yes.
>>>>> I did not review the SA/vmStructs - sounds like you had Jini George do that - great.
>>>>> Minor questions/comments:
>>>>> SystemDictionary::verify: "Verify of system dictionary failed" ->
>>>>> looks like "Verify of placeholders failed"
>>>> Fixed.  Good find.  It's an odd assert:
>>>>   guarantee(placeholders()->number_of_entries() >= 0,
>>>>             "Verify of placeholders failed");
>>>>> thank you for asserts dictionary != NULL
>>>>> Dictionary::print
>>>>>   is it worth an assert that loader_data != NULL?
>>>> Added.  It's already asserted in Dictionary::verify()
>>>>> Can you please sanity check if there are other cases in which we need
>>>>> to use dictionary_or_null() rather than dictionary()?
>>>>> - are there cases in which we could be using a dictionary from an anonymous class?
>>>>>   e.g. there are a lot of places in systemDictionary.cpp where we
>>>>>   call class_loader_data(class_loader); loader_data->dictionary();
>>>>>      - was the lazily allocated dictionary allocated yet (see next question)
>>>> There's an assert in dictionary() that the CLD isn't anonymous. In these calls we get class_loader_data from the class_loader which is always the host class loader for anonymous classes, so this doesn't break with the current design.
>>>>> CLD.cpp
>>>>>   Would it be less complex to eagerly create the dictionary for a CLD
>>>>>   since it is extremely rare to not need one for non-anonymous CLDs?
>>>>>   You could create before publishing the CLD
>>>> Yes, I agree it would be better this way.  I was looking at some data and it does seem rare to have a CLD with no classes.
>>>>> Terminology:
>>>>>   How are you still using System Dictionary term?
>>>>>   It still shows up in some comments and in some printed strings
>>>> Bad habit, we've been calling it that for years. :)  I'll update the comments where it makes sense.
>>>> I tried to refer to the ClassLoaderDataGraph dictionaries collectively as the "System Dictionary", as well as the loader constraints and placeholder tables, and the other things that SystemDictionary contains.
>>>> I guess it gets confusing, but it's less awkward than "class loader data dictionaries", and more meaningful than simply "dictionaries". So that's why I didn't change a lot of the comments.
>>>>> biasedLocking.cpp, line 59
>>>>> jvmtiGetLoadedClasses: line 298
>>>>> dictionary.hpp lines 113,114, 120
>>>>> dictionary.cpp 262-263, 478
>>>>> dictionary.cpp 491
>>>>>   fix comments e.g. "Iterate the system dictionary" ...
>>>>>   - probably worth checking for other comments in the sources too
>>>>>   unless these all appear to match new usage model
>>>> Fixed.  I've found a couple more and some other comments that are incorrect.
>>>>> hashTable.hpp
>>>>> line 268: what does this comment mean?
>>>>>   "Don't create and use freelist"?
>>>>>   I assume this means that this is a direct malloc, not using any block
>>>>>   allocation, e.g. metaspace chunks, and this requires explicit freeing?
>>>>   // Don't create and use freelist of HashtableEntry.
>>>> The default Hashtable<> allocates entries from a block of entries and uses a freelist for freed entries.
>>>>> dictionary.hpp line 161
>>>>>   could you possibly clean up comment while you are in here?
>>>>>   I presume "a protection" is "a protection domain"?
>>>> Fixed.
>>>>> dictionary.cpp
>>>>>   why change from Hashtable::free_entry to explicit
>>>>> unlink_entry/FREE_C_HEAP_ARRAY?
>>>> I used the same code as the ModuleTable.  The default hashtable doesn't delete the whole thing and left some memory around, if I remember correctly.  I'll recheck it.
>>>>> Future optimization:
>>>>> I think that none of the built-in loaders is ever recorded as an initiating
>>>>> loader for any class in a loader which permits unloading.
>>>>> Note: we need to ensure that built-in includes boot, platform and
>>>>> application class loader (and clean up any confusion with system
>>>>> class loader - which can be set by the user to a custom class loader)
>>>> I think AppClassLoader can initiate loading and is a builtin loader.
>>>> Dictionary for class loader class loader 0x00007f6abc6e3210 a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x0000000101c8df00}
>>>> Java system dictionary (table_size=107, classes=1)
>>>> ^ indicates that initiating loader is different from defining loader
>>>> 106: ^java.util.ArrayList, loader NULL class_loader
>>>>> Any chance of changing DictionaryEntry* probe->klass() to be
>>>>> probe->instanceKlass()?
>>>> Okay.  Changed to instance_klass() as per coding standard.
>>>>> Thank you for assertion _is_at_safepoint for try_get_next_class
>>>>> CLD.cpp
>>>>>   If the names of the classes_do vs. all_entries_do were modified to be
>>>>>   defining_loader_classes_do vs. all_entries_classes_do or some way
>>>>>   to make the distinction in the name that would be helpful.
>>>>>   Otherwise, if you could please add comments to the CLDG::dictionary_classes_do
>>>>>   etc. to distinguish these.
>>>> I added comments to distinguish these.
>>>>> CLD.cpp
>>>>>   lines 1200-1201
>>>>>   Can you please fix the comment?
>>>>>   I think you need separate comments for the ModuleEntry reads/PackageEntry's
>>>>> exports which are looking for dead modules.
>>>>>   IIUC, the do_unloading is actually looking for classes for which this
>>>>>   is the initating loader, but their defining loader is dead.
>>>>       // Remove entries in the dictionary of live class loader that have
>>>>       // initiated loading classes in a dead class loader.
>>>>       data->dictionary()->do_unloading();
>>>> Moved the module comment to before the module code and added this for dictionary->do_unloading() call.
>>>>> systemDictionary.cpp: 445-447
>>>>>   THANK you for the comments about the java_mirror handle keeping the class
>>>>>   and class loader alive
>>>>>   Could you clarify that the class_loader handle is not for the class
>>>>>   we are checking, but rather for the initiating, i.e. requesting class loader
>>>>>   (is that accurate?) So we are keeping both the requestor and the
>>>>>   requested class alive
>>>> Yes, that is true.   I added
>>>>   // The class_loader handle passed in is the initiating loader.
>>>>> You removed the comment that compute_hash could lead to a safepoint/GC
>>>>>   Is that not possible?
>>>>>   comment in ObjectSynchronizer::FastHashCode says that with UseBiasedLocking
>>>>>     we can revoke biases and must be able to handle a safepoint.
>>>> I made it not possible with this change.   The hashcode is now computed by the Symbol* of the Klass name, which is random (os::random plus some bits from the address and first two characters, which we did a study of for bug https://bugs.openjdk.java.net/browse/JDK-8181450).
>>>> Also, I changed ClassLoaderData::identity_hash() to use the address >> 3 because the addresses of class loader datas are random enough for the one use of this function. Previously, the loader_data identity hash used to go to the class_loader oop and get to ObjectSynchronizer::FastHashCode (which was also os::random but now changed).
>>>> The dictionary hashing doesn't need to add in the hash for the loader_data now.
>>>>> systemDictionary.cpp:
>>>>>    NoSafepointVerifier
>>>>>      If we add resizing logic later, and that happens at a safepoint,
>>>>>      then the number of buckets change and the d_index is no longer valid
>>>>>      so we would need to add back the NSpV?
>>>>>     - this is a more general problem if compute_hash can lead to a safepoint/GC
>>>> Yes this is true, except that compute_hash doesn't safepoint. Computing the hash and index should be done together with the dictionary operations.  The function resolve_instance_class_or_null computes the hash and index once at the beginning and reuses it for several lookups, which may or may not affect performance.  So we need to measure this in isolation before changing this.
>>>>> systemDictionary.cpp: parse_stream
>>>>>    why did you remove the record_dependency?
>>>> I realized it's trivially unnecessary.   The class_loader is reached by the ClassLoaderData::_class_loader field, so we don't have to record a dependency from host_klass->class_loader in the anonymous class loader data.  (I just checked this again because it's confusing).
>>>>> Who calls Dictionary::~Dictionary?
>>>> The ClassLoaderData::~ClassLoaderData() calls delete _dictionary, which calls the destructor.
>>>>> Performance:
>>>>>   Do you have performance runs of our existing benchmarks?
>>>> Yes, I ran our internal benchmarks and added links to the internal wiki page.   I'll rerun them again before checkin.
>>>>>   Initial key requirement is that this not slow down throughput through
>>>>>   slowing down lookups.
>>>>>   I know you are very aware of longer-term goals for reducing safepoint pause
>>>>>   times and not increasing footprint significantly.
>>>>> hashtable sizes: future rfe?
>>>>>   given that we now have a lot more dictionaries, and they perhaps need
>>>>>   individual sizing
>>>>>    - would it be helpful to add statistics information about
>>>>>    - e.g. lookup_count, lookup_length, _lookup_warning
>>>>>    - i.e. number of entries, max lookup depth, etc. to use later for tuning?
>>>> All the previous statistics didn't make very much sense to us, so they were removed.  There is now logging for all the hashtables:
>>>> busaa027.us.oracle.com% java -Xlog:hashtables -version
>>>> java version "10-internal"
>>>> Java(TM) SE Runtime Environment (fastdebug build 10-internal+0-2017-07-17-1456035.cphillim.10system-dictionary2)
>>>> Java HotSpot(TM) 64-Bit Server VM (fastdebug build 10-internal+0-2017-07-17-1456035.cphillim.10system-dictionary2, mixed mode)
>>>> [0.284s][info][hashtables] System Dictionary for jdk/internal/loader/ClassLoaders$AppClassLoader max bucket size 1 bucket 91 element count 1 table size 107
>>>> [0.285s][info][hashtables] System Dictionary for <bootloader> max bucket size 4 bucket 263 element count 468 table size 1009
>>>> [0.285s][info][hashtables] Placeholder Table max bucket size 0 bucket 0 element count 0 table size 1009
>>>> [0.285s][info][hashtables] Protection Domain Table max bucket size 0 bucket 0 element count 0 table size 1009
>>>>>    - might make sense to make this JFR data?
>>>> I'll file an RFE to add tracing data with our tools.  I'll need some help with this part but the data is there.
>>>>>    Might think about a follow-up with more tuning parameters than
>>>>>    a single global PredictedLoadedClassCount.
>>>> I filed RFE to work on resizing, as this work was almost complete, but didn't show performance improvements at the time. It would be nice to have customers not need tuning parameters, but I guess these things are sometimes necessary.
>>>> Thank you for the thorough review and all of your help with this! I've made these changes and will rerun testing and send out a new webrev and an incremental webrev.
>>>> Thanks!
>>>> Coleen
>>>>> thanks,
>>>>> Karen
>>>>>> On Jul 17, 2017, at 2:37 PM, coleen.phillimore at oracle.com <mailto:coleen.phillimore at oracle.com> wrote:
>>>>>> Hi, I have made changes for Ioi's comments below.  The largest change since .01 is that PredictedLoadedClassCount is used to size dictionaries for class loader data, and should be used to specify the average number of classes loaded by each class loader.  Since this is an option available via UnlockExperimentalVMOptions, I'm not sure what the process is for changing the meaning of this option. We will work with our customers who have used this option for specifying the total number of classes loaded.   We are also working on the change to allow automatic resizing of the dictionaries during safepoints.
>>>>>> See:
>>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/7133093.02/webrev <http://cr.openjdk.java.net/%7Ecoleenp/7133093.02/webrev>
>>>>>> Tested with the full nightly test set (jdk, nsk, hotspot jtreg tests) on linux x64.
>>>>>> Thanks,
>>>>>> Coleen
>>>>>>> On 6/30/17 4:45 PM, coleen.phillimore at oracle.com <mailto:coleen.phillimore at oracle.com> wrote:
>>>>>>> Ioi,  Thank you for looking at this change.
>>>>>>>> On 6/30/17 2:39 PM, Ioi Lam wrote:
>>>>>>>> Hi Coleen,
>>>>>>>> Maybe the bug should be renamed to "Use one Dictionary per class loader instance"? That way it's more obvious what it is when you look at the repo history.
>>>>>>> I can do that.  I made it One Dictionary per ClassLoaderData
>>>>>>>> 1.
>>>>>>>> Why is assert_locked_or_safepoint(SystemDictionary_lock) necessary in SystemDictionary::find_class (line 1826), but not necessary SystemDictionary::find (line 951)? Since you removed NoSafepointVerifier nosafepoint in the latter, maybe this means it's safe to remove the assert_locked_or_safepoint in the former?
>>>>>>> The call to SystemDictionary::find() is the (I believe) usual lock free lookup and the SystemDictionary::find_class() is used to verify that a class we're about to add or want to add hasn't been added by another thread.   Or certain cases where we already have a lock to do something else, like add loader constraints.  I took out the NoSafepointVerifier because it assumes that system dictionary entries would be moved by GC, which they won't.  The old hash function could safepoint when getting the hash for the class_loader object.
>>>>>>>> 2.
>>>>>>>> 455 static ClassLoaderData* _current_loader_data = NULL;
>>>>>>>> 456 static Klass*           _current_class_entry = NULL;
>>>>>>>> 457
>>>>>>>> 458 InstanceKlass* ClassLoaderDataGraph::try_get_next_class() {
>>>>>>>> How about moving the static fields into an iterator object. That way you don't need to keep track of the globals
>>>>>>>> ClassLoaderDataGraphIterator {
>>>>>>>>   ClassLoaderData* _current_loader_data
>>>>>>>>   Klass*           _current_class_entry;
>>>>>>>>   InstanceKlass* try_get_next_class() { ....}
>>>>>>>> };
>>>>>>> Ok, there's a different iterator that iterates over all of the classes for GC.  I will adapt that for this use. That would be better.
>>>>>>>> 3. Double check locking in ClassLoaderData::dictionary() -- someone else should look at this :-)
>>>>>>> I copied code that David Holmes added for modules.
>>>>>>>> 4. We may need a better strategy for deciding the size of each dictionary.
>>>>>>>> 565 const int  _primelist[10] = {1, 107, 1009};
>>>>>>>> 571 Dictionary* ClassLoaderData::dictionary() {
>>>>>>>> 579     if ((dictionary = _dictionary) == NULL) {
>>>>>>>> 580       int size;
>>>>>>>> 581       if (this == the_null_class_loader_data() || is_system_class_loader_data()) {
>>>>>>>> 582         size = _primelist[2];
>>>>>>>> 583       } else if (class_loader()->is_a(SystemDictionary::reflect_DelegatingClassLoader_klass())) {
>>>>>>>> 584         size = _primelist[0];
>>>>>>>> 585       } else {
>>>>>>>> 586         size = _primelist[1];
>>>>>>>> 587       }
>>>>>>>> 588       dictionary = new Dictionary(this, size);
>>>>>>>> I'll do some investigation on this issue and get back to you.
>>>>>>> How about if someone uses PredictedLoadedClassCount, then we use that to size all but the reflection and boot class loader?  Then if there's an application that has a class loader with a huge amount of classes loaded in it, that would help this?  It might cost some footprint but an oversized table would simply be a bigger array of pointers, so it might not be that bad to oversize.
>>>>>>> I think the long term solution is being able to resize these entries and make the application provide arguments.   Please let me know what you find in your investigation and if that would work.
>>>>>>>> The rest of the changes look good to me.
>>>>>>> Thank you!
>>>>>>> Coleen
>>>>>>>> Thanks
>>>>>>>> - Ioi
>>>>>>>>> On 6/23/17 4:42 PM, coleen.phillimore at oracle.com <mailto:coleen.phillimore at oracle.com> wrote:
>>>>>>>>> Summary: Implement one dictionary per ClassLoaderData for faster lookup and removal during class unloading
>>>>>>>>> See RFE for more details.
>>>>>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/7133093.01/webrev <http://cr.openjdk.java.net/%7Ecoleenp/7133093.01/webrev>
>>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-7133093
>>>>>>>>> Tested with full "nightly" run in rbt, plus locally class loading and unloading tests:
>>>>>>>>> jtreg hotspot/test/runtime/ClassUnload
>>>>>>>>> jtreg hotspot/test/runtime/modules
>>>>>>>>> jtreg hotspot/test/gc/class_unloading
>>>>>>>>> make test-hotspot-closed-tonga FILTER=quick TEST_JOBS=4 TEST=vm.parallel_class_loading
>>>>>>>>> csh ~/testing/run_jck9  (vm/lang/java_lang)
>>>>>>>>> runThese -jck - uses class loader isolation to run each jck test and unloads tests when done (at -gc:5 intervals)
>>>>>>>>> Thanks,
>>>>>>>>> Coleen

More information about the hotspot-dev mailing list