RFR(L): 8029075 - String deduplication in G1

Per Liden per.liden at oracle.com
Wed Mar 12 13:34:38 UTC 2014

Hi Thomas,

Thanks for reviewing. Comments inline.

On 03/11/2014 01:39 PM, Thomas Schatzl wrote:
> Hi Per,
> On Fri, 2014-03-07 at 15:13 +0100, Per Liden wrote:
>> Hi,
>> Here's an updated webrev based on the feedback from Bengt and Thomas.
>> http://cr.openjdk.java.net/~pliden/8029075/webrev.1/
> See further below.
>> And as suggested, two changes in the initial webrev were broken out into
>> separate CRs, available here:
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8036672
>> Webrev: http://cr.openjdk.java.net/~pliden/8036672/webrev.0/
> Looks good.
> Did you check the impact on something like specjbb2005 which does
> nothing but object copying? That code is somewhat performance sensitive.

Yes, I did Aurora runs with jbb2005 and jbb2013, didn't see any difference.

>> Bug: https://bugs.openjdk.java.net/browse/JDK-8036673
>> Webrev: http://cr.openjdk.java.net/~pliden/8036673/webrev.0/
> Looks good.
> Here are my comments: there are a few minor issues, and a single major:
> basically the documentation for this feature is lacking imo. I listed a
> few particular issues of mine with it.
> First the minor issues:
> - could you check on copyright dates on the modified files? They have
> not been updated anywhere afaics.
> - not sure if good, but maybe make PrintStringDeduplicationStatistics
> diagnostic? It seems to really be some diagnosis functions.
> - maybe make the new *ALot flags notproduct. These seem to be debugging
> helpers similar to other *ALot flags.

I had a conversion with Bengt about this some time ago and I think the 
options are categorized correctly. The ALot flags needs be available in 
product builds because they are used by tests. Most Print*Statistics are 
product therefore it makes sense to follow that pattern for 

> - StringDeduplicationAgeThreshold: maybe also needs to mention that if
> the object is promoted, that it is always considered as candidate.


> marksweep.inline.hpp
> - MarkSweep::mark_object(): did you measure impact on other collectors
> using this code? It is enabled unconditionally. Also I would prefer
> that !UseG1GC automatically disables UseStringDeduplication (or use a
> global variable for that) to avoid the two checks. (The latter option is
> probably better).

Yes, ran jbb2005/2013.

> symbolTable.cpp:
> - again both UseG1GC and UseStringDedupliation are paired. Could this be
> put in a method/global variable somewhere that is set up correctly at
> startup?

I think using the options is much more readable. Adding another state 
which tells us if G1 & Dedup is enabled means we need a pre-init stage 
to set up that state, which I don't think is very nice.

> g1CollectedHeap.cpp:
> - StringDedup::initialize() is run unconditionally (the check for
> UseStringDeduplication is inside the method) while it's outside for
> everything else. Just a note, keep it if wanted.

They are hidden inside dedup everywhere, except where the performance 
overhead of a call might be an issue (like mark and copy).

> TestStringDeduplicationTools.java:
> - maybe put retrieval of the unsafe object into the test library.
> Browsing through the code shows quite a lot copy+paste here already.
> - "qeueue" -> "queue"


> all tests:
> - missing bug ids


>  From here it is mostly about comments/documentation:

I agree that adding a feature overview and some additional comments is 
probably a good idea. However, I also know that Bengt and StefanK aren't 
too crazy about large comment blobs so I tried to kept the noise down. I 
also disagree with some of the other comments below which tend to be 
about personal taste rather than code quality or breaking some coding 
convention we have, e.g. the static helpers, readability, naming, etc.

I'll add some comments and make some other changes and send out an 
updated webrev shortly.


> stringdedup.?pp:
> - is it possible to put an overview about how the string deduplication
> works here? There is absolutely no place in the code where e.g. the
> string deduplication cycle is explained. There is no comprehensive
> information about this feature anywhere, just bits about corner cases or
> particular here and there, but nothing that can be used as entry point
> to understand what this is actually doing.
> To some extent, the comments seem to be repetitions of what the code
> already shows. E.g. "some_constant = (1 << 10); // 1024 (must be power
> of 2)" or "// Store hash code in cache" followed by
> "java_lang_String::set_hash(java_string, hash);".
> There is also no reference to the JEP (I am not sure if it is customary
> to put references to outside documents - I would prefer some); however
> the JEP does not replace documentation in the source as it is not
> detailed enough, and the JEP cannot be changed any more. The JEP is
> interesting for the alternatives considered though.
> Ideally, to reasonably understand the feature I would like to not need
> to spend hours looking through the implementation.
> - how many threads does this feature add, and are supported by this code
> (for which parts)? Synchronization between them?
> - the comments "For XY" in the StringDedup header file do not give any
> hint about what they are doing (not sure if it is interesting as it is
> easy to find out the only callers). It seems that this information has
> been put at the place where these methods are invoked (or just somewhere
> in the cpp file). Imo the methods and members (except trivial) should be
> documented in the hpp file in the class body, i.e. in the interface.
> - there are a few methods (also in the StringDedupTable class) in
> StringDedup where the results of the methods do not seem obvious and
> should be commented.
> - I really dislike the location of the "Candidate selection" explanation
> within the cpp file. I would not have found it if I were not looking for
> some documentation ;) It would probably fit very well in the above
> mentioned overview paragraphs.
> - please add the static helper functions that implement the actual
> policies (is_candidate_from_mark, is_candidate_from_evacuation) to the
> StringDedup class (as private static methods) so that they can be found
> easily. Static non-class helper functions in cpp files should only be
> used for stuff that is really unimportant and not relevant to the code
> imo. However these seem to be _the_ central policy methods for string
> dedup.
> So it would be nice to have them mentioned in the class/hpp file and
> documented there.
> - facts about the code are repeated, e.g. that the table size must be a
> power of two. This is nice, but has a few drawbacks: the compiler does
> not care about comments at all, so it might be useful to put asserts in
> these places instead, and second, it's just duplication of text with the
> usual problem of getting out of date.
> - I am not completely sure why the class has not been called
> G1StringDedup. It won't work without G1 anyway and calls G1 internals
> all the time. There does not seem to be a way to make it more generic
> without considerable refactoring. It is fine to keep the name though,
> just asking. I also know that this feature does not want to be a generic
> solution.
> - is it possible to separate out the statistics variables of
> StringDedupTable into either a separate class instead of using globals
> or separate them a little from the other unrelated members? (And
> describe them) This would improve readability.
> I *think* this is everything from StringDedupTable::_entries_added to
> _rehash_count? I have no idea without searching for every use and
> deducing from that.
> - when implementing the hash table for the StringDedupTable, did you
> consider trying to reuse one of the existing hashmap implementations?
> And potentially extend them.
> Imo we already have way too many ad-hoc implementations of often used
> data structures (particularly) in G1 and GC code.
> - please document the properties of the StringDedupTable (type, hashing,
> automatic resizing behavior, thread safety, ...).
> - please explain what StringDedupTable/Cache are and what they are used
> for in context of the feature.
> - stringdedup.cpp, line 1309: candicate -> candidate
> Thanks,
> Thomas

More information about the hotspot-dev mailing list