RFR(L): 8029075 - String deduplication in G1

Thomas Schatzl thomas.schatzl at oracle.com
Tue Mar 11 12:39:51 UTC 2014

Hi Per,

On Fri, 2014-03-07 at 15:13 +0100, Per Liden wrote:
> Hi,
> Here's an updated webrev based on the feedback from Bengt and Thomas.
> http://cr.openjdk.java.net/~pliden/8029075/webrev.1/

See further below.

> And as suggested, two changes in the initial webrev were broken out into 
> separate CRs, available here:
> Bug: https://bugs.openjdk.java.net/browse/JDK-8036672
> Webrev: http://cr.openjdk.java.net/~pliden/8036672/webrev.0/

Looks good.

Did you check the impact on something like specjbb2005 which does
nothing but object copying? That code is somewhat performance sensitive.

> Bug: https://bugs.openjdk.java.net/browse/JDK-8036673
> Webrev: http://cr.openjdk.java.net/~pliden/8036673/webrev.0/

Looks good.

Here are my comments: there are a few minor issues, and a single major:
basically the documentation for this feature is lacking imo. I listed a
few particular issues of mine with it.

First the minor issues:

- could you check on copyright dates on the modified files? They have
not been updated anywhere afaics.

- not sure if good, but maybe make PrintStringDeduplicationStatistics
diagnostic? It seems to really be some diagnosis functions.

- maybe make the new *ALot flags notproduct. These seem to be debugging
helpers similar to other *ALot flags.

- StringDeduplicationAgeThreshold: maybe also needs to mention that if
the object is promoted, that it is always considered as candidate.


- MarkSweep::mark_object(): did you measure impact on other collectors
using this code? It is enabled unconditionally. Also I would prefer
that !UseG1GC automatically disables UseStringDeduplication (or use a
global variable for that) to avoid the two checks. (The latter option is
probably better).


- again both UseG1GC and UseStringDedupliation are paired. Could this be
put in a method/global variable somewhere that is set up correctly at


- StringDedup::initialize() is run unconditionally (the check for
UseStringDeduplication is inside the method) while it's outside for
everything else. Just a note, keep it if wanted.


- maybe put retrieval of the unsafe object into the test library.
Browsing through the code shows quite a lot copy+paste here already.

- "qeueue" -> "queue"

all tests:

- missing bug ids

>From here it is mostly about comments/documentation:


- is it possible to put an overview about how the string deduplication
works here? There is absolutely no place in the code where e.g. the
string deduplication cycle is explained. There is no comprehensive
information about this feature anywhere, just bits about corner cases or
particular here and there, but nothing that can be used as entry point
to understand what this is actually doing.

To some extent, the comments seem to be repetitions of what the code
already shows. E.g. "some_constant = (1 << 10); // 1024 (must be power
of 2)" or "// Store hash code in cache" followed by
"java_lang_String::set_hash(java_string, hash);".

There is also no reference to the JEP (I am not sure if it is customary
to put references to outside documents - I would prefer some); however
the JEP does not replace documentation in the source as it is not
detailed enough, and the JEP cannot be changed any more. The JEP is
interesting for the alternatives considered though.

Ideally, to reasonably understand the feature I would like to not need
to spend hours looking through the implementation.

- how many threads does this feature add, and are supported by this code
(for which parts)? Synchronization between them?

- the comments "For XY" in the StringDedup header file do not give any
hint about what they are doing (not sure if it is interesting as it is
easy to find out the only callers). It seems that this information has
been put at the place where these methods are invoked (or just somewhere
in the cpp file). Imo the methods and members (except trivial) should be
documented in the hpp file in the class body, i.e. in the interface.

- there are a few methods (also in the StringDedupTable class) in
StringDedup where the results of the methods do not seem obvious and
should be commented.

- I really dislike the location of the "Candidate selection" explanation
within the cpp file. I would not have found it if I were not looking for
some documentation ;) It would probably fit very well in the above
mentioned overview paragraphs.

- please add the static helper functions that implement the actual
policies (is_candidate_from_mark, is_candidate_from_evacuation) to the
StringDedup class (as private static methods) so that they can be found
easily. Static non-class helper functions in cpp files should only be
used for stuff that is really unimportant and not relevant to the code
imo. However these seem to be _the_ central policy methods for string
So it would be nice to have them mentioned in the class/hpp file and
documented there.

- facts about the code are repeated, e.g. that the table size must be a
power of two. This is nice, but has a few drawbacks: the compiler does
not care about comments at all, so it might be useful to put asserts in
these places instead, and second, it's just duplication of text with the
usual problem of getting out of date.

- I am not completely sure why the class has not been called
G1StringDedup. It won't work without G1 anyway and calls G1 internals
all the time. There does not seem to be a way to make it more generic
without considerable refactoring. It is fine to keep the name though,
just asking. I also know that this feature does not want to be a generic

- is it possible to separate out the statistics variables of
StringDedupTable into either a separate class instead of using globals
or separate them a little from the other unrelated members? (And
describe them) This would improve readability.

I *think* this is everything from StringDedupTable::_entries_added to
_rehash_count? I have no idea without searching for every use and
deducing from that.

- when implementing the hash table for the StringDedupTable, did you
consider trying to reuse one of the existing hashmap implementations?
And potentially extend them.

Imo we already have way too many ad-hoc implementations of often used
data structures (particularly) in G1 and GC code.

- please document the properties of the StringDedupTable (type, hashing,
automatic resizing behavior, thread safety, ...).

- please explain what StringDedupTable/Cache are and what they are used
for in context of the feature.

- stringdedup.cpp, line 1309: candicate -> candidate


More information about the hotspot-dev mailing list