RFR (L) JDK-8195100: Use a low latency hashtable for SymbolTable

Kim Barrett kim.barrett at oracle.com
Tue Aug 7 02:18:34 UTC 2018

> On Aug 6, 2018, at 12:39 PM, Gerard Ziemski <gerard.ziemski at oracle.com> wrote:
> Thank you Kim very much for such thorough and detailed feedback!
> New webrev at http://cr.openjdk.java.net/~gziemski/8195100_rev5
> rev5 vs rev4 at http://cr.openjdk.java.net/~gziemski/8195100_rev5_4

Replies and further comments inline.

(And thank you for the incremental webrev.)

>> On Aug 2, 2018, at 6:16 PM, Kim Barrett <kim.barrett at oracle.com> wrote:
>> src/hotspot/share/classfile/symbolTable.cpp
>> +#define PREF_AVG_LIST_LEN           8
>> +#define END_SIZE                    17
>> +#define REHASH_LEN                  100
>> +#define ON_STACK_BUFFER_LENGTH 128
>> Why are these macros, rather than constants?
> There are pro and cons for each and we could also consider "enum {PREF_AVG_LIST_LEN = 8;}” here.
> I copied the code from stringTable.cpp and I’m not sure I see a really good reason to change both files. Do you have a strong opinion here?

Using macros for constants is commonly considered poor style. The
HotSpot style guide says "You can almost always use an inline function
or class instead of a macro. Use a macro only when you really need
it." While this doesn't explicitly mention object macros, the second
sentence certainly applies.

Using enums is not a good alternative: JDK-8153116.

That said, I won't object to making this a cleanup item for later,
esp. since it also applies to StringTable.

>> src/hotspot/share/classfile/symbolTable.cpp
>> +static inline void _log_trace_symboltable_helper(Symbol* sym, const char* msg) {
>> +#ifndef PRODUCT
>> +  if (log_is_enabled(Trace, symboltable)) {
>> +    char *s;
>> +    {
>> +      ResourceMark rm;
>> +      s = sym->as_quoted_ascii();
>> +      s = os::strdup(s);
>> +    }
>> +    if (s == NULL) {
>> +      log_trace(symboltable)("%s [%s]", msg, "NULL");
>> +    } else {
>> +      log_trace(symboltable)("%s [%s]", msg, s);
>> +      os::free(s);
>> +    }
>> +  }
>> +#endif // PRODUCT
>> +}
>> Why the strdup/free here?  Just log the result of as_quoted_ascii
>> directly (assuming it's not NULL).
> Fixed.

Lost in the rewrite was the necessary ResourceMark for as_quoted_ascii.

Also, the function name shouldn't have a leading underscore, per style

>> src/hotspot/share/classfile/symbolTable.cpp 
>> +class SymbolTableConfig : public SymbolTableHash::BaseConfig {
>> I think there's an underlying design bug here.  But I guess go with
>> what you've got, and I may suggest some changes later, in conjunction
>> with changes to ConcurrentHashTable.
> OK, you will file a followup issue?


>> src/hotspot/share/classfile/symbolTable.cpp 
>> +static size_t ceil_pow_2(uintx value) {
>> +  size_t ret;
>> +  for (ret = 1; ((size_t)1 << ret) < value; ++ret);
>> +  return ret;
>> +}
>> This is poorly named; it's really a variant of integer log2.
> How about ceil_log2 then? I’d like the “ceil” be part of the name to signify that we will only return an integer, such that:
> 2^ceil_log2(value) >= value.
> I’ll also make same modification in stringTable.cpp

The suggested ceil_log2 is a better name than the actually changed to
log2_ceil, which I read as the log2 of the ceil of x.

>> src/hotspot/share/classfile/symbolTable.cpp 
>> +void SymbolTable::delete_symbol(Symbol* sym) {
>> +  if (sym->refcount() == PERM_REFCOUNT) {
>> +    MutexLocker ml(SymbolTable_lock); // Protect arena
>> +    // Deleting permanent symbol should not occur very often (insert race condition),
>> +    // so log it.
>> +    _log_trace_symboltable_helper(sym, "Freeing permanent symbol");
>> +    if (!arena()->Afree(sym, sym->size())) {
>> +      _log_trace_symboltable_helper(sym, "Leaked permanent symbol");
>> +    }
>> +  } else {
>> +    delete sym;
>> +  }
>> +}
>> I'm surprised there isn't an assert that sym->refcount() == 0 in the
>> else clause.  That would also obviate the need for the assert in the
>> only caller (SymbolTableConfig::free_node).
> I like the assert in SymbolTableConfig::free_node better, which also handles "sym->refcount() == PERM_REFCOUNT", so when we get to "SymbolTable::delete_symbol” we know we’re good.

Sorry, but I'm not convinced. I think it's better to have the assert
near the place that actually cares, which is delete_symbol. I think
the assert check for PERM_REFCOUNT is just redundant. I also think the
associated comment would be better placed in delete_symbol.

>> src/hotspot/share/classfile/symbolTable.hpp
>> 121   static bool _lookup_shared_first;
>> This variable is read and written by multiple threads concurrently, so
>> ought to be volatile.
> Fixed. I did the same for “_alt_hash” flag.
> I’ll also make same modification in stringTable.cpp

_alt_hash is not subject to concurrent modification, which is a good
thing because there is lots of code that is not set up to cope with
that possibility.  It is only set by rehash_table, which is only
called from safepoint cleanup.

Hm, but looking at that makes me nervous. We're running various
cleanup tasks in parallel. What happens if one of the other parallel
tasks needs to look up a symbol? That seems like it would lead to bad
things if it can happen. And it's not entirely obvious to me that it
can't happen.  Similarly for stringtable.

Given the way it's used, it seems like it should be a StringTable
ordinary member rather than a static member, and the new table
produced by rehashing (that wants it turned on) should be constructed
with it being true.

>> src/hotspot/share/classfile/symbolTable.cpp
>> SymbolTable::lookup(const Symbol*, int, int, TRAPS)
>> There is also a comment for this overload in the header that is no
>> longer relevant.
> All fixed.

The comment is still there.

>> src/hotspot/share/classfile/symbolTable.hpp
>> 126   volatile int _items;
>> 127   DEFINE_PAD_MINUS_SIZE(1, DEFAULT_CACHE_LINE_SIZE, sizeof(volatile int));
>> 128   volatile int _uncleaned_items;
>> 129   DEFINE_PAD_MINUS_SIZE(2, DEFAULT_CACHE_LINE_SIZE, sizeof(volatile int));
>> I'm not entirely sure what the intent of the padding here is, but it
>> looks strange to me.  Also, do you have measurements that suggest
>> padding matters?  And why aren't other members that are modified by
>> multiple threads concurrently not padded?
> I copied that code from stringTable.hpp and at the time I was the one who asked Robbin whether the volatile fields needed padding, which seemed to be a pattern used in hotspot, so now we have it. Now, however, that I’m more familiar with the code, I’d rather have no padding and add it back only as needed and determined by the work in JDK-8208142


>> src/hotspot/share/classfile/symbolTable.hpp
>> I find the naming confusing that lookup() does find or insert, but
>> helper functions like lookup_dynamic just find.
> Me too, I intend on cleaning this more as part of JDK-8208142


>> src/hotspot/share/classfile/symbolTable.hpp
>> 181   size_t table_size(Thread* thread = NULL);
>> No caller passes a thread argument.  I think the argument exists only
>> because in the implementation the thread is needed for RCU
>> protection.  That's an implementation detail that shouldn't propogate
>> out to the API here.  So I think the argument should be eliminated.
> SymbolTable::grow(JavaThread* jt) uses it:
> 678:  _current_size = table_size(jt);

Yeah, I forgot about that one call.  But it doesn't matter for my
argument; the elimination of the thread argument there would make no
performance difference and makes the API cleaner.

>> src/hotspot/share/classfile/symbolTable.cpp 
>> 478     _created = SymbolTable::the_table()->allocate_symbol((const u1*)_name, _len, _heap, _thread);
>> [pre-existing] Why does the Symbol take a const u1* name argument?
>> Especially since that's not how it's used?  It should probably be
>> const char*.
> Agree, I prefer symbolTable APIs to consistently use “const char *” throughout and only cast to (const u1*) when using Symbol APIs.

To me it seems like a bug that the Symbol constructor uses u1 as the
element type of its name argument. I also wonder why it uses jbyte
rather than char as the element type of the body, but maybe there is
some underlying string that would be pulled by changing that. Doing
anything about this could be a followup cleanup RFE.

>> src/hotspot/share/classfile/symbolTable.cpp 
>> 518   bool rehash_warning;
>> 519   bool clean_hint;
>> Please initialize these variables, so a reader doesn't need to go
>> digging through the get_insert_lazy to figure out whether they are
>> alwasy set, or possibly only conditionally set.
>> Similarly for line 387.
> I had to do the exact same exercise myself to make sure they are set inside the concurrent hash table code, and didn’t like that - fixed.


>> src/hotspot/share/utilities/globalCounter.cpp
>> 64   // Handle bootstrap
>> 65   if (Threads::number_of_threads() == 0) {
>> 66     return;
>> 67   }
>> What is this about?  number_of_threads only refers to Java threads,
>> but the iteration should deal with there being none, right?  Or does
>> this get called so early (because of SymbolTable changes) that even
>> that iteration setup doesn't work?
>> If not that, then maybe we're calling before the VMThread exists, and
>> that's the real problem?  In which case a different test, differently
>> located, seems necessary.
> We needed it when I was bringing up the code and I tried removing it to see whether we still need it and it seems that we do (details on the crash in the bug itself). Maybe Robbin can shed more light here?

The problem here is that we're using the StringTable (and therefore
GlobalCounter) before the main thread has been registered.  That
registration happens in create_vm() (Threads::add(main_thread), line
3729), which follows the call to init_globals() (where the symbol
table usage is occurring) in create_vm().

Maybe moving the Threads::add earlier would fix the problem?  But this
initialization code is very ordering sensitive, so I don't know if
that would work.

Is there a bug for this problem with GlobalCounter?

128 SymbolTable::SymbolTable() : _local_table(NULL), _current_size(0), _has_work(0),
 129   _needs_rehashing(false), _items_count(0), _uncleaned_items_count(0),
 130   _symbols_removed(0), _symbols_counted(0) {

The order of the initializers is not declaration order, so this will
fail to compile as soon as Thomas's -Wreorder change is pushed.

 334   SymbolTableLookup(Thread* thread, const char* key, int len, uintx hash)
 335   : _thread(thread), _hash(hash), _str(key), _len(len) {}

Another out of order initializer list.

 727   Atomic::add((size_t)stdc._processed, &_symbols_counted);

Why not make SymbolTableDeleteCheck::_processed by size_t, making this
cast unnecessary?

And why not make SymbolTableDoDelete::_deleted also size_t.

These changes would require making the logging call at the end of
clean_dead_entries use SIZE_FORMAT rather than INT32_FORMAT.

  87   size_t add_items_count_to_clean(size_t ndead);

This name change doesn't seem right.


More information about the hotspot-runtime-dev mailing list