(L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables

Jeremy Manson jeremymanson at google.com
Sat Nov 7 01:22:24 UTC 2015

On Thu, Nov 5, 2015 at 11:48 PM, David Holmes <david.holmes at oracle.com>

> On 6/11/2015 4:55 PM, Jeremy Manson wrote:
>> FWIW, Google tried to convince the glibc maintainers to make this
>> async-safe, but they weren't biting:
>> https://sourceware.org/ml/libc-alpha/2014-01/msg00033.html
> Yes I read all that. I wouldn't say they weren't biting, more of a
> disagreement on the right direction for the patch. glibc weren't prepared
> to take it directly as is, while google-folk didn't seem to think it worth
> their while to do whatever glibc folk wanted. The actual patch proposal
> just died out. :( Quite a pity.
> Most of the things you can do are going to be mitigation rather than a
>> fix.  I did what you suggest to mitigate, and no one complained, until
>> someone at Google started a sidecar C++ thread that did a boatload of
>> malloc'ing.
> Yes all mitigation. :(
> My workaround was far sneakier, and fixes the problem entirely, but you
>> probably won't want to do it for code hygiene reasons.  I declare the
>> __thread variable in the java launcher itself, and then export a
>> function that provides a pointer to it.  In practice, in glibc, if it is
>> in the main executable, ELF is smart enough to declare it as part of the
>> stack, and is therefore async-safe.
> But even that only works for our own launchers - not for embedded in the
> JVM. :(

Yup.  Fortunately, I can tell people at Google how to write launchers.

> Fortunately, while this is a fun thing to think about, I don't think
>> there are any async paths in the JVM that use Thread::current()
>> (although I could be wrong - there is a comment in there saying not to
>> call Thread::current() in a signal handler...).  I would check the call
>> paths in AsyncGetCallTrace to make sure.
> So two things ...
> First, using Thread::current() in a signal context was disallowed, but the
> alternative was ThreadLocalStorage::get_thread_slow(). The former may not
> work in a signal context due to the caching mechanisms layered in on
> different platforms, while the latter used the platform TLS API which, even
> if not guaranteed, worked well enough in a signal context. With __thread we
> don't have even a pretend signal-safe alternative :(


> Second, AsyncGetCallTrace violates the first rule by using
> JavaThread::current() in an assertion.

While we're on the subject, the assertion in Method::bci_from is reachable
from AsyncGetCallTrace and calls err_msg, which can malloc().  I meant to
file a bug about that.

Also, the problem may not be limited to something like AsyncGetCallTrace.
> Though agents may get the current thread from the JNIEnv rather than
> invoking some JVM function that results in Thread::current() being used, I
> can't be sure of that.

Which JVM functions that get the thread are supposed to be async-safe?
There is no reason to think that any method that isn't explicitly marked
async-safe is async safe, and most JNI methods I've tried to use from
signal handlers die painfully if I try to use them from a signal handler.

Generally, I don't think it is reasonable for a user to expect async-safety
from an API that isn't expressly designed that way.  POSIX has a list of
async-safe methods (signal(7)).

FWIW, to use AsyncGetCallTrace, I get the JNIEnv in a ThreadStart hook from
JVMTI and stash it in a __thread (and pull the trick I mentioned).


> Anyway more things to mull over on the weekedn. :)
> Thanks,
> David
> Jeremy
>> On Thu, Nov 5, 2015 at 10:26 PM, David Holmes <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>> wrote:
>>     Hi Jeremy,
>>     Okay I have read:
>>     https://sourceware.org/glibc/wiki/TLSandSignals
>>     and the tree of mail postings referenced therefrom - great reading! :)
>>     So basic problem: access to __thread variables is not
>> async-signal-safe
>>     Exacerbator to problem: first attempt to even read a __thread
>>     variable can lead to allocation which is the real problem in
>>     relation to async-signal-safety
>>     I mention the exacerbator because pthread_getspecific and
>>     pthread_setSpecific are also not async-signal-safe but we already
>>     use them. However, pthread_getspecific is in fact (per email threads
>>     linked above) effectively async-signal-safe, and further a call to
>>     pthread_getspecific never results in a call to pthread_setspecific
>>     or an allocation. Hence the pthread functions are almost, if not
>>     completely, safe in practice with reasonable uses (ie only read from
>>     signal handler). Which explain this code in existing Thread::current()
>>     #ifdef PARANOID
>>        // Signal handler should call ThreadLocalStorage::get_thread_slow()
>>        Thread* t = ThreadLocalStorage::get_thread_slow();
>>        assert(t != NULL && !t->is_inside_signal_handler(),
>>               "Don't use Thread::current() inside signal handler");
>>     #endif
>>     So problem scenario is: use of __thread variable (that belongs to
>>     the shared-library) in a signal handler.
>>     Solution 0: don't do that. Seriously - like any other
>>     async-signal-unsafe stuff we should not be using it in real signal
>>     handlers. The crash handler is a different matter - we try all sorts
>>     there because it might work and you can't die twice.
>>     Otherwise: narrow the window of exposure.
>>     1. We ensure we initialize thread_current (even with a dummy value)
>>     as early as possible in the thread that loads libjvm. As we have no
>>     signal handlers installed at that point that might use the same
>>     variable, we can not hit the problem scenario.
>>     2. We ensure we initialize thread_current in a new thread with all
>>     signals blocked. This again avoids the problem scenario.
>>     3. We initialize thread_current in an attaching thread as soon as
>>     possible and we again first block all signals.
>>     That still leaves the problem of an unattached native thread taking
>>     a signal whilst in async-signal-unsafe code, and executing a signal
>>     handler which in turns tries to access thread_current for the first
>>     time. This signal handler need not be an actual JVM handler, but one
>>     attached by other native code eg an agent. I'm not clear in the
>>     latter case how reasonable it is for an agent's handler to try and
>>     do things from an unattached thread - and we don't claim any JNI
>>     interfaces can, or should, be called from a signal handler - but it
>>     is something you can probably get away with today.
>>     Let me also point out that we already effectively have this code in
>>     Solaris already (at the ThreadLocalStorage class level). So if there
>>     is something here that will prevent the current proposal we already
>>     have a problem on Solaris. :(
>>     Thoughts/comments/suggestions?
>>     Thanks,
>>     David
>>     On 6/11/2015 1:09 PM, David Holmes wrote:
>>         Hi Jeremy,
>>         I was going to ask you to elaborate :)
>>         On 6/11/2015 12:24 PM, Jeremy Manson wrote:
>>             I should probably elaborate on this.  With glibc + ELF, the
>>             first time a
>>             thread accesses a variable declared __thread, if that
>>             variable is in a
>>             shared library (as opposed to the main executable), the
>>             system calls
>>             malloc() to allocate the space for it.  If that happens in a
>>             signal that
>>             is being delivered during a call to malloc(), then you
>>             usually get a
>>             crash.
>>         My understanding of the ELF ABI for thread-locals - which I read
>>         about
>>         in the Solaris 11.1 Linkers and libraries guide - does require
>>         use of
>>         the dynamic TLS model for any dynamically loaded shared object
>> which
>>         defines a thread-local, but that is what we use as I understand
>>         it. The
>>         docs state:
>>         "A shared object containing only dynamic TLS can be loaded
>> following
>>         process startup without limitations. The runtime linker extends
>>         the list
>>         of initialization records to include the initialization template
>>         of the
>>         new object. The new object is given an index of m = M + 1. The
>>         counter M is incremented by 1. However, the allocation of new
>>         TLS blocks
>>         is deferred until the blocks are actually referenced."
>>         Now I guess "extends the list" might be implemented using malloc
>>         ... but
>>         this will only occur in the main thread (the one started by the
>>         launcher
>>         to load the JVM and become the main thread), at the time libjvm is
>>         loaded - which will all be over before any agent etc can run and
>> do
>>         anything. But "allocation ... is deferred" suggests we may have a
>>         problem until either the first call to Thread::current or the
>>         call to
>>         Thread::initialize_thread_current. If it is the former then that
>>         should
>>         occur well before any agent etc can be loaded. And I can easily
>>         inject
>>         an initial dummy call to initialize_thread_current(null) to
>>         force the
>>         TLS allocation.
>>             This may bite you if AsyncGetCallTrace uses
>>             Thread::current(), and you
>>             use system timers to do profiling.  If a thread doing a
>>             malloc() prior
>>             to the first time it accesses Thread::current(), and it gets
>>             delivered a
>>             signal, it might die.  This is especially likely for pure
>>             native threads
>>             started by native code.
>>             I believe that this is a use case you support, so you might
>>             want to make
>>             sure it is okay.
>>         For a VM embedded in a process, which already contains native
>>         threads,
>>         that will later attach to the VM, this may indeed be a problem.
>> One
>>         would have hoped however that the implementation of TLS would be
>>         completely robust, at least for something as simple as getting a
>>         signal
>>         whilst in the allocator.
>>         I'm unclear how to test for or check for this kind of problem.
>>         Arguably
>>         there could be many things that are async-unsafe in this way.
>>         Need to think more about this and do some more research. Would
>> also
>>         appreciate any insight from any glibc and/or ELF gurus.
>>         Thanks.
>>         David
>>             Jeremy
>>             On Thu, Nov 5, 2015 at 5:58 PM, Jeremy Manson
>>             <jeremymanson at google.com <mailto:jeremymanson at google.com>
>>             <mailto:jeremymanson at google.com
>>             <mailto:jeremymanson at google.com>>> wrote:
>>                  Something that's bitten me with __thread: it isn't
>>             async-safe when
>>                  called from a shared object on Linux.  Have you vetted
>>             to make sure
>>                  this doesn't make HS less async-safe?
>>                  Jeremy
>>                  On Sun, Nov 1, 2015 at 10:40 PM, David Holmes
>>                  <david.holmes at oracle.com
>>             <mailto:david.holmes at oracle.com>
>>             <mailto:david.holmes at oracle.com
>>             <mailto:david.holmes at oracle.com>>> wrote:
>>                      bug:
>> https://bugs.openjdk.java.net/browse/JDK-8132510
>>                      Open webrev:
>>             http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/
>>                      A simple (in principle) but wide-ranging change
>>             which should
>>                      appeal to our Code Deletion Engineer's. We implement
>>                      Thread::current() using a compiler/language-based
>>             thread-local
>>                      variable eg:
>>                        static __thread Thread *_thr_current;
>>                        inline Thread* Thread::current() {
>>                          return _thr_current;
>>                        }
>>                      with an appropriate setter of course. By doing this
>>             we can
>>                      completely remove the platform-specific
>>             ThreadLocalStorage
>>                      implementations, and the associated
>>             os::thread_local_storage*
>>                      calls, plus all the uses of
>>             ThreadLocalStorage::thread() and
>>                      ThreadLocalStorage::get_thread_slow(). This extends
>>             the previous
>>                      work done on Solaris to implement
>>             ThreadLocalStorage::thread()
>>                      using compiler-based thread-locals.
>>                      We can also consolidate nearly all the os_cpu
>>             versions of
>>                      MacroAssembler::get_thread on x86 into one cpu
>>             specific one ( a
>>                      special variant is still needed for 32-bit Windows).
>>                      As a result of this change we have further
>>             potential cleanups:
>>                      - all the src/os/<os>/vm/thread_<os>.inline.hpp
>>             files are now
>>                      completely empty and could also be removed
>>                      - the MINIMIZE_RAM_USAGE define (which avoids use
>>             of the linux
>>                      sp-map "cache" on 32-bit) now has no affect and so
>>             could be
>>                      completely removed from the build system
>>                      I plan to do the MINIMIZE_RAM_USAGE removal as a
>>             follow up CR,
>>                      but could add the removal of the "inline" files to
>>             this CR if
>>                      people think it worth removing them.
>>                      I have one missing piece on Aarch64 - I need to
>> change
>>                      MacroAssembler::get_thread to simply call
>>             Thread::current() as
>>                      on other platforms, but I don't know how to write
>>             that. I would
>>                      appreciate it if someone could give me the right
>>             code for that.
>>                      I would also appreciate comments/testing by the AIX
>>             and PPC64
>>                      folk as well.
>>                      A concern about memory-leaks had previously been
>>             raised, but
>>                      experiments using simple C code on linux 86 and
>>             Solaris showed
>>                      no issues. Also note that Aarch64 already uses this
>>             kind of
>>                      thread-local.
>>                      Thanks,
>>                      David

More information about the hotspot-dev mailing list