(L) Prelim RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables

Thomas Stüfe thomas.stuefe at gmail.com
Fri Nov 6 11:52:54 UTC 2015

Hi David,

On Fri, Nov 6, 2015 at 7:26 AM, David Holmes <david.holmes at oracle.com>

> Hi Jeremy,
> Okay I have read:
> https://sourceware.org/glibc/wiki/TLSandSignals
> and the tree of mail postings referenced therefrom - great reading! :)
> So basic problem: access to __thread variables is not async-signal-safe
> Exacerbator to problem: first attempt to even read a __thread variable can
> lead to allocation which is the real problem in relation to
> async-signal-safety
> I mention the exacerbator because pthread_getspecific and
> pthread_setSpecific are also not async-signal-safe but we already use them.
> However, pthread_getspecific is in fact (per email threads linked above)
> effectively async-signal-safe, and further a call to pthread_getspecific
> never results in a call to pthread_setspecific or an allocation. Hence the
> pthread functions are almost, if not completely, safe in practice with
> reasonable uses (ie only read from signal handler). Which explain this code
> in existing Thread::current()
> #ifdef PARANOID
>   // Signal handler should call ThreadLocalStorage::get_thread_slow()
>   Thread* t = ThreadLocalStorage::get_thread_slow();
>   assert(t != NULL && !t->is_inside_signal_handler(),
>          "Don't use Thread::current() inside signal handler");
> #endif
> So problem scenario is: use of __thread variable (that belongs to the
> shared-library) in a signal handler.
> Solution 0: don't do that. Seriously - like any other async-signal-unsafe
> stuff we should not be using it in real signal handlers. The crash handler
> is a different matter - we try all sorts there because it might work and
> you can't die twice.
> Otherwise: narrow the window of exposure.
> 1. We ensure we initialize thread_current (even with a dummy value) as
> early as possible in the thread that loads libjvm. As we have no signal
> handlers installed at that point that might use the same variable, we can
> not hit the problem scenario.
> 2. We ensure we initialize thread_current in a new thread with all signals
> blocked. This again avoids the problem scenario.
> 3. We initialize thread_current in an attaching thread as soon as possible
> and we again first block all signals.
> That still leaves the problem of an unattached native thread taking a
> signal whilst in async-signal-unsafe code, and executing a signal handler
> which in turns tries to access thread_current for the first time. This
> signal handler need not be an actual JVM handler, but one attached by other
> native code eg an agent. I'm not clear in the latter case how reasonable it
> is for an agent's handler to try and do things from an unattached thread -
> and we don't claim any JNI interfaces can, or should, be called from a
> signal handler - but it is something you can probably get away with today.
> Let me also point out that we already effectively have this code in
> Solaris already (at the ThreadLocalStorage class level). So if there is
> something here that will prevent the current proposal we already have a
> problem on Solaris. :(
> Thoughts/comments/suggestions?
The first problem: thread initializes TLS variable, gets interrupted and
accesses the half-initialized variable from within the signal handler. This
could happen today too, or? but I think we never saw this.

In theory, it could be mitigated by some careful testing before using the
Thread::current() value in the signal handler. Like, put an eyecatcher at
the beginning of the Thread structure and check that using SafeFetch.

As for the second problem - recursive malloc() deadlocks - I am at a loss.
I do not fully understand though why pthread_getspecific is different -
does it not have to allocate place for the TLS variable too?

Regards, Thomas

> Thanks,
> David
> On 6/11/2015 1:09 PM, David Holmes wrote:
>> Hi Jeremy,
>> I was going to ask you to elaborate :)
>> On 6/11/2015 12:24 PM, Jeremy Manson wrote:
>>> I should probably elaborate on this.  With glibc + ELF, the first time a
>>> thread accesses a variable declared __thread, if that variable is in a
>>> shared library (as opposed to the main executable), the system calls
>>> malloc() to allocate the space for it.  If that happens in a signal that
>>> is being delivered during a call to malloc(), then you usually get a
>>> crash.
>> My understanding of the ELF ABI for thread-locals - which I read about
>> in the Solaris 11.1 Linkers and libraries guide - does require use of
>> the dynamic TLS model for any dynamically loaded shared object which
>> defines a thread-local, but that is what we use as I understand it. The
>> docs state:
>> "A shared object containing only dynamic TLS can be loaded following
>> process startup without limitations. The runtime linker extends the list
>> of initialization records to include the initialization template of the
>> new object. The new object is given an index of m = M + 1. The
>> counter M is incremented by 1. However, the allocation of new TLS blocks
>> is deferred until the blocks are actually referenced."
>> Now I guess "extends the list" might be implemented using malloc ... but
>> this will only occur in the main thread (the one started by the launcher
>> to load the JVM and become the main thread), at the time libjvm is
>> loaded - which will all be over before any agent etc can run and do
>> anything. But "allocation ... is deferred" suggests we may have a
>> problem until either the first call to Thread::current or the call to
>> Thread::initialize_thread_current. If it is the former then that should
>> occur well before any agent etc can be loaded. And I can easily inject
>> an initial dummy call to initialize_thread_current(null) to force the
>> TLS allocation.
>> This may bite you if AsyncGetCallTrace uses Thread::current(), and you
>>> use system timers to do profiling.  If a thread doing a malloc() prior
>>> to the first time it accesses Thread::current(), and it gets delivered a
>>> signal, it might die.  This is especially likely for pure native threads
>>> started by native code.
>>> I believe that this is a use case you support, so you might want to make
>>> sure it is okay.
>> For a VM embedded in a process, which already contains native threads,
>> that will later attach to the VM, this may indeed be a problem. One
>> would have hoped however that the implementation of TLS would be
>> completely robust, at least for something as simple as getting a signal
>> whilst in the allocator.
>> I'm unclear how to test for or check for this kind of problem. Arguably
>> there could be many things that are async-unsafe in this way.
>> Need to think more about this and do some more research. Would also
>> appreciate any insight from any glibc and/or ELF gurus.
>> Thanks.
>> David
>> Jeremy
>>> On Thu, Nov 5, 2015 at 5:58 PM, Jeremy Manson <jeremymanson at google.com
>>> <mailto:jeremymanson at google.com>> wrote:
>>>     Something that's bitten me with __thread: it isn't async-safe when
>>>     called from a shared object on Linux.  Have you vetted to make sure
>>>     this doesn't make HS less async-safe?
>>>     Jeremy
>>>     On Sun, Nov 1, 2015 at 10:40 PM, David Holmes
>>>     <david.holmes at oracle.com <mailto:david.holmes at oracle.com>> wrote:
>>>         bug: https://bugs.openjdk.java.net/browse/JDK-8132510
>>>         Open webrev:
>>> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v2/
>>>         A simple (in principle) but wide-ranging change which should
>>>         appeal to our Code Deletion Engineer's. We implement
>>>         Thread::current() using a compiler/language-based thread-local
>>>         variable eg:
>>>           static __thread Thread *_thr_current;
>>>           inline Thread* Thread::current() {
>>>             return _thr_current;
>>>           }
>>>         with an appropriate setter of course. By doing this we can
>>>         completely remove the platform-specific ThreadLocalStorage
>>>         implementations, and the associated os::thread_local_storage*
>>>         calls, plus all the uses of ThreadLocalStorage::thread() and
>>>         ThreadLocalStorage::get_thread_slow(). This extends the previous
>>>         work done on Solaris to implement ThreadLocalStorage::thread()
>>>         using compiler-based thread-locals.
>>>         We can also consolidate nearly all the os_cpu versions of
>>>         MacroAssembler::get_thread on x86 into one cpu specific one ( a
>>>         special variant is still needed for 32-bit Windows).
>>>         As a result of this change we have further potential cleanups:
>>>         - all the src/os/<os>/vm/thread_<os>.inline.hpp files are now
>>>         completely empty and could also be removed
>>>         - the MINIMIZE_RAM_USAGE define (which avoids use of the linux
>>>         sp-map "cache" on 32-bit) now has no affect and so could be
>>>         completely removed from the build system
>>>         I plan to do the MINIMIZE_RAM_USAGE removal as a follow up CR,
>>>         but could add the removal of the "inline" files to this CR if
>>>         people think it worth removing them.
>>>         I have one missing piece on Aarch64 - I need to change
>>>         MacroAssembler::get_thread to simply call Thread::current() as
>>>         on other platforms, but I don't know how to write that. I would
>>>         appreciate it if someone could give me the right code for that.
>>>         I would also appreciate comments/testing by the AIX and PPC64
>>>         folk as well.
>>>         A concern about memory-leaks had previously been raised, but
>>>         experiments using simple C code on linux 86 and Solaris showed
>>>         no issues. Also note that Aarch64 already uses this kind of
>>>         thread-local.
>>>         Thanks,
>>>         David

More information about the hotspot-dev mailing list