RFR: 8132510: Replace ThreadLocalStorage with compiler/language-based thread-local variables

David Holmes david.holmes at oracle.com
Mon Nov 23 12:27:04 UTC 2015

Hi Bertrand,

On 23/11/2015 8:14 PM, Bertrand Delsart wrote:
> Hi David,
> Overall looks good.
> Thanks for the #ifndef USE_LIBRARY_BASED_TLS_ONLY :-)
> One doubt, in case this has not been discussed before.
> I'm still catching up on x86_64 (after years of ARM assembly :-)) but it
> seems that there are some stack alignment constraints on runtime calls,
> at least for some x86_64 ABIs.
> Some of the x86 MacroAssembler::get_thread implementations had code to
> align the stack before calling pthread_getspecific. See for instance
> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/os_cpu/linux_x86/vm/assembler_linux_x86.cpp.udiff.html

Sorry, I'm not that familiar with the MacroAssembler - is it this odd 
sequence:
-   push(r10);
-   // XXX
-   mov(r10, rsp);
-   andq(rsp, -16);
-   push(r10);

I'm not at all clear on what that is doing - and if it somehow changes 
the stack alignment, wouldn't something need to be fixed up when popping 
the previously pushed values?

To be honest I'm not even sure what an "unaligned stack" means.

> This alignment is no longer performed in the new (shared) implementation
> http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/src/cpu/x86/vm/macroAssembler_x86.cpp.udiff.html
> Now, Solaris was not performing the alignment and Windows has a separate
> path for x86_64. Did we really need the alignment for linux x86_64 and
> bsd_x86_64 ? Might it be needed for other ports ?
> IMHO, it might be safer to align the stack by default, knowing it should
> not be expensive since we call get_thread rarely for x86_64 (result is
> cached in r15). I'll let you see whether it is worth adding an ifdef so
> as to explicitly deactivate the alignment on some platforms
> (solaris_x86_64 ?)

I don't have enough knowledge to even begin to comment on this. I will 
have to rely on our x86_64 macroassembler experts to explain the old 
code and what the requirements are.


> Regards,
> Bertrand.
> On 23/11/2015 08:03, David Holmes wrote:
>> After all the preliminary discussions here are final proposed changes:
>> bug: https://bugs.openjdk.java.net/browse/JDK-8132510
>> Open webrev: http://cr.openjdk.java.net/~dholmes/8132510/webrev.v6/
>> A simple (in principle) but wide-ranging change which should appeal to
>> our Code Deletion Engineers. We implement Thread::current() using a
>> compiler/language-based thread-local variable eg:
>>   static __thread Thread *_thr_current;
>>   inline Thread* Thread::current() {
>>     return _thr_current;
>>   }
>> with an appropriate setter of course. By doing this we can completely
>> remove the os_cpu-specific ThreadLocalStorage implementations, and the
>> associated os::thread_local_storage* calls.
>> As __thread is not async-signal-safe (either in theory or practice) we
>> need a fallback library-based solution as used today. For this we use a
>> very simple ThreadLocalStorage class and an implementation thereof for
>> POSIX (covers AIX, BSD, Linux, OSX and Solaris) using
>> pthread_get/setspecific; and one for Windows using its TLS library.
>> While these library routines are not guaranteed async-signal-safe, they
>> seem to be in practice and are what we have been using all along.
>> We also allow for use of only the library-based TLS for platforms where
>> compiler-based thread locals are not supported (as will be needed in the
>> Mobile project). This is enabled at build time by defining
>> USE_LIBRARY_BASED_TLS_ONLY.
>> Thanks to Andrew Haley for providing the AArch64 code; and to Thomas
>> Stuefe for testing PPC and AIX.
>> Testing:
>>   - JPRT (core platforms)
>>   - Jtreg tests (linux & solaris)
>>   - vm.runtime (core platforms)
>> Performance:
>>   - still TBD - this is proving to be extremely difficult. If anyone has
>> any simple-to-run microbenchmarks to suggest I'd give them a try as a
>> sanity check. But I lack access to hardware for running serious
>> benchmarking.
>> Footprint:
>> - varies by platform and the VM (server, client, minimal)
>> - worst-case: ~1% increase for server VM and minimal VM
>> - best-case:  0.4% decrease for client VM
>> Thanks,
>> David
