RFR: 8154715: Missing destructor and/or TLS clearing calls for terminating threads

David Holmes david.holmes at oracle.com
Fri May 6 23:38:49 UTC 2016

On 7/05/2016 1:41 AM, Stefan Karlsson wrote:
> On 06/05/16 16:32, David Holmes wrote:
>> On 7/05/2016 12:04 AM, Stefan Karlsson wrote:
>>> Hi David,
>>> On 06/05/16 15:38, David Holmes wrote:
>>>> Hi Stefan,
>>>> Thanks for taking a look at this.
>>>> On 6/05/2016 5:02 PM, Stefan Karlsson wrote:
>>>>> Hi David,
>>>>> I looked through the GC part of this webrev and I think the change is
>>>>> fine.
>>>>> However, it seems a bit error prone. If we decide to change the
>>>>> code to,
>>>>> for example, terminate the  AbstractGangWorker threads, then we
>>>>> have to
>>>>> remember to insert a ThreadLocalStorage::set_thread(NULL) call.
>>>> That's why I added the ShouldNotReachHere()'s - if those threads start
>>>> terminating then we will see those hit. Perhaps a comment:
>>>> ShouldNotReachHere(); // If thread terminates we have to do TLS cleanup
>>>> ?
>>> Yes, I would appreciate a comment. Though, when we add new threads, we
>>> need to remember to add the set_thread(NULL) call.
>> Well no, what you would do is manage your new threads in such a way
>> that their run() method can do "delete this" as the last call. Only if
>> you can't do that do you need to think about what termination logic is
>> missing that needs to be done in lieu of the destructor.
> Yes, but this forces every implementer of a Thread:run() function to
> have to think about these kind of requirements.

Absolutely! Everyone who creates a thread should be thinking about its 
lifecycle. It is, in my opinion, the lack of such thinking that has 
gotten things in a mess.

>>>>> Could we instead add a call to
>>>>> ThreadLocalStorage::set_thread(NULL), or
>>>>> maybe even Thread::clear_thread_current(), in java_start?
>>>>> static void *java_start(Thread *thread) {
>>>>> [...]
>>>>>   thread->initialize_thread_current();
>>>>> [...]
>>>>>   // call one more level start routine
>>>>>   thread->run();
>>>>>   ////////// Could we call Thread::clear_thread_current(); here?
>>>> Not easily. For JavaThreads we've already done "delete this" inside
>>>> the run() method, so we'd have to move that into java_start as well,
>>>> but we can only do the delete for JavaThreads not for other threads.
>>>> And we'd also have to change the VMThread and WatcherThread
>>>> termination logic because of the deletes that happen in the
>>>> termination thread - the "this" pointer (thread above) may no longer
>>>> be valid when we want to call clear_current_thread() - which is why we
>>>> can only do the ThreadLocalStorage::set_thread(NULL).
>>>> I agree it would be a lot cleaner to have java_start do:
>>>>   thread->common_initialization();
>>>>   thread->run();
>>>>   thread->common_cleanup();
>>>>   delete thread;
>>>> for all threads, but we'd need a lot of other changes to allow for
>>>> that. Otherwise we would need to note that kind of thread before
>>>> calling run() then switch on the thread type after run() to decide
>>>> what kind of cleanup is necessary and possible. I don't think that
>>>> would be better than just doing the "right" cleanup at the end of the
>>>> run() methods.
>>> I understand that this is a bit messy, and I won't insist that we change
>>> this in this RFR, but without looking at this in much detail it sounds
>>> weird to delete the thread in run(). Couldn't this be solved by
>>> introducing a virtual Thread::post_run() function and do:
>>> virtual void Thread::post_run() {
>>>   clear_thread_current();
>>> }
>>> virtual void JavaThread::post_run() {
>>>   Thread::post_run();
>>>   delete this;
>>> }
>> But again this can't work for the VMThread or WatcherThread as they
>> are deleted from the termination thread and so thread->post_run() may
>> SEGV.** Plus it is only after the fact that you realize not to put
>> "delete this" in Thread::post_run().
> OK, I didn't understand what you meant with "termination thread", but I
> now see the call to VMThread::destroy().
> With that said, I find it odd that VMThread::destroy() deletes the VM
> thread. We already handshake between the VMThread and the "termination
> thread", so why isn't that VMThread::post_run() implemented as:
> virtual void VMThread::post_run() {
>   // signal other threads that VM process is gone
>   {
>     // Note: we must have the _no_safepoint_check_flag. Mutex::lock()
> allows
>     // VM thread to enter any lock at Safepoint as long as its _owner is
>     // If that happens after _terminate_lock->wait() has unset _owner
>     // but before it actually drops the lock and waits, the notification
> below
>     // may get lost and we will have a hang. To avoid this, we need to use
>     // Mutex::lock_without_safepoint_check().
>     MutexLockerEx ml(_terminate_lock, Mutex::_no_safepoint_check_flag);
>     _terminated = true;
>     _terminate_lock->notify();
>   }
>   Thread::post_run();
>   delete this;
> }
> And then we wouldn't get a SEGV ...

Not on the "delete this", no - but you're missing whole aspect of the 
potential race between the execution of the Thread destructors and the 
tearing down of the VM that happens in the termination thread. In the 
past there was an issue with ResourceMark cleanup that used a stub that 
existed in the CodeCache which was deleted by the terminating thread as 
part of VM shutdown. That particular issue no longer exists but there 
could be other issues. We either have to establish and enforce that 
nothing on the destructor path can encounter an error due to something 
happening on the termination path, or we maintain the current 
conservatism and have the terminating thread synchronously destroy the 
other threads. As I said upfront I chose the latter as the former is 
somewhat error prone and fragile.

> I couldn't find the destructor for the WatchThread, but it seems easy to
> fix that as well.
> I'm probably missing something, but I find it a bit annoying that code
> that should belong to the *Thread:ing system leaks into the
> implementations of *Thread::run().

Wouldn't be an issue if the "threading system" had complete lifecycle 
management of all threads - but it doesn't. Eg GC threads are stashed 
into workgangs and other shared variables that can reference the thread 
objects after they have terminated. Alas this isn't Java code. :)


> Thanks,
> StefanK
>> ** Arguably the best solution to the "thread termination races with VM
>> termination" problem is to not let the threads terminate. The code as
>> it exists today can still have JavaThreads destroying themselves at
>> the same that the VM is terminating and potentially hit the same
>> errors that require us to not allow the VMThread (and now
>> WatcherThread) to delete themselves.
>> Thanks,
>> David
>>> Thanks,
>>> StefanK
>>>> Thanks,
>>>> David
>>>> ------
>>>>>   log_info(os, thread)("Thread finished (tid: " UINTX_FORMAT ",
>>>>> pthread
>>>>> id: " UINTX_FORMAT ").",
>>>>>     os::current_thread_id(), (uintx) pthread_self());
>>>>>   return 0;
>>>>> }
>>>>> And get rid of the explicit calls to
>>>>> ThreadLocalStorage::set_thread(NULL) you added?
>>>>> Thanks,
>>>>> StefanK
>>>>> On 04/05/16 01:39, David Holmes wrote:
>>>>>> This needs attention from GC and runtime folk please.
>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8154715
>>>>>> webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/
>>>>>> tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always called
>>>>>> before a thread terminates.
>>>>>> Background:
>>>>>> Most system-related threads do not expect to explicitly terminate,
>>>>>> except sometimes as part of VM termination. Such threads don't have
>>>>>> their destructors called, but should.
>>>>>> This omission came to light due to the ThreadLocalStorage changes in
>>>>>> JDK-8132510. As part of that change we deleted the following from the
>>>>>> termination path of the VMThread:
>>>>>>  // Thread destructor usually does this.
>>>>>>  ThreadLocalStorage::set_thread(NULL);
>>>>>> The clearing of TLS seemed irrelevant to the VMThread as it primarily
>>>>>> is used to aid in JNI attach/detach. However Brian Gardner reported:
>>>>>> http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.html
>>>>>> a problem on FreeBSD caused by this change and the interaction with
>>>>>> the POSIX  pthread TLS destructor use introduced by JDK-8033696.
>>>>>> Because the VMThread terminated without clearing TLS, when the
>>>>>> TLS-destructor was called it got into a loop which ran four times (as
>>>>>> happens on Linux) and then prints a warning to the console (which
>>>>>> doesn't happen on Linux).
>>>>>> This indicates we need to restore the:
>>>>>>  ThreadLocalStorage::set_thread(NULL);
>>>>>> but on further consideration it seems to me that this is not confined
>>>>>> to the VMThread, and the most appropriate fix would be to always
>>>>>> invoke the Thread destructor as a thread terminates.
>>>>>> Solution:
>>>>>> Further investigation shows that calling the Thread destructor in the
>>>>>> thread as it terminates is not possible:
>>>>>> - VMThread
>>>>>> This is actually destroyed by the thread that terminates the VM, but
>>>>>> that can happen after it terminates and so we still hit the TLS
>>>>>> problem. The VMThread may be able to destroy itself today but in the
>>>>>> past this was not possible (see existing code comment), and in the
>>>>>> future it may also not be possible - the problem is that the Thread
>>>>>> destructor can interact with other VM subsystems that are
>>>>>> concurrently
>>>>>> being torn down by the thread that is terminating the VM. In the past
>>>>>> this was the CodeHeap. So rather than introduce something that is
>>>>>> fragile we stick with the current scheme but restore the
>>>>>> ThreadLocalStorage::set_thread(NULL); - note we can't access
>>>>>> "this" at
>>>>>> that time because it may already have been de-allocated.
>>>>>> - WatcherThread
>>>>>> The WatcherThread is never destroyed today but has the same
>>>>>> problem as
>>>>>> the VMThread. We can call the destructor from the VM termination
>>>>>> thread (and have implemented that), but not from the WatcherThread
>>>>>> itself. So again we just have to restore the
>>>>>> ThreadLocalStorage::set_thread(NULL); to fix the potential TLS
>>>>>> problem.
>>>>>> - GC Threads
>>>>>> There are two cases:
>>>>>> a) GC threads that never terminate
>>>>>> For these we don't need to do anything: we can't delete the thread as
>>>>>> it never terminates and we don't hit the TLS problem because it never
>>>>>> terminates. So all we will do here is add some logic to check (in
>>>>>> NON_PRODUCT) that we do in fact never terminate.
>>>>>> b) GC threads that can terminate
>>>>>> Despite the fact the threads can terminate, references to those
>>>>>> threads are stored elsewhere (WorkGangs and other places) and are not
>>>>>> cleared as part of the termination process. Those references can be
>>>>>> touched after the thread has terminated so we can not call the
>>>>>> destructor at all. So again all we can do (without some major thread
>>>>>> management reworking) is ensure that
>>>>>> ThreadLocalStorage::set_thread(NULL); is called before the thread
>>>>>> actually terminates
>>>>>> Testing: JPRT
>>>>>>          RBT - runtime nightly tests
>>>>>> Thanks,
>>>>>> David

More information about the hotspot-dev mailing list