RFR: 8154715: Missing destructor and/or TLS clearing calls for terminating threads

Stefan Karlsson stefan.karlsson at oracle.com
Fri May 6 15:41:36 UTC 2016

On 06/05/16 16:32, David Holmes wrote:
> On 7/05/2016 12:04 AM, Stefan Karlsson wrote:
>> Hi David,
>> On 06/05/16 15:38, David Holmes wrote:
>>> Hi Stefan,
>>> Thanks for taking a look at this.
>>> On 6/05/2016 5:02 PM, Stefan Karlsson wrote:
>>>> Hi David,
>>>> I looked through the GC part of this webrev and I think the change is
>>>> fine.
>>>> However, it seems a bit error prone. If we decide to change the 
>>>> code to,
>>>> for example, terminate the  AbstractGangWorker threads, then we 
>>>> have to
>>>> remember to insert a ThreadLocalStorage::set_thread(NULL) call.
>>> That's why I added the ShouldNotReachHere()'s - if those threads start
>>> terminating then we will see those hit. Perhaps a comment:
>>> ShouldNotReachHere(); // If thread terminates we have to do TLS cleanup
>>> ?
>> Yes, I would appreciate a comment. Though, when we add new threads, we
>> need to remember to add the set_thread(NULL) call.
> Well no, what you would do is manage your new threads in such a way 
> that their run() method can do "delete this" as the last call. Only if 
> you can't do that do you need to think about what termination logic is 
> missing that needs to be done in lieu of the destructor.

Yes, but this forces every implementer of a Thread:run() function to 
have to think about these kind of requirements.

>>>> Could we instead add a call to 
>>>> ThreadLocalStorage::set_thread(NULL), or
>>>> maybe even Thread::clear_thread_current(), in java_start?
>>>> static void *java_start(Thread *thread) {
>>>> [...]
>>>>   thread->initialize_thread_current();
>>>> [...]
>>>>   // call one more level start routine
>>>>   thread->run();
>>>>   ////////// Could we call Thread::clear_thread_current(); here?
>>> Not easily. For JavaThreads we've already done "delete this" inside
>>> the run() method, so we'd have to move that into java_start as well,
>>> but we can only do the delete for JavaThreads not for other threads.
>>> And we'd also have to change the VMThread and WatcherThread
>>> termination logic because of the deletes that happen in the
>>> termination thread - the "this" pointer (thread above) may no longer
>>> be valid when we want to call clear_current_thread() - which is why we
>>> can only do the ThreadLocalStorage::set_thread(NULL).
>>> I agree it would be a lot cleaner to have java_start do:
>>>   thread->common_initialization();
>>>   thread->run();
>>>   thread->common_cleanup();
>>>   delete thread;
>>> for all threads, but we'd need a lot of other changes to allow for
>>> that. Otherwise we would need to note that kind of thread before
>>> calling run() then switch on the thread type after run() to decide
>>> what kind of cleanup is necessary and possible. I don't think that
>>> would be better than just doing the "right" cleanup at the end of the
>>> run() methods.
>> I understand that this is a bit messy, and I won't insist that we change
>> this in this RFR, but without looking at this in much detail it sounds
>> weird to delete the thread in run(). Couldn't this be solved by
>> introducing a virtual Thread::post_run() function and do:
>> virtual void Thread::post_run() {
>>   clear_thread_current();
>> }
>> virtual void JavaThread::post_run() {
>>   Thread::post_run();
>>   delete this;
>> }
> But again this can't work for the VMThread or WatcherThread as they 
> are deleted from the termination thread and so thread->post_run() may 
> SEGV.** Plus it is only after the fact that you realize not to put 
> "delete this" in Thread::post_run().

OK, I didn't understand what you meant with "termination thread", but I 
now see the call to VMThread::destroy().

With that said, I find it odd that VMThread::destroy() deletes the VM 
thread. We already handshake between the VMThread and the "termination 
thread", so why isn't that VMThread::post_run() implemented as:

virtual void VMThread::post_run() {
   // signal other threads that VM process is gone
     // Note: we must have the _no_safepoint_check_flag. Mutex::lock() 
     // VM thread to enter any lock at Safepoint as long as its _owner 
is NULL.
     // If that happens after _terminate_lock->wait() has unset _owner
     // but before it actually drops the lock and waits, the 
notification below
     // may get lost and we will have a hang. To avoid this, we need to use
     // Mutex::lock_without_safepoint_check().
     MutexLockerEx ml(_terminate_lock, Mutex::_no_safepoint_check_flag);
     _terminated = true;

   delete this;

And then we wouldn't get a SEGV ...

I couldn't find the destructor for the WatchThread, but it seems easy to 
fix that as well.

I'm probably missing something, but I find it a bit annoying that code 
that should belong to the *Thread:ing system leaks into the 
implementations of *Thread::run().

> ** Arguably the best solution to the "thread termination races with VM 
> termination" problem is to not let the threads terminate. The code as 
> it exists today can still have JavaThreads destroying themselves at 
> the same that the VM is terminating and potentially hit the same 
> errors that require us to not allow the VMThread (and now 
> WatcherThread) to delete themselves.
> Thanks,
> David
>> Thanks,
>> StefanK
>>> Thanks,
>>> David
>>> ------
>>>>   log_info(os, thread)("Thread finished (tid: " UINTX_FORMAT ", 
>>>> pthread
>>>> id: " UINTX_FORMAT ").",
>>>>     os::current_thread_id(), (uintx) pthread_self());
>>>>   return 0;
>>>> }
>>>> And get rid of the explicit calls to
>>>> ThreadLocalStorage::set_thread(NULL) you added?
>>>> Thanks,
>>>> StefanK
>>>> On 04/05/16 01:39, David Holmes wrote:
>>>>> This needs attention from GC and runtime folk please.
>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8154715
>>>>> webrev: http://cr.openjdk.java.net/~dholmes/8154715/webrev/
>>>>> tl;dr: ensure ThreadLocalStorage::set_thread(NULL) is always called
>>>>> before a thread terminates.
>>>>> Background:
>>>>> Most system-related threads do not expect to explicitly terminate,
>>>>> except sometimes as part of VM termination. Such threads don't have
>>>>> their destructors called, but should.
>>>>> This omission came to light due to the ThreadLocalStorage changes in
>>>>> JDK-8132510. As part of that change we deleted the following from the
>>>>> termination path of the VMThread:
>>>>>  // Thread destructor usually does this.
>>>>>  ThreadLocalStorage::set_thread(NULL);
>>>>> The clearing of TLS seemed irrelevant to the VMThread as it primarily
>>>>> is used to aid in JNI attach/detach. However Brian Gardner reported:
>>>>> http://mail.openjdk.java.net/pipermail/bsd-port-dev/2016-February/002788.html 
>>>>> a problem on FreeBSD caused by this change and the interaction with
>>>>> the POSIX  pthread TLS destructor use introduced by JDK-8033696.
>>>>> Because the VMThread terminated without clearing TLS, when the
>>>>> TLS-destructor was called it got into a loop which ran four times (as
>>>>> happens on Linux) and then prints a warning to the console (which
>>>>> doesn't happen on Linux).
>>>>> This indicates we need to restore the:
>>>>>  ThreadLocalStorage::set_thread(NULL);
>>>>> but on further consideration it seems to me that this is not confined
>>>>> to the VMThread, and the most appropriate fix would be to always
>>>>> invoke the Thread destructor as a thread terminates.
>>>>> Solution:
>>>>> Further investigation shows that calling the Thread destructor in the
>>>>> thread as it terminates is not possible:
>>>>> - VMThread
>>>>> This is actually destroyed by the thread that terminates the VM, but
>>>>> that can happen after it terminates and so we still hit the TLS
>>>>> problem. The VMThread may be able to destroy itself today but in the
>>>>> past this was not possible (see existing code comment), and in the
>>>>> future it may also not be possible - the problem is that the Thread
>>>>> destructor can interact with other VM subsystems that are 
>>>>> concurrently
>>>>> being torn down by the thread that is terminating the VM. In the past
>>>>> this was the CodeHeap. So rather than introduce something that is
>>>>> fragile we stick with the current scheme but restore the
>>>>> ThreadLocalStorage::set_thread(NULL); - note we can't access 
>>>>> "this" at
>>>>> that time because it may already have been de-allocated.
>>>>> - WatcherThread
>>>>> The WatcherThread is never destroyed today but has the same 
>>>>> problem as
>>>>> the VMThread. We can call the destructor from the VM termination
>>>>> thread (and have implemented that), but not from the WatcherThread
>>>>> itself. So again we just have to restore the
>>>>> ThreadLocalStorage::set_thread(NULL); to fix the potential TLS 
>>>>> problem.
>>>>> - GC Threads
>>>>> There are two cases:
>>>>> a) GC threads that never terminate
>>>>> For these we don't need to do anything: we can't delete the thread as
>>>>> it never terminates and we don't hit the TLS problem because it never
>>>>> terminates. So all we will do here is add some logic to check (in
>>>>> NON_PRODUCT) that we do in fact never terminate.
>>>>> b) GC threads that can terminate
>>>>> Despite the fact the threads can terminate, references to those
>>>>> threads are stored elsewhere (WorkGangs and other places) and are not
>>>>> cleared as part of the termination process. Those references can be
>>>>> touched after the thread has terminated so we can not call the
>>>>> destructor at all. So again all we can do (without some major thread
>>>>> management reworking) is ensure that
>>>>> ThreadLocalStorage::set_thread(NULL); is called before the thread
>>>>> actually terminates
>>>>> Testing: JPRT
>>>>>          RBT - runtime nightly tests
>>>>> Thanks,
>>>>> David

More information about the hotspot-dev mailing list