RFC: more robust handling of terminated but still attached threads
fweimer at redhat.com
Tue Jul 3 12:19:57 UTC 2018
On 07/03/2018 02:09 PM, David Holmes wrote:
>> The use case isn't entirely clear to me. If you are sufficiently
>> unlucky, the memory behind a pthread_t value is simply gone after
>> thread exit (and potentially TCB/thread stack reclamation in the
>> thread library). On glibc, this includes the internal TID, which is
>> required for pthread_kill (thr, 0) actually sending the signal.
> IIUC pthread_kill(thr,0) never sends any signal, but may look up the id
> to see if it is valid. I understand there's no guarantee and that there
> is an inherent race regardless.
It still makes a system call to send the pseudo-signal 0. This is what
I meant. It can bail out earlier in case of terminated threads which
have not yet been joined, though.
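
As an illustration of the probe discussed above, here is a minimal sketch in C. The helper names (exit_immediately, probe_terminated_unjoined) are hypothetical, and the short sleep is only a best-effort way to let the thread finish before it is probed. The probe is safe here only because the thread is joinable and has not been joined yet, so its pthread_t is still valid. The return value for such a thread depends on the C library and its version, so the sketch treats both 0 and ESRCH as possible.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <time.h>

/* Thread body that terminates right away. */
static void *exit_immediately(void *arg)
{
    return arg;
}

/*
 * Create a joinable thread, give it time to terminate, probe it with
 * the null signal, and only then join it.  Returns the pthread_kill
 * result.  Probing after the join would be undefined behaviour,
 * because the pthread_t no longer refers to a live thread descriptor.
 */
static int probe_terminated_unjoined(void)
{
    pthread_t thr;
    struct timespec delay = { 0, 50 * 1000 * 1000 }; /* 50 ms, best effort */
    int rc;

    rc = pthread_create(&thr, NULL, exit_immediately, NULL);
    assert(rc == 0);

    nanosleep(&delay, NULL);     /* the thread has most likely exited by now */

    rc = pthread_kill(thr, 0);   /* signal 0: validity check, nothing delivered */

    assert(pthread_join(thr, NULL) == 0);
    return rc;
}
```

A caller should expect either 0 or ESRCH from probe_terminated_unjoined() and must not rely on one of them.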
>> I'm not familiar with the Hotspot run-time and why it needs to do
>> this. Can you deregister the thread from a thread directory once it
>> exits (using one of the TLS variants with a destructor)? Or is the
>> concern there that the destructor would not run late enough?
> The issue is native process threads that attach to the VM through JNI
> but then don't detach themselves before terminating. While it may be
> possible to create such a mechanism as you describe, it goes way beyond
> what I'm trying to do here and violates a basic principle that we try to
> interfere as little as possible with threads that attach to the VM
> directly (rather than being created by the VM). There was also a rather
> complex bug involving native threads that themselves provided such a TLS
> destructor (to detach themselves) and the VM's own (fairly recent) use of
> All I'm looking at is some basic robustness if the VM encounters such a
> thread (for which all the VM data structures remain intact - and
> effectively leak) so that we don't assert or crash when we do invoke a
> pthread function (pthread_getcpuclockid is the one in question in the
> bug report).
You could capture the TID and the task creation time from /proc when the
thread is attached, and try to recover the information you need from
/proc afterwards (possibly with a comparison to the startup time).
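
A minimal sketch of that idea in C, assuming Linux with /proc mounted. The function names task_start_time and is_same_task are hypothetical and are not part of HotSpot or glibc. The start time is field 22 of /proc/self/task/<tid>/stat, expressed in clock ticks since boot. The attaching thread would record its own TID with syscall(SYS_gettid) together with this value, and a later lookup would compare both.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read the start time (field 22 of the stat file) of a task in the
 * current process.  Returns 0 on success, -1 if the task no longer
 * exists or the file cannot be parsed.
 */
static int task_start_time(pid_t tid, unsigned long long *out)
{
    char path[64], buf[1024];
    char *p;
    size_t n;
    FILE *f;
    int field;

    assert(out != NULL);
    snprintf(path, sizeof path, "/proc/self/task/%d/stat", (int)tid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;               /* task is gone (or /proc is unavailable) */
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The command name (field 2) may contain spaces and parentheses,
       so continue parsing after the last ')'. */
    p = strrchr(buf, ')');
    if (p == NULL)
        return -1;
    p++;

    /* Skip fields 3 through 21; field 22 is the start time. */
    for (field = 3; field < 22; field++) {
        while (*p == ' ')
            p++;
        while (*p != '\0' && *p != ' ')
            p++;
    }
    return sscanf(p, "%llu", out) == 1 ? 0 : -1;
}

/* Nonzero if tid still names the task that was recorded at attach time. */
static int is_same_task(pid_t tid, unsigned long long attach_start)
{
    unsigned long long now;
    return task_start_time(tid, &now) == 0 && now == attach_start;
}
```

Comparing the start time guards against TID reuse: a different task that later receives the same TID would normally report a different start time. The check is still only a snapshot, since the task can exit right after it.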
You probably cannot ensure that the thread will not suddenly cease to
exist, so none of the pthread_* functions can be called safely. The only
in-process way I can imagine which ensures that the thread stays around is
to send it a signal with an unblocked handler which you control, and
which can then prevent the thread from exiting indefinitely. But that
is a very heavy-handed approach.
Out-of-process, you could use ptrace to freeze threads.