RFR(s): 8212173: Thread._stack_base/_stack_size initialized too late for new threads
david.holmes at oracle.com
Tue Oct 23 07:05:32 UTC 2018
On 23/10/2018 4:42 PM, Thomas Stüfe wrote:
> Thank you David!
> This is unfortunately unsolved still for Windows/Solaris/AIX, so one
> should not iterate the thread list and query the stack size.
It's somewhat surprising we don't actually have more problems because of
> I wonder whether we should query the OSThread::state and only return
> threads whose state is >= INITIALIZED. But that is for a different
INITIALIZED and RUNNABLE are set by the thread doing the "creation" of
the new thread. We'd need a new state to indicate "self-initialized" and
adjust the use of RUNNABLE. In short we'd need a new state machine for
the thread lifecycle. :)
There's also the issue of the crash reporter being more aware that the
current thread may not yet have self-initialized. Again another patch.
> Thanks, Thomas
> On Tue, Oct 23, 2018 at 4:07 AM David Holmes <david.holmes at oracle.com> wrote:
>> Hi Thomas,
>> This looks good to me - thanks. Setting the stack base and size first up
>> is more robust all round and as you note happens before the adding to
>> the threads-list in some cases.
>> Getting rid of the solaris-specific os::initialize_thread is a good
>> clean up!
>> I've run this through our internal tier 1-3 testing again and it all passed.
>> On 23/10/2018 2:56 AM, Thomas Stüfe wrote:
>>> Dear all,
>>> may I have please reviews for the following fix:
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8212173
>>> cr: http://cr.openjdk.java.net/~stuefe/webrevs/8212173-thread-stack-and-base-initialized-too-late/webrev.02/webrev/
>>> See also prior mail discussions: 
>>> Wwhen iterating the thread list, threads may be visible which have
>>> stack base/size not yet set. Calling stack_base() will yield NULL,
>>> stack_size() 0 or it will assert. The reason for this is that there
>>> exist a race: Thread::_stack_base and ::_stack_size are initialized
>>> too late, when the Thread has already been added to the threads list.
>>> My solution to this problem is basically to move the initialization of
>>> Thread::_stack_base and ::_stack_size to an earlier point at which the
>>> thread has not yet been added to the threads list.
>>> Unfortunately, this is more difficult that it sounds. Retrieving
>>> thread stack base and size needs to be done from within the thread in
>>> question, and only on Linux and BSD we have the opportunity to execute
>>> code in the new child thread before its Thread object is added to the
>>> thread list. So, my patch solves the problem completely only on Linux
>>> and BSD. On Windows, Aix and Solaris it only minimizes the chance for
>>> it to happen.
>>> For more details, please see bug description and the discussion in .
>>> My changes in detail:
>>> - Thread::record_stack_base_and_size(): I removed all code which needs
>>> the surrounding Thread object being initialized, or which needs
>>> Thread::current() to be set. That makes the function safe to be called
>>> from the very start of the child thread's life (start of
>>> thread_native_entry()), where nothing is initialized yet.
>>> - I moved the call to Thread::record_stack_base_and_size() out of all
>>> <ChildThreadClass>::run() functions into the very start of the
>>> thread_native_entry() on every platform.
>>> - The parts I moved out of Thread::record_stack_base_and_size() are:
>>> NMT thread stack initialization and Logging. I needed to find a new
>>> place for them. I wanted them to run at the end of the child thread
>>> initialization, when the child Thread is already initialized and
>>> Thread::current is initialized too. So I moved them to just before
>>> <ChildThreadCLass>::run() is called.
>>> - To prevent code duplication, I changed:
>>> public virtual Thread::run()
>>> public (nonvirtual) Thread::call_run();
>>> protected virtual Thread::run() = 0;
>>> Thread::call_run() now is the place to put common,
>>> platform-independent and <ChildThreadClass>-independent
>>> initializations. There I put the NMT stuff and logging. After
>>> <ChildThreadClass>::run() returns, I put common cleanup checks.
>>> Other changes:
>>> - We had a function os::initialize_thread() which only existed for the
>>> sole benefit of Solaris, where if gives the platform a stab at
>>> correcting the stack base, size for a certain corner case when we run
>>> on primordial thread. On all other platform-cpu-combinations, this
>>> function was empty. I removed all empty implementations and renamed
>>> the function to the much more fitting, Solaris only
>>> - When <ChildClassThread>::run() returns, the Thread object may have
>>> been deleted by the child class run() function. To avoid errors I set
>>> the thread pointer to zero (where possible) to prevent accessing it to
>>> reference members.
>>> - Since Thread::record_stack_base_and_size() does not register the
>>> thread stack with NMT anymore, I needed to fix those places where the
>>> former is called for attached threads and add that NMT registration
>>> - I made Thread::clear_thread_current() a static function since it
>>> needs no members of Thread (unlike Thread::initialize_thread_current()
>>> which needs the this pointer to hook it into TLS). That makes it
>>> possible to use clear_thread_current() without having a valid Thread
>>> object, which is the case if <ChildClassThread>::run() deleted the
>>> That's about it.
>>> The change ran through jdk-submit tests without errors.
>>> Thank you for reviewing.
>>>  http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2018-October/030516.html
More information about the hotspot-runtime-dev