Thread stack size issue related to glibc TLS bug
fweimer at redhat.com
Fri May 24 19:17:07 UTC 2019
* Jiangli Zhou:
> Hi Florian,
> Thanks for the feedback!
> On Fri, May 24, 2019 at 3:13 AM Florian Weimer <fweimer at redhat.com> wrote:
>> * David Holmes:
>> > My thoughts haven't really changed since 2015 - and sadly neither has
>> > there been any change in glibc in that time. Nor, to my recollection,
>> > have there been any other reported issues with this.
>> The issue gets occasionally reported by people who use small stacks with
>> large initial-exec TLS consumers (such as jemalloc). On the glibc side,
>> we aren't entirely sure what to do about this. We have recently tweaked
>> the stack size computation, so that in many cases, threads now receive
>> an additional page. This was necessary to work around a kernel/hardware
>> change where context switches started to push substantially more data on
>> the stack than before, and minimal stack sizes did not work anymore on
>> x86-64 (leading to ntpd crashing during startup, among other things).
>> The main concern is that for workloads with carefully tuned stack sizes,
>> revamping the stack size computation so that TLS is no longer
>> effectively allocated on the stack might result in address space
>> exhaustion. (This should only be a concern on 32-bit architectures.)
> Could it be addressed for 64-bit (first) in the foreseeable future?
Yes, I'd happily review a patch if we had one.
>> Even if we changed this today (or had changed it in 2015), it would take
>> a long time for the change to end up with end users, so it's unclear how
>> much help it would be.
>> Maybe OpenJDK can add a property specifying a stack size reserve, and
>> this number is added to all stack size requests? This will at least
>> allow users to work around the issue locally.
> One issue is that a user may not know that the property should be
> used, or what the proper reserve size would be, when running into the
> TLS-related stack size issue. The stack size issue could be hard for
> average users to diagnose.
You could print the requested stack size, stack size reserve, and actual
stack size in the StackOverflowError message. The fact that there is a
reserve (which would be zero by default) could serve as a hint and
quickly lead to the discovery of the tunable. At least that is my hope.
More information about the core-libs-dev mailing list