RFR: 8170307: Stack size option -Xss is ignored

Thomas Stüfe thomas.stuefe at gmail.com
Wed Nov 30 08:17:03 UTC 2016

On Wed, Nov 30, 2016 at 8:35 AM, David Holmes <david.holmes at oracle.com> wrote:

> On 29/11/2016 10:25 PM, David Holmes wrote:
>> I just realized I overlooked the case where ThreadStackSize=0 and the
>> stack is unlimited. In that case it isn't clear where the guard pages
>> will get inserted - I do know that I don't get a stackoverflow error.
>> This needs further investigation.
> So what happens here is that the massive stack-size causes stack-bottom to
> be higher than stack-top! So we will set a guard-page goodness knows where,
> and we can consume the current stack until such time as we hit an unmapped
> or protected region at which point we are killed.
> I'm not sure what to do here. My gut feel is that in such a case we should
> not attempt to create a guard page in the initial thread. That would
> require using a sentinel value for the stack-size. Though it also presents
> a problem for stack-bottom - which is implicitly zero. It may also give
> false positives in the is_initial_thread() check!
> Thoughts? Suggestions?
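
For reference, the inverted bounds described above are consistent with plain
unsigned wraparound. This is only a sketch under the assumption that the
bottom is computed as top minus size; the helper names are hypothetical and
are not HotSpot code. With a page-aligned "unlimited" size, the subtraction
wraps and the bottom lands just above the top:

```c
#include <stdint.h>

/* Hypothetical helper: the stack grows down, so the lowest usable address
 * ("bottom" in this thread's terminology) is assumed to be top minus size. */
static uintptr_t stack_bottom_for(uintptr_t stack_top, uintptr_t stack_size)
{
    return stack_top - stack_size;   /* unsigned arithmetic wraps modulo 2^N */
}

/* Returns 1 when the computed bottom is not below the top, i.e. the
 * subtraction wrapped (or the size was zero) and the range cannot be
 * used to place a guard page. */
static int stack_range_is_bogus(uintptr_t stack_top, uintptr_t stack_size)
{
    return stack_bottom_for(stack_top, stack_size) >= stack_top;
}
```

A check of this kind before placing the guard page would at least detect the
unlimited case instead of protecting an arbitrary address.
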
Maybe I am overlooking something, but should os::capture_initial_stack()
not call pthread_getattr_np() first to handle the case where the VM was
created on a pthread which is not the primordial thread and may have a
different stack size than what getrlimit returns? And fall back to
getrlimit only if pthread_getattr_np() fails? And then we also should
handle RLIM_INFINITY. For that case, I also think not setting guard pages
would be safest.
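
Roughly what I have in mind, as a Linux/glibc-only sketch and not a patch.
The function name and the zero sentinel are made up for illustration;
pthread_getattr_np(), pthread_attr_getstack() and getrlimit() are the real
calls:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                 /* for pthread_getattr_np() on glibc */
#endif
#include <pthread.h>
#include <stddef.h>
#include <sys/resource.h>

/* Declared explicitly in case _GNU_SOURCE was not defined before the
 * first system header was included. */
extern int pthread_getattr_np(pthread_t thread, pthread_attr_t *attr);

/* Hypothetical sentinel: size unknown or unlimited, so skip guard pages. */
#define STACK_SIZE_UNKNOWN ((size_t)0)

/* Ask the thread library first; fall back to RLIMIT_STACK only if that fails. */
static size_t current_thread_stack_size(void)
{
    pthread_attr_t attr;
    if (pthread_getattr_np(pthread_self(), &attr) == 0) {
        void  *addr = NULL;
        size_t size = 0;
        int rc = pthread_attr_getstack(&attr, &addr, &size);
        pthread_attr_destroy(&attr);
        if (rc == 0 && size > 0) {
            return size;
        }
    }

    struct rlimit rlim;
    if (getrlimit(RLIMIT_STACK, &rlim) != 0 || rlim.rlim_cur == RLIM_INFINITY) {
        return STACK_SIZE_UNKNOWN;  /* caller should not set guard pages */
    }
    return (size_t)rlim.rlim_cur;
}
```

A caller would treat STACK_SIZE_UNKNOWN as "do not install guard pages on
this thread" rather than as a real size.
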

We also may just refuse to run in that case, because the workaround for the
user is easy - just set the limit before process start. Note that on AIX,
we currently refuse to run on the primordial thread because it may have
different page sizes than pthreads and it is impossible to get the exact
stack locations.


> David
>> On 29/11/2016 9:59 PM, David Holmes wrote:
>>> Hi Thomas,
>>> On 29/11/2016 8:39 PM, Thomas Stüfe wrote:
>>>> Hi David,
>>>> thanks for the good explanation. Change looks good, I really like the
>>>> comment in capture_initial_stack().
>>>> Question, with -Xss given and being smaller than current thread stack
>>>> size, guard pages may appear in the middle of the invoking thread stack?
>>>> I always thought this is a bit dangerous. If your model is to have the
>>>> VM created from the main thread, which then goes off to do different
>>>> things, and have other threads then attach and run java code, main
>>>> thread later may crash in unrelated native code just because it reached
>>>> the stack depth of the Java threads? Or am I misunderstanding something?
>>> There is no change to the general behaviour other than allowing a
>>> primordial process thread that launches the VM, to now not have an
>>> effective stack limited at 2MB. The current logic will insert guard
>>> pages wherever -Xss states (as long as less than 2MB else 2MB), while
>>> with the fix the guard pages will be inserted above 2MB - as dictated by
>>> -Xss.
>>> David
>>> -----
>>> Thanks, Thomas
>>>> On Fri, Nov 25, 2016 at 11:38 AM, David Holmes <david.holmes at oracle.com> wrote:
>>>>     Bug: https://bugs.openjdk.java.net/browse/JDK-8170307
>>>>     The bug is not public unfortunately for non-technical reasons - but
>>>>     see my eval below.
>>>>     Background: if you load the JVM from the primordial thread of a
>>>>     process (not done by the java launcher since JDK 6), there is an
>>>>     artificial stack limit imposed on the initial thread (by sticking
>>>>     the guard page at the limit position of the actual stack) of the
>>>>     minimum of the -Xss setting and 2M. So if you set -Xss to > 2M it is
>>>>     ignored for the main thread even if the true stack is, say, 8M. This
>>>>     limitation dates back 10-15 years and is no longer relevant today
>>>>     and should be removed (see below). I've also added additional
>>>>     explanatory notes.
>>>>     webrev: http://cr.openjdk.java.net/~dholmes/8170307/webrev/
>>>>     Testing was manually done by modifying the launcher to not run the
>>>>     VM in a new thread, and checking the resulting stack size used.
>>>>     This change will only affect hosted JVMs launched with a -Xss value
>>>>     > 2M.
>>>>     Thanks,
>>>>     David
>>>>     -----
>>>>     Bug eval:
>>>>     JDK-4441425 limits the stack to 8M as a safeguard against an
>>>>     unlimited value from getrlimit in 1.3.1, but further constrained
>>>>     that to 2M in 1.4.0 due to JDK-4466587.
>>>>     By 1.4.2 we have the basic form of the current problematic code:
>>>>     #ifndef IA64
>>>>       if (rlim.rlim_cur > 2 * K * K) rlim.rlim_cur = 2 * K * K;
>>>>     #else
>>>>       // Problem still exists RH7.2 (IA64 anyway) but 2MB is a little small
>>>>       if (rlim.rlim_cur > 4 * K * K) rlim.rlim_cur = 4 * K * K;
>>>>     #endif
>>>>       _initial_thread_stack_size = rlim.rlim_cur & ~(page_size() - 1);
>>>>       if (max_size && _initial_thread_stack_size > max_size) {
>>>>          _initial_thread_stack_size = max_size;
>>>>       }
>>>>     This was added by JDK-4678676 to allow the stack of the main thread
>>>>     to be _reduced_ below the default 2M/4M if the -Xss value was
>>>>     smaller than that.** There was no intent to allow the stack size to
>>>>     follow -Xss arbitrarily due to the operational constraints imposed
>>>>     by the OS/glibc at the time when dealing with the primordial process
>>>>     thread.
>>>>     ** It could not actually change the actual stack size of course, but
>>>>     set the guard pages to limit use to the expected stack size.
>>>>     In JDK 6, under JDK-6316197, the launcher was changed to create the
>>>>     JVM in a new thread, so that it was not limited by the
>>>>     idiosyncrasies of the OS or thread library primordial thread
>>>>     handling. However, the stack size limitations remained in place in
>>>>     case the VM was launched from the primordial thread of a user
>>>>     application via the JNI invocation API.
>>>>     I believe it should be safe to remove the 2M limitation now.
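
To make the effect of the quoted logic concrete, here is a small stand-alone
sketch of the non-IA64 path. The "old" function mirrors the 1.4.2-era snippet
in the bug evaluation; the "new" one only assumes that the fix drops the 2M
cap and keeps everything else, which may not match the webrev exactly. The
function names and the K macro are illustrative:

```c
#include <stddef.h>

#define K ((size_t)1024)

/* Behaviour described in the bug evaluation: cap the rlimit at 2M,
 * page-align, then reduce further if -Xss (max_size, 0 = not set) is smaller. */
static size_t initial_stack_size_old(size_t rlim_cur, size_t max_size,
                                     size_t page_size)
{
    if (rlim_cur > 2 * K * K) rlim_cur = 2 * K * K;
    size_t size = rlim_cur & ~(page_size - 1);
    if (max_size && size > max_size) size = max_size;
    return size;
}

/* Assumed shape of the fix: the same logic with the 2M cap removed. */
static size_t initial_stack_size_new(size_t rlim_cur, size_t max_size,
                                     size_t page_size)
{
    size_t size = rlim_cur & ~(page_size - 1);
    if (max_size && size > max_size) size = max_size;
    return size;
}
```

With an 8M rlimit and -Xss4m, the old logic yields 2M and the assumed new
logic yields 4M; with -Xss1m both yield 1M, so only hosted JVMs with
-Xss > 2M see a difference.
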
