RFR(S): JDK-8137035 tests got EXCEPTION_STACK_OVERFLOW on Windows 64 bit
frederic.parain at oracle.com
Fri Aug 26 21:53:45 UTC 2016
On 08/26/2016 05:06 PM, Gerard Ziemski wrote:
>> The case of stack overflow in JVM code is more problematic. The JVM tries to avoid the case of stack overflow in VM code with the Shadow Pages mechanism. Whenever a Java method is invoked, the JVM tries to ensure that there's enough free stack space to execute the Java method and *any call to the JVM code (or JDK native code) that could occur during the execution of this method*. This check is performed by banging (touching) n pages ahead on the execution stack, where n is set to StackShadowPages. If the Yellow Zone is hit during the stack banging, a StackOverflowError is thrown before the execution of the first bytecode of the Java method. But this mechanism assumes that StackShadowPages pages is big enough to cover *any call to the JVM*. If this assumption is wrong,
>> bad things happen.
>> I ran experiments with tests for which stack overflow related crashes were reported. I ran them with a JVM where the StackShadowPages value was decreased by only 1 compared to the usual default value. It was very easy to reproduce stack overflow crashes. By instrumenting the JVM, it appeared that some threads hit the Yellow Zone while having thread state _thread_in_vm, which means that in many cases, the margin between the stack space provided by StackShadowPages and the real stack usage while executing VM code is less than one page. And because knowing the biggest stack requirement to execute any JVM code is an undecidable problem,
> Is it really an undecidable problem? Why is that exactly?
How would you compute the max stack size for any call to the VM?
Just the matrix of all VM options that could impact the stack usage
is huge: several GCs, several JIT compilers, JVMTI hooks, JFR.
The work to be performed by the JVM can also be dependent on the
application (class hierarchy, application code itself which can
be optimized (and deoptimized) in many different ways according to
compilation policies and application behavior).
This problem is not specific to the JVM. Linux has a similar issue
with its kernel stacks: they have a fixed size, but there's no way
to ensure that the size is sufficient to execute any system call or
perform any OS operation.
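To make the failure mode concrete, here is a toy model of the shadow page check in Python. It is only a sketch, not HotSpot code: the class ToyStack, its fields, and the two exception classes are hypothetical names, and stack usage is counted in whole pages. It shows that the check gives a clean StackOverflowError only as long as the VM call fits in the shadow pages.

```python
from dataclasses import dataclass


class StackOverflowError(Exception):
    """Recoverable case: raised before the Java method starts running."""


class VMStackOverflow(Exception):
    """Stands in for the crash: Yellow Zone hit while executing VM code."""


@dataclass
class ToyStack:
    usable_pages: int    # pages available above the Yellow Zone (hypothetical)
    shadow_pages: int    # plays the role of StackShadowPages
    used_pages: int = 0  # pages already consumed by earlier frames

    def invoke_java(self, frame_pages: int, vm_call_pages: int) -> str:
        # "Bang": verify that the frame plus the shadow pages fit
        # before the first bytecode of the method is executed.
        if self.used_pages + frame_pages + self.shadow_pages > self.usable_pages:
            raise StackOverflowError("thrown before the first bytecode")
        self.used_pages += frame_pages
        try:
            # The method now calls into the VM. Nothing re-checks the stack
            # here, so a call deeper than shadow_pages can overrun it.
            if self.used_pages + vm_call_pages > self.usable_pages:
                raise VMStackOverflow("Yellow Zone hit in VM code")
            return "ok"
        finally:
            self.used_pages -= frame_pages
```

With 10 usable pages, 6 shadow pages and 3 pages already used, a VM call needing 5 pages succeeds, while one needing 7 pages passes the bang and then overruns the stack inside the VM.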
>> Proposed fixes for this issue:
>> - increase StackShadowPages for the Windows platform
> Why are we so stingy with the size of the default shadow pages on Windows? Even with your fix, which increases it by one, it’s only 7, compared to 20 on other platforms.
> Why can’t we have 20 pages of default shadow pages on Windows? Wouldn’t that significantly decrease the chance of hitting the yellow pages, if we can’t guarantee that all calls to VM fit?
Historically, StackShadowPages was approximately the same for all
platforms. But one day, the JDK team rewrote the native part of the
networking APIs for Unix platforms using stack-allocated buffers instead
of malloc'ed buffers. This change caused crashes due to stack overflows
(either the native code hits the Yellow Zone, or it could even "jump"
over the Yellow/Red Zone). So the StackShadowPages default value has
been significantly increased on Unix platforms to provide stack
overflow protection to the JDK networking code. The implementation
of these APIs on Windows doesn't use stack allocated buffers, so the
StackShadowPages default value has not been increased for this platform.
Note that increasing the StackShadowPages has a cost: a cost in memory
because more stack space is reserved for VM code, and a cost in CPU
because StackShadowPages determines the number of pages to bang
before executing a Java method.
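A back-of-the-envelope sketch of that cost, in Python. The function shadow_cost is a hypothetical helper, the 4 KiB page size and the thread counts are assumptions, and the model is deliberately simplified (one touch per shadow page per bang).

```python
def shadow_cost(shadow_pages: int, page_size: int = 4096, threads: int = 1):
    """Return (bytes of stack set aside for VM code, page touches per bang).

    Simplified model: every thread sets aside shadow_pages pages, and each
    bang touches each of those pages once.
    """
    reserved_bytes = shadow_pages * page_size * threads
    touches_per_bang = shadow_pages
    return reserved_bytes, touches_per_bang


# Compare the two values mentioned in this thread, assuming 4 KiB pages.
for pages in (7, 20):
    reserved, touches = shadow_cost(pages, threads=100)
    print(f"{pages} pages: {reserved} bytes over 100 threads, {touches} touches per bang")
```

Under these assumptions, going from 7 to 20 pages roughly triples both the stack space set aside per thread and the number of pages touched per bang.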
>> - add assertion in signal handler to detect thread hitting the Yellow Zone while executing JVM code (to detect undersized StackShadowPages during our testing)
>> - ensure Yellow Pages are activated when transitioning from _thread_in_vm to _thread_in_java
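The idea behind the proposed assertion can be sketched as follows, in Python rather than in the actual signal handler code. The enum ThreadState and the function on_yellow_zone_hit are hypothetical names for illustration; only the state names _thread_in_java and _thread_in_vm come from the discussion above.

```python
from enum import Enum


class ThreadState(Enum):
    IN_JAVA = "_thread_in_java"
    IN_VM = "_thread_in_vm"


def on_yellow_zone_hit(state: ThreadState) -> str:
    """Decide what a Yellow Zone hit means, based on the thread state."""
    if state is ThreadState.IN_VM:
        # The shadow pages were supposed to cover any VM call, so reaching
        # this point during testing suggests StackShadowPages is undersized.
        raise AssertionError("Yellow Zone hit while executing VM code")
    # In Java code the overflow is recoverable.
    return "throw StackOverflowError"
```

A hit in Java code stays a normal StackOverflowError, while a hit in VM code fails loudly in testing instead of showing up later as a crash.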