Problem with HSAIL->interpreter deopt with many variables

Caspole, Eric Eric.Caspole at
Tue Aug 26 18:50:54 UTC 2014

Hi everybody,
Is it normal to deoptimize a compiled frame that sits directly on top
of a call_stub frame entered from C++ code? I don't see any comments in
deoptimization or in the stub generators that mention anything about this.

Here is an example of the stack when this problem happens during a
kernel deoptimization:

- Hsail::execute_kernel_void_1d_internal sp=7ff0680

- [JavaCall frames]

- StubRoutines::call_stub sp = 7ff0340

- hsail.test.lambda.MoreThanEightArgsOOBTest.lambda$innerTest$139 sp =
(This is the compiled "trampoline" for HSAIL deoptimization)

- UncommonTrapStub.uncommonTrapHandler sp =7ff01a8

#2  0x00007ffff645d310 in Deoptimization::unpack_frames
(thread=0x7ffff000d800, exec_mode=2) at

#1  0x00007ffff6a99921 in vframeArray::unpack_to_stack
(this=0x7ffff0954d58, unpack_frame=..., exec_mode=2,
caller_actual_parameters=9) at

#0  vframeArrayElement::unpack_on_stack (this=0x7ffff0955450,
caller_actual_parameters=9, callee_parameters=0, callee_locals=0,
caller=0x7ffff7fef780, is_top_frame=true, is_bottom_frame=true,
exec_mode=2) at

top frame $rsp=0x7ffff7fee7e0

_this->_frame size = 10,
sender frame size = 283 (this is the StubRoutines::call_stub frame)

(gdb) p this->_frame
$5 = {
   _sp = 0x7ffff7ff02c8,
   _pc = 0x7fffdc00a7a0 "H\307", <incomplete sequence \360>,
   _cb = 0x7fffdc005390,
   _deopt_state = frame::not_deoptimized,
   static _check_value = {
     <OopClosure> = {
       <Closure> = {
         <StackObj> = {
           <AllocatedObj> = {
             _vptr.AllocatedObj = 0x7ffff72e57b0
           }, <No data fields>},
         members of Closure:
         _abort = false
       }, <No data fields>}, <No data fields>},
   static _check_oop = {
     <OopClosure> = {
       <Closure> = {
         <StackObj> = {
           <AllocatedObj> = {
             _vptr.AllocatedObj = 0x7ffff72e5770
           }, <No data fields>},
         members of Closure:
         _abort = false
       }, <No data fields>}, <No data fields>},
   static _zap_dead = {
     <OopClosure> = {
       <Closure> = {
         <StackObj> = {
           <AllocatedObj> = {
             _vptr.AllocatedObj = 0x7ffff72e5730
           }, <No data fields>},
         members of Closure:
         _abort = false
       }, <No data fields>}, <No data fields>},
   _fp = 0x7ffff7ff0308,
   _unextended_sp = 0x7ffff7ff02c8

365       for(i = 0; i < locals()->size(); i++) {
366         StackValue *value = locals()->at(i);
367         intptr_t* addr  = iframe()->interpreter_frame_local_at(i);
368         switch(value->type()) {
(gdb) p addr
$6 = (intptr_t *) 0x7ffff7ff0380

So here you can see that addr, where the locals will be written, is well
above the SP of StubRoutines::call_stub (7ff0340), so the writes overwrite
the callee-saved registers stored in the call_stub frame. Depending on how
many locals get restored in the wrong place, this may or may not cause a
crash after returning all the way back to execute_kernel_void_1d_internal.
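
The address arithmetic behind that observation can be checked with a small
standalone sketch. It is not HotSpot code: FrameExtent and the helper
functions are hypothetical names made up for illustration. It assumes the
frame sizes printed above (10 and 283) are counted in 8-byte words, and that
the abbreviated call_stub SP "7ff0340" stands for 0x7ffff7ff0340, matching
the other addresses in the gdb session.

```cpp
#include <cstdint>

// Assumption: frame sizes above are in machine words, 8 bytes on linux-amd64.
constexpr std::uint64_t kWordSize = 8;

// Hypothetical helper type: a frame described by its lowest address (sp)
// and its size in words. The stack grows down, so the frame occupies
// the half-open range [sp, sp + size_in_words * kWordSize).
struct FrameExtent {
    std::uint64_t sp;
    std::uint64_t size_in_words;
};

// First address past the end of the frame.
constexpr std::uint64_t frame_end(FrameExtent f) {
    return f.sp + f.size_in_words * kWordSize;
}

// True if addr falls inside the frame's address range.
constexpr bool lies_within(FrameExtent f, std::uint64_t addr) {
    return addr >= f.sp && addr < frame_end(f);
}

// Distance of addr from the frame's sp, in words (addr must be >= sp).
constexpr std::uint64_t words_above_sp(FrameExtent f, std::uint64_t addr) {
    return (addr - f.sp) / kWordSize;
}

// Values taken from the gdb session in this message.
// Assumption: "7ff0340" abbreviates 0x7ffff7ff0340.
constexpr FrameExtent kCallStub{0x7ffff7ff0340ULL, 283};
// this->_frame: _sp = 0x7ffff7ff02c8, frame size = 10.
constexpr FrameExtent kDeoptFrame{0x7ffff7ff02c8ULL, 10};
// addr returned by interpreter_frame_local_at(i) at the breakpoint.
constexpr std::uint64_t kLocalAddr = 0x7ffff7ff0380ULL;
```

Under these assumptions, lies_within(kDeoptFrame, kLocalAddr) is false and
lies_within(kCallStub, kLocalAddr) is true, with the local landing 8 words
above the call_stub SP, which is consistent with the write falling in the
area where call_stub keeps its saved registers.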

I put a better test than before at:

I run it like:  ./ -V  --vmbuild debug  --vm server unittest  -Xms2g -Xmx2g -XX:+TraceGPUInteraction -XX:+PrintGCDetails  -XX:-UseCompressedOops -Dkerneltester.runOkraFirst=true hsail.test.lambda.MoreThanEightArgsOOBTest

Thanks for any advice on this,

-------- Original Message --------
Subject: Problem with HSAIL->interpreter deopt with many variables
Date: Thu, 21 Aug 2014 22:50:11 +0000
From: Caspole, Eric <Eric.Caspole at>
To: graal-dev at <graal-dev at>

I think I found a problem with the HSAIL deoptimization back to the
interpreter when there are a lot of locals in the offloaded lambda. From
what I have seen so far, it happens when there are more than about 8
locals, though I am not sure which mix of ints and objects triggers it.
When the locals get restored into the new interpreter frame in
vframeArrayElement::unpack_on_stack(), they are written into the stack
frame of call_stub(), which is used when calling from the HSAIL C++ code
to the x86 trampoline for the method.

I put a test case that shows working/crashing just by switching 2 lines
of code at
Just switch the lines around at line 47 to see it work or crash.
In this test to see the crash you have to take a safepoint and
deoptimize on the compiler safepoint in the loop in the kernel.
Run it with : ./ --vmbuild debug --vm server unittest
-XX:+TraceGPUInteraction  hsail.test.lambda.OtherArgsWithCompSafepointTest

When the problem happens, it overwrites the callee saves in call_stub,
so it ends up crashing in the HSAIL C++ code or near there.
I am not sure if this problem has always been there since we have very
few test cases with this many variables.

I am not familiar with how the frames are created on a deopt. Could
someone give me some hints about this? How is the newly created frame
placed relative to the caller frames? How is the size of that frame


Here is one example crash from this test case -
# A fatal error has been detected by the Java Runtime Environment:
#  SIGSEGV (0xb) at pc=0x00007fd250000770, pid=835, tid=140541751068416
# JRE version: OpenJDK Runtime Environment (8.0) (build
# Java VM: OpenJDK 64-Bit Server VM
(25.0-b63-internal-graal-0.5-dev-debug mixed mode linux-amd64 )
# Problematic frame:
# v  ~StubRoutines::call_stub
# Core dump written. Default location:
/home/ecaspole/views/graal-deopt-size/graal/core or core.835
# An error report file with more information is saved as:
# /home/ecaspole/views/graal-deopt-size/graal/hs_err_pid835.log
Loaded disassembler from
# If you would like to submit a bug report, please visit:
