RFR(M): 8244383: jhsdb/HeapDumpTestWithActiveProcess.java fails with "AssertionFailure: illegal bci"

Chris Plummer chris.plummer at oracle.com
Tue Jun 23 23:28:09 UTC 2020

On 6/23/20 2:44 PM, Daniel D. Daugherty wrote:
> On 6/18/20 8:54 PM, Chris Plummer wrote:
>> [I've added runtime-dev to this SA review since understanding 
>> interpreter invokes (code generated by 
>> TemplateInterpreterGenerator::generate_normal_entry()) and stack 
>> walking is probably more important than understanding SA.]
>> Hello,
>> Please help review the following:
>> https://bugs.openjdk.java.net/browse/JDK-8244383
>> http://cr.openjdk.java.net/~cjplummer/8244383/webrev.00/index.html
Thanks for helping!
> Sorry for the delay in reviewing this one. I've come back to it a couple
> of times because code like this is very hard to review.
> General comment:
>     This fix reminds of the crazy things that AsyncGetCallTrace has to
>     do in order to gather call trace data. I'm guessing that SA is
>     attaching to the VM in an asynchronous manner and that's why it
>     can observe things like partially constructed frames. If that's a
>     correct guess, then how is SA stopping/suspending the threads?
>     I'm just curious here.
On linux SA uses ptrace. I'm not familiar with the details of how it 
works. I'm not sure where ptrace allows suspends to happen, but 
certainly it has no knowledge of JVM safepoints or other synchronization 
that the JVM does. So from the JVM and SA point of view the suspend can 
happen at any arbitrary JVM instruction.

 From what I can gather, PTRACE_ATTACH suspends the entire process, so 
that means all threads are suspended once you attach. However, 
PTRACE_GETREGS can be called on individual threads (LWPs), but I don't 
see any indication in the SA code that you need to attach to each LWP first.
>     Or this might be a case where SA is examining a core file in
>     which case the various threads stacks are not necessarily at
>     good/safepoint-safe pause points.
For this bug and test it's a live process, but I think the bug being 
addressed here can happen just as well with a core file. Unfortunately 
we have very little core file testing support. I'm actually in the 
middle of addressing that right now.
> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/x86/X86Frame.java 
>     No comments.
> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/amd64/BsdAMD64CFrame.java 
>     No comments.
> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/amd64/AMD64CurrentFrameGuess.java 
>     L104:     // two locations, then we canot determine the frame.
>         typo: s/canot/cannot/
>     L127:     // it's validity will help us determine the state of the 
> new frame push.
>         typo: s/it's/its/
>     L148:         System.out.println("CurrentFrameGuess: frame pushed 
> but not initaliazed.");
>         typo: s/initaliazed/initialized/
>     L220:               System.out.println("CurrentFrameGuess: 
> choosing interpreter frame: sp = " +
>     L221:                                  spFound + ", fpFound = " + 
> fp + ", pcFound = " + pc);
>         This debug output doesn't make sense to me:
>             "sp = " label and 'spFound' value
>             "fpFound = " label and 'fp' value
>             "pcFound = " label and 'pc' value
>         but I may not have enough context...
 From the point of view of the person reading the output, they want to 
know the values for sp, fp, and pc. But within the code these values are 
stored in the "found" variables.
> With code like this, it's really hard to figure out if you've covered
> all the cases unless you've been in the observer seat yourself and
> even then your test runs may not hit all the possible cases. All you
> can really do is start with a set of adaptive changes, run with those
> for a while and tweak them as you gather more observations.
Yes, and I know there is still a very tiny gap or two in coverage that 
are maybe one or two instructions long, but they aren't worth dealing 
with. This bug was already very rare, and with the fixes I've done I 
don't see any issues now. SA is a debugger, so perfection in this regard 
is not expected.
> Chris, nice job with this bit of insanity!
Thanks! I mostly stuck with this one to help with my SA expertise. 
Otherwise it wouldn't have been worth the time.

> Thumbs up!
> Dan
>> The crux of the bug is when doing stack walking the topmost frame is 
>> in an inconsistent state because we are in the middle of pushing a 
>> new interpreter frame. Basically we are executing code generated by 
>> TemplateInterpreterGenerator::generate_normal_entry(). Since the PC 
>> register is in this code, SA assumes the topmost frame is an 
>> interpreter frame.
>> The first issue with this interpreter frame assumption is if we 
>> haven't actually pushed the frame yet, then the current frame is the 
>> caller's frame, and could be compiled. But since SA thinks it's 
>> interpreted, later on it tries to convert the frame->bcp to a BCI, 
>> but frame->bcp is only valid for interpreter frames. Thus the 
>> "illegal BCI" failures. If the previous frame happened to be 
>> interpreted, then the existing SA code works fine.
>> The other state of frame pushing that was problematic was when the 
>> new frame had been pushed, but frame->method and frame->bcp were not 
>> setup yet. This also would lead to "illegal BCI" later on because 
>> garbage would be stored in these locations.
>> Fixing the above problems requires trying to determine the state of 
>> the frame push through a series of checks, and then adapting what is 
>> considered to be the current frame based on the outcome of the 
>> checks. The first things checked is that frame->method is valid (we 
>> can successfully instantiate a wrapper for the Method* without 
>> failure) and that frame->bcp is within the method. If both these pass 
>> then we can use the frame as-is.
>> If the above checks fail, then we try to determine whether the issue 
>> is that the frame is not yet pushed and the current frame is actually 
>> compiled, or the frame has been pushed but not yet initialized. This 
>> is done by first getting the return address from the stack or RAX 
>> (it's location depends on how far along we are in the entry code) and 
>> comparing this to what is stored in frame->return_addr. If they are 
>> the same, then we have pushed the frame but not yet initialized it. 
>> In this case we use the previous frame (senderSP() and senderFP()) as 
>> the current frame since the current frame is not yet initialized. If 
>> the return address check fails, then we assume the new frame is not 
>> yet pushed, and and treat the current frame as compiled, even though 
>> PC points into the interpreter (we replace PC with RAX in this case).
>> Comments in the code pretty well explain all the above, so it is 
>> probably easier to follow the logic in the code along with the 
>> comments rather than apply my above description to the code.
>> I should add that it's very rare that we ever get into this special 
>> error handling code. This bug was very hard to reproduce initially. I 
>> was only able to make progress with reproducing and debugging by 
>> inserting delay loops in various spots in the code generated by 
>> TemplateInterpreterGenerator::generate_normal_entry(). By doing this 
>> I was able to reproduce the issue quite easily and hit all the logic 
>> in the new code I've added.
>> The fix is basically entirely contained within 
>> AMD64CurrentFrameGuess.java. The rest of the changes are minor:
>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/amd64/AMD64CurrentFrameGuess.java 
>> -Main fix for CR
>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/x86/X86Frame.java 
>> -Added getInterpreterFrameBCP(), which is now needed by 
>> AMD64CurrentFrameGuess.java
>> -I also simplified some code by using the existing 
>> getInterpreterFrameMethod()
>>  rather than replicating inline what it does.
>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/bsd/amd64/BsdAMD64CFrame.java 
>> -I noticed the windows version of this code had some extra checks 
>> that were missing
>>  from the bsd version. I then looked at the linux version, but it had 
>> been heavily modified
>>  a short while back to leverage DWARF info to determine frames. So I 
>> looked at the previous
>>  rev and it too had these extra checks. I decided to add them to the 
>> BSD port. I'm not sure
>>  if it helps at all, but it certainly doesn't seem to do any harm.
>> thanks,
>> Chris

More information about the hotspot-runtime-dev mailing list