A hotspot patch for stack profiling (frame pointer)

Brendan Gregg brendan.d.gregg at gmail.com
Fri Feb 13 22:26:58 UTC 2015

Thanks Bertrand and John for explaining the invokedynamic issue, and
Vladimir for filing the bug.

I'll reply here (I don't have a JBS account; I would like one!).

The profilers I'm using (Linux perf, and Solaris DTrace) can already handle
a broken RBP, and we see this all the time when profiling OracleJDK today
(eg, as a flame graph:

Including an option (eg, -XX:+NoOmitFramePointer, or
-XX:+ReduceOmitFramePointer, or -XX:+MoreFramePointer) which improved RBP
profiling (like my patch) would have great value for us. I'm fine with a
profiler not working 100% of the time, provided we understand that there is
an error margin, and why (Bertrand and John's descriptions), when
interpreting the profiles. Any of these options could also be improved as
follow-on changes, if and when needed.
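
To make the "broken RBP" case concrete, here is a rough toy model of a
frame-pointer stack walk. It is not how perf or DTrace are implemented; it
only assumes the usual x86-64 layout (saved RBP at [rbp], return address at
[rbp+8]). The names walk_frame_pointers and build_stack, and all of the
addresses, are made up for illustration.

```python
WORD = 8  # bytes per stack slot on x86-64


def walk_frame_pointers(memory, rbp, max_depth=64):
    """Collect return addresses by following saved-RBP links.

    memory is a dict of address -> value, a toy stand-in for the stack.
    The walk stops at a null link, an unreadable address, or a link that
    does not move toward higher addresses (the stack grows down, so a
    caller's frame must sit above its callee's).
    """
    return_addresses = []
    while rbp != 0 and len(return_addresses) < max_depth:
        saved_rbp = memory.get(rbp)
        return_address = memory.get(rbp + WORD)
        if saved_rbp is None or return_address is None:
            break  # RBP did not point at a readable frame
        return_addresses.append(return_address)
        if saved_rbp != 0 and saved_rbp <= rbp:
            break  # link goes the wrong way: treat it as junk
        rbp = saved_rbp
    return return_addresses


def build_stack(frames):
    """Build toy memory from (frame_address, return_address) pairs,
    listed innermost frame first. The outermost frame gets a null link."""
    memory = {}
    for i, (addr, ret) in enumerate(frames):
        caller = frames[i + 1][0] if i + 1 < len(frames) else 0
        memory[addr] = caller
        memory[addr + WORD] = ret
    return memory


# Hypothetical three-frame stack, innermost first.
frames = [(0x7000, 0xA1), (0x7040, 0xB2), (0x7080, 0xC3)]
intact = build_stack(frames)

# Same stack, but the innermost frame's saved-RBP slot holds a data value,
# as it might if compiled code used RBP as a general-purpose register.
broken = dict(intact)
broken[0x7000] = 0x1234

full_stack = walk_frame_pointers(intact, 0x7000)
truncated_stack = walk_frame_pointers(broken, 0x7000)
```

With the intact links the walk returns all three return addresses; with the
clobbered slot it returns only the innermost one. That is the kind of
incomplete stack I mean by an error margin: the sample is still usable, but
its ancestry is lost.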

I haven't had a chance yet to prototype more (eg, option processing).
There's also work happening in Linux (two projects on lkml this week, one
by Stephane Eranian and another by Carl Love) for improving Java JIT symbol
support in perf_events. I think there will be more demand for system stack
walking, as perf gets more symbol translation options.


On Thu, Jan 15, 2015 at 9:50 AM, Vladimir Kozlov
<vladimir.kozlov at oracle.com> wrote:

> Thank you, Bertrand and John
> I added this conversation to the bug report.
> Thanks,
> Vladimir
> On 1/15/15 3:13 AM, Bertrand Delsart wrote:
>> On 14/01/2015 20:12, John Rose wrote:
>>> On Jan 14, 2015, at 6:42 AM, Bertrand Delsart
>>> <bertrand.delsart at oracle.com>
>>> wrote:
>>>> I would not prevent the JITs from using RBP as long as the changeset
>>>> is not sufficient to guarantee the profiling will work... and IMHO
>>>> solving the JSR292 issue will be much more intrusive (impacting
>>>> HotSpot stack walking code).
>>> Here are some thoughts on that.
>>> SPARC uses L7 (L7_mh_SP_save) for the same purpose of method handle
>>> support as x86 uses RBP (rbp_mh_SP_save).  So there's not a hard
>>> requirement for x86 to take over RBP.
>>> (Deep background:  This purpose, in method handle support, is to allow
>>> an adapter to make changes to the caller's SP.  The adapter is the
>>> initial callee from the caller, but may change argument shape, and
>>> tail-calls the ultimate callee.  Because it is a tail-call, the original
>>> caller must have a spot where his original SP can be preserved.  The
>>> preservation works because the original caller knows he is calling a
>>> MH.invoke method, which requires the extra argument preservation.  The
>>> repertoire of argument shape changes is quite small, actually; it is not
>>> a very general mechanism since the LF machinery was put in.  Perhaps the
>>> whole thing could be removed somehow, by finding alternative techniques
>>> for the few remaining changes.  OTOH, this SP-restoring mechanism may be
>>> helpful in doing a more general tail-call mechanism, and perhaps in
>>> managing int/comp mode changes more cleanly, so I'd like us to keep it.
>>>   And document it better.)
>>> Any register or stack slot will do for this purpose, as long as (i) its
>>> value can be recovered after the MH.invoke call returns to the caller,
>>> and (ii) its value can be dug up somehow during stack walking.  There
>>> are only a couple of places where stack walking code needs to sample the
>>> value, so they should be adjustable.
>>> Both x86 and SPARC use registers which are callee-save (or "non-volatile
>>> across calls") which satisfy properties (i) and (ii).  A standard stack
>>> slot (addressed based on caller's RBP) would probably also satisfy those
>>> properties.
>>> A variably-positioned stack slot would also work, which would require
>>> registering the position in each CodeBlob.  That's unpleasant extra
>>> detail, but it would align somewhat with the current logic which allows
>>> each CodeBlob (nmethod, actually) to advertise which call sites need the
>>> special processing (see the function is_method_handle_return(
>>> caller_pc)).
>>> I recommend reserving a dead word of space in every stack frame that
>>> makes MH.invoke calls, at a fixed position relative to that frame's RBP.
>>> — John
>> I perfectly agree that it is doable (and with your proposed approach).
>> I just wanted to be sure people were aware that the RFE is more complex
>> than what the current changeset may suggest. We are not just talking
>> about reviewing and integrating a complete changeset contributed by the
>> community. There is more work needed, either by the community or by
>> Oracle. This will require changes at least in C1 and C2 call sequences,
>> in the stack walking, in the creation and sizing of compiled frames...
>> Regards,
>> Bertrand.