A hotspot patch for stack profiling (frame pointer)

Maynard Johnson maynardj at us.ibm.com
Mon Dec 8 15:05:54 UTC 2014

On 12/05/2014 05:09 PM, Brendan Gregg wrote:
> G'Day Volker,
> On Fri, Dec 5, 2014 at 11:22 AM, Volker Simonis
> <volker.simonis at gmail.com> wrote:
>> Hi Brendan,
>> I'm still not understanding who is taking the actual stack traces (let
>> alone the symbols) in your examples. Is this done by 'perf' itself
>> based only on the frame pointer?
> perf is walking the frame pointers.
Volker, to be specific, the perf profiling tool has a user space part and a
kernel space part. The collection of stack traces is done by the kernel.
When a user-specified event (or series of events) occurs, the process
being profiled is interrupted and the sampled information (which can
optionally include a full stack trace) is made available to the user space
perf tool to be saved to a file for future post-profiling processing.
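
To illustrate the frame-pointer walk, here is a minimal sketch in Python over
simulated memory. It is not perf's or the kernel's actual code. It assumes the
usual x86-64 layout, where the word at the frame pointer holds the caller's
saved frame pointer and the next word holds the return address. The function
name and the memory dictionary are made up for the example.

```python
WORD = 8  # bytes per stack slot on x86-64


def walk_frame_pointers(memory, fp, max_depth=64):
    """Follow the saved-frame-pointer chain and collect return addresses.

    memory: dict mapping address -> 8-byte value (a stand-in for the stack)
    fp:     frame pointer value at the moment of the sample
    """
    trace = []
    while fp != 0 and len(trace) < max_depth:
        saved_fp = memory.get(fp)          # caller's frame pointer
        ret_addr = memory.get(fp + WORD)   # return address into the caller
        if saved_fp is None or ret_addr is None:
            break  # unreadable memory: the chain is broken here
        trace.append(ret_addr)
        if saved_fp != 0 and saved_fp <= fp:
            break  # stack grows down, so a caller frame must sit higher
        fp = saved_fp
    return trace


# Three well-formed frames: each saved frame pointer links to the next frame.
stack = {
    0x1000: 0x1040, 0x1008: 0xAAA,
    0x1040: 0x1080, 0x1048: 0xBBB,
    0x1080: 0x0,    0x1088: 0xCCC,
}
good = walk_frame_pointers(stack, 0x1000)

# If compiled code reuses the register for something else, the value found
# there is not a valid frame address and the walk stops immediately.
broken = walk_frame_pointers(stack, 0xDEAD)
```

The second call shows why code that does not preserve the frame pointer
produces truncated stacks: the walker has nothing valid to follow.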

During the profiling phase, the perf tool collects information about the
profiled process's memory mappings, which allows for address-to-symbol
resolution. It's in the post-profiling phase that each sampled instruction
address, along with its associated stack trace, is resolved to the appropriate
symbol (i.e., function/method) in a specific binary file (e.g., library,
executable).

And if the VM creates a /tmp/perf-<PID>.map file to save information about
JITed methods, perf's post-profiling tool will find it and use it to
correlate sampled addresses it collected from the VM's executable anonymous
memory mappings to the method names.
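
As a rough sketch of that lookup (not perf's implementation), the Python below
parses map text in the "START SIZE symbol" form that perf reads from such
files, with START and SIZE in hexadecimal, and resolves a sampled address to a
name. The function names, addresses, and method names are invented for the
example.

```python
import bisect


def parse_perf_map(text):
    """Parse 'START SIZE NAME' lines (hex START and SIZE) into sorted tuples."""
    entries = []
    for line in text.splitlines():
        parts = line.split(None, 2)  # the name itself may contain spaces
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, size, name = parts
        entries.append((int(start, 16), int(size, 16), name.strip()))
    entries.sort()
    return entries


def resolve(entries, addr):
    """Return the symbol whose [start, start + size) range contains addr."""
    starts = [e[0] for e in entries]
    i = bisect.bisect_right(starts, addr) - 1  # last entry starting <= addr
    if i < 0:
        return None
    start, size, name = entries[i]
    return name if addr < start + size else None


# Made-up map contents; a real file would come from /tmp/perf-<PID>.map.
sample_map = """\
7f0000001000 40 java.lang.String::hashCode
7f0000001040 100 Example::run
"""

entries = parse_perf_map(sample_map)
hit = resolve(entries, 0x7F0000001010)   # inside the first range
miss = resolve(entries, 0x7F0000001140)  # just past the second range
```

An address that falls outside every listed range stays unresolved, which is
what happens to JITed frames when no map file is available.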

> A JVMTI agent, perf-map-agent, is providing a map file for symbol
> translation under /tmp/perf-PID.map. Linux perf already hunts for such
> a file when doing symbol translation.
>> As I wrote before, this is pretty hard to get right for a JVM, but
>> there are good approximations. Have you looked at the 'jstack' tool
>> which is part of the JDK? If you run it on a Java process, it will
>> give you exact stack traces with full inlining information. However
>> this only works at safepoints so it is probably not suitable for
>> profiling with performance counters.
> Right, jstack works, and I get full correct stacks. I do really want
> to take stacks at any moment: not just CPU samples, but when tracing
> kernel TCP events, or PMC cache miss profiling, etc. perf can already
> do many advanced tracing and profiling activities. I just needed the
> Java stacks for context.
>> But you can also use 'jstack -F
>> -m' which gives you a 'best effort' mixed Java/C++ stack trace (most
>> of the time even with inlined Java frames). This is probably the best
>> you can get when interrupting a running JVM at an arbitrary point in
>> time. As you mentioned in one of your blogs, the VM can be in the
>> C-Library or even in the kernel at that time which don't preserve the
>> frame pointer either. So it will be already hard to even walk up to
>> the first Java frame.
> Well, the JVMs I'm looking at are already built with
> -fno-omit-frame-pointer (which is good). I edited hotspot to preserve
> it as well.
> Here's before I changed hotspot:
> http://www.brendangregg.com/FlameGraphs/cpu-mixedmode-nofp.svg
> Yes, most stacks are clearly broken.
> After changing hotspot:
> http://www.brendangregg.com/FlameGraphs/cpu-mixedmode-vertx.svg
> It's looking pretty good. If you look carefully on the far left and
> right, there are 0.8% stacks in read() and write() directly from java,
> which may well be broken (unless a java thread is calling these
> directly; there could also be some gcc inlining going on). Even if
> they are broken, I can see 98% of my profile. Plus, I'd be interested
> to know what exactly is reusing the frame pointer, so we could fix
> that too.
> The Java stacks themselves are also about a third as deep as they
> should be, due to inlining.
>> But nevertheless, if the output of 'jstack -F -m' is "good enough" for
>> your purpose, you can implement something similar in 'perf' or a
>> helper library of 'perf' and be happy (I don't actually know how perf
>> takes stack traces but I suppose there may be some kind of callback
>> mechanism for walking unknown frames). This is actually not so hard.
>> I've recently implemented a "print_native_stack()" function within
>> hotspot itself (you can call it for example from gdb during debugging
>> - see http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/86183a940db4).
>> Maybe you could call this function directly from 'perf' if perf
>> attaches with ptrace to the process (I assume it does or how else
>> could it walk the stack)?
> An OS-cooperative stack walker would be great, and I think the hotspot
> team is already doing this for Oracle Solaris. Thanks for the code
> too, this is pretty interesting.
> jstack -F -m eats 0.5s of CPU for me, so it would need work to make
> this into a 99 Hertz-capable profiler. Plus I'd like to pick arbitrary
> kernel functions or tracepoints and get Java context from them, too.
> Eg, TCP functions, memory allocation, disk I/O, etc.
>> These were just some random thoughts with the hope that they may be helpful.
> Yes, thanks!
> Brendan

More information about the hotspot-compiler-dev mailing list