A hotspot patch for stack profiling (frame pointer)

Brendan Gregg brendan.d.gregg at gmail.com
Fri Dec 5 23:09:46 UTC 2014

G'Day Volker,

On Fri, Dec 5, 2014 at 11:22 AM, Volker Simonis
<volker.simonis at gmail.com> wrote:
> Hi Brendan,
> I'm still not understanding who is taking the actual stack traces (let
> alone the symbols) in your examples. Is this done by 'perf' itself
> based only on the frame pointer?

perf is walking the frame pointers.
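To illustrate what "walking the frame pointers" means, here is a minimal Python simulation. It assumes the usual x86-64 convention: a function built with frame pointers stores the caller's saved frame pointer at [rbp] and the return address at [rbp+8]. The memory layout, addresses, and the `walk_frames` and `build_stack` helpers are hypothetical, and the sanity checks are simplified. This is not perf's actual code.

```python
# Simulated x86-64 stack memory, word-addressed (hypothetical layout): for a
# function that keeps a frame pointer, mem[fp] holds the caller's saved fp and
# mem[fp + 1] holds the return address into the caller.
STACK_WORDS = 64


def walk_frames(mem, ip, fp, max_depth=64):
    """Return the sampled ip followed by return addresses found by chasing
    saved frame pointers. Stop when the chain leaves the stack or stops
    moving toward older (higher-addressed) frames. max_depth is arbitrary."""
    pcs = [ip]
    while len(pcs) < max_depth:
        if fp <= 0 or fp >= len(mem) - 1:
            break  # null or out-of-bounds frame pointer: chain is broken
        next_fp = mem[fp]
        pcs.append(mem[fp + 1])  # return address into the caller
        if next_fp <= fp:
            break  # outermost frame, or an implausible saved value
        fp = next_fp
    return pcs


def build_stack():
    """Three frames, innermost at index 10 (the stack grows toward lower indices)."""
    mem = [0] * STACK_WORDS
    mem[10], mem[11] = 20, 0x401111
    mem[20], mem[21] = 30, 0x402222
    mem[30], mem[31] = 0, 0x403333  # saved fp of 0 marks the outermost frame
    return mem


if __name__ == "__main__":
    mem = build_stack()
    # Intact chain: the fp register points at the innermost frame.
    print([hex(pc) for pc in walk_frames(mem, 0x400000, 10)])
    # Broken chain: code compiled without frame pointers reused the register
    # for an unrelated value (here 1000), so only the leaf survives.
    print([hex(pc) for pc in walk_frames(mem, 0x400000, 1000)])
```

The second call shows why stacks come out broken when some code in the chain was compiled without frame pointers. The walker reads an implausible value, and the stack is cut off at the leaf.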

A JVMTI agent, perf-map-agent, is providing a map file for symbol
translation under /tmp/perf-PID.map. Linux perf already hunts for such
a file when doing symbol translation.
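The following minimal Python sketch shows the symbol-translation side. It assumes the map-file format described in perf's JIT interface documentation: one entry per line, `START SIZE symbolname`, with START and SIZE written in hex. The helper names `parse_perf_map` and `resolve` and the sample entries are hypothetical. perf performs its own lookup internally, so this only illustrates the idea.

```python
import bisect


def parse_perf_map(lines):
    """Parse perf map lines of the form 'START SIZE symbolname' (hex START
    and SIZE); everything after the second space is kept as the name."""
    entries = []
    for line in lines:
        parts = line.rstrip("\n").split(" ", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        entries.append((int(parts[0], 16), int(parts[1], 16), parts[2]))
    entries.sort()  # sort by start address for binary search
    return entries


def resolve(entries, addr):
    """Return the symbol whose [start, start + size) range contains addr."""
    starts = [start for start, _, _ in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0:
        start, size, name = entries[i]
        if addr < start + size:
            return name
    return None


if __name__ == "__main__":
    # Hypothetical entries, similar in shape to what a JVMTI agent might write.
    sample = ["7f3a10001000 80 LFoo;::bar", "7f3a10000000 100 Interpreter"]
    entries = parse_perf_map(sample)
    print(resolve(entries, 0x7f3a10001010))  # LFoo;::bar
    print(resolve(entries, 0x7f3a10000800))  # None (falls in a gap)
```

On a live process the input would be the real file, e.g. `parse_perf_map(open(f"/tmp/perf-{pid}.map"))`. Sorting once and bisecting keeps each lookup cheap when translating many sampled addresses.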

> As I wrote before, this is pretty hard to get right for a JVM, but
> there are good approximations. Have you looked at the 'jstack' tool
> which is part of the JDK? If you run it on a Java process, it will
> give you exact stack traces with full inlining information. However
> this only works at safepoints so it is probably not suitable for
> profiling with performance counters.

Right, jstack works, and I get full correct stacks. But I really want
to take stacks at any moment: not just on CPU samples, but also when tracing
kernel TCP events, when profiling PMC cache misses, etc. perf can already
do many advanced tracing and profiling activities; I just needed the
Java stacks for context.

> But you can also use 'jstack -F
> -m' which gives you a 'best effort' mixed Java/C++ stack trace (most
> of the time even with inlined Java frames). This is probably the best
> you can get when interrupting a running JVM at an arbitrary point in
> time. As you mentioned in one of your blogs, the VM can be in the
> C-Library or even in the kernel at that time, which don't preserve the
> frame pointer either. So it will already be hard to even walk up to
> the first Java frame.

Well, the JVMs I'm looking at are already built with
-fno-omit-frame-pointer (which is good). I edited hotspot to preserve
it as well.

Here's the profile before I changed hotspot:


Yes, most stacks are clearly broken.

After changing hotspot:


It's looking pretty good. If you look carefully at the far left and
right, about 0.8% of stacks show read() and write() called directly from
Java, which may well be broken (unless a Java thread really is calling these
directly; there could also be some gcc inlining going on). Even if
they are broken, I can see 98% of my profile. Plus, I'd be interested
to know what exactly is reusing the frame pointer, so we could fix
that too.

The Java stacks themselves are also about a third as deep as they
should be, due to inlining.

> But nevertheless, if the output of 'jstack -F -m' is "good enough" for
> your purpose, you can implement something similar in 'perf' or a
> helper library of 'perf' and be happy (I don't actually know how perf
> takes stack traces but I suppose there may be some kind of callback
> mechanism for walking unknown frames). This is actually not so hard.
> I've recently implemented a "print_native_stack()" function within
> hotspot itself (you can call it for example from gdb during debugging
> - see http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/86183a940db4).
> Maybe you could call this function directly from 'perf' if perf
> attaches with ptrace to the process (I assume it does or how else
> could it walk the stack)?

An OS-cooperative stack walker would be great, and I think the hotspot
team is already doing this for Oracle Solaris. Thanks for the code
too, this is pretty interesting.

jstack -F -m eats 0.5 s of CPU per run for me, so it would need work to
turn it into a profiler that can sample at 99 Hertz. Plus I'd like to pick
arbitrary kernel functions or tracepoints and get Java context from them,
too. E.g., TCP functions, memory allocation, disk I/O, etc.
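A rough back-of-the-envelope check of the 99 Hertz point, assuming each sample would need one jstack -F -m run at the 0.5 s of CPU quoted above:

```latex
\[
  99\,\frac{\text{samples}}{\text{s}} \times 0.5\,\frac{\text{CPU-s}}{\text{sample}}
  = 49.5\,\frac{\text{CPU-s}}{\text{s}}
\]
\[
  \frac{1}{99}\,\text{s} \approx 10.1\,\text{ms between samples}
  \quad\text{vs.}\quad 500\,\text{ms per jstack run}
\]
```

That is roughly 50 CPUs' worth of work just to keep up, before any per-event tracing is added.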

> These were just some random thoughts with the hope that they may be helpful.

Yes, thanks!


More information about the hotspot-compiler-dev mailing list