A hotspot patch for stack profiling (frame pointer)
brendan.d.gregg at gmail.com
Mon Dec 8 22:19:58 UTC 2014
On Mon, Dec 8, 2014 at 9:15 AM, Mikael Gerdin <mikael.gerdin at oracle.com> wrote:
> On 2014-12-08 16:05, Maynard Johnson wrote:
>> And if the VM creates a /tmp/perf-<PID>.map file to save information about
>> JITed methods, the perf's post-profiling tool will find it and use it to
>> correlate sampled addresses it collected from the VM's executable
>> memory mappings to the method names.
> I seem to recall reading about perf having support for DWARF debug info.
> If the VM (or a JVM/TI agent) could create DWARF debug symbols, could that
> be used to convey information about inlined functions and stack unwinding
> without frame pointers?
> I realize that emitting DWARF debug symbols for generated code is not a
> trivial undertaking but since perf is running sampling in the kernel and we
> can't disable inlining that seems to be one of the few ways we can get
> complete stack traces.
It's a good idea, but I'm not sure the DWARF unwind approach is
suitable for dynamic JIT. I'm usually sampling at 99 Hertz. With
inlined symbols, just the perf.map file can become tens of Mbytes, and
I assume the DWARF info would be similar. So the file would need to be
in a consistent state so that perf can begin reading it anytime, and
do stack walking based on what it reads, while at the same time
symbols may be compiled anytime and the map file would need to change.
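
To make the translation step concrete, here is a minimal sketch of how a tool might read a /tmp/perf-<PID>.map file and resolve a sampled address. It assumes the usual text layout of one "START SIZE NAME" entry per line with START and SIZE in hex. The function names and the sample symbols are hypothetical, and this is not perf's actual implementation.

```python
import bisect


def parse_perf_map(text):
    """Parse perf map text into a sorted list of (start, size, name) tuples."""
    entries = []
    for line in text.splitlines():
        # Symbol names may contain spaces, so split at most twice.
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # blank or partially written line
        try:
            start, size = int(parts[0], 16), int(parts[1], 16)
        except ValueError:
            continue  # not a valid hex entry, skip it
        entries.append((start, size, parts[2]))
    entries.sort()
    return entries


def resolve(entries, addr):
    """Return the symbol name covering addr, or None if nothing matches."""
    starts = [entry[0] for entry in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0:
        start, size, name = entries[i]
        if addr < start + size:
            return name
    return None


# Hypothetical map contents, for illustration only.
SAMPLE_MAP = """\
7f0000001000 40 Ljava/lang/String;::hashCode
7f0000001040 100 Lcom/example/Foo;::bar
"""

entries = parse_perf_map(SAMPLE_MAP)
hit = resolve(entries, 0x7F0000001010)
miss = resolve(entries, 0x7F0000001140)
```

The sketch skips lines it cannot parse, which is one simple way to tolerate a file that is being appended to while it is read. It does nothing about entries that are rewritten or reused, which is the harder consistency problem described above.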
With the frame pointer approach, perf always knows how to walk the
stack, at any time. If symbols move during the profile, it breaks
translation but not walking. And there are different ways to deal with
the translation issue (collect before and after maps and note
differences, or do timestamped maps).
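
The "before and after maps" idea can be sketched as a diff of two map snapshots. This assumes the same "START SIZE NAME" hex layout for each line. The helper names and sample symbols are hypothetical, and a real tool would also need to decide which snapshot to trust for each sample.

```python
def load_map(text):
    """Return {start_address: (size, name)} parsed from perf map text."""
    result = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3:
            result[int(parts[0], 16)] = (int(parts[1], 16), parts[2])
    return result


def diff_maps(before, after):
    """Classify entries as added, removed, or changed between two snapshots."""
    added = {a: v for a, v in after.items() if a not in before}
    removed = {a: v for a, v in before.items() if a not in after}
    changed = {
        a: (before[a], after[a])
        for a in before.keys() & after.keys()
        if before[a] != after[a]
    }
    return added, removed, changed


# Hypothetical snapshots taken before and after a profile.
BEFORE = """\
7f0000001000 40 Foo::a
7f0000001040 80 Foo::b
"""
AFTER = """\
7f0000001000 40 Foo::a
7f0000001040 60 Bar::c
7f0000002000 20 Foo::d
"""

added, removed, changed = diff_maps(load_map(BEFORE), load_map(AFTER))
```

Any address that lands in `changed` was reused for a different symbol during the profile, so samples at that address are ambiguous without timestamps.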
I assume the reliable option is having kernel support for Java
unwinding (like the Solaris approach mentioned previously). Frame
pointer support can be an option for situations when the kernel
support isn't available, while noting its caveats.
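
For readers less familiar with frame pointer walking, the following is a simplified model of it, assuming the common x86-64 layout where the word at the frame pointer holds the caller's saved frame pointer and the next word holds the return address. Memory is simulated with a dict, and all names and addresses are made up for illustration. This is not what the kernel actually runs.

```python
WORD = 8  # bytes per stack slot on x86-64


def walk_frame_pointers(memory, pc, fp, max_depth=128):
    """Follow the chain of saved frame pointers and collect return addresses.

    memory maps an address to the word stored there (simulated stack).
    """
    stack = [pc]
    while fp != 0 and len(stack) < max_depth:
        if fp not in memory or fp + WORD not in memory:
            break  # frame pointer does not point at readable stack memory
        stack.append(memory[fp + WORD])  # return address into the caller
        next_fp = memory[fp]             # caller's saved frame pointer
        if next_fp != 0 and next_fp <= fp:
            break  # callers live at higher addresses; anything else is bogus
        fp = next_fp
    return stack


# Simulated stack with three frames (addresses are arbitrary).
memory = {
    0x7FFC1000: 0x7FFC1040, 0x7FFC1008: 0x401200,
    0x7FFC1040: 0x7FFC1080, 0x7FFC1048: 0x401100,
    0x7FFC1080: 0x0,        0x7FFC1088: 0x401000,
}

good = walk_frame_pointers(memory, pc=0x401300, fp=0x7FFC1000)

# If compiled code reuses the frame pointer register for general data,
# the walk stops immediately and only the sampled PC survives.
broken = walk_frame_pointers(memory, pc=0x401300, fp=0x1234)
```

The second call shows the failure mode that motivates the patch: once the register no longer holds a frame pointer, the walker has nothing to follow.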
> There would be several other advantages to having DWARF symbols for
> generated code, GDB can use them when debugging the JVM for example.
> An alternate approach could be to extend the information in perf-<PID>.map
> to have more detailed PC ranges with information about which functions are
> inlined. A lot of that information is available in the VM but not
> necessarily exposed via the tool APIs
Johannes has done some of this with the perf-map-agent "unfold" option
(https://github.com/jrudolph/perf-map-agent), which includes inlined
information. I've tried adding an extra filter step to resuscitate
frames that were inlined, which sort of worked (needs more work).
However, having inlined stacks hasn't been that much of a problem.
I've shown my flame graphs to developers, noting that inlined frames
can't be seen, and so far they can still follow what's going on (the
use case here is performance profiling, to figure out where the bulk
of CPU time is spent). jstack(1) output can be used for clues, to see
how the inlined code maps to the full stacks. And there are JVM
tunables that can be used to reduce inlining.
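
As a rough illustration of the extra filter step for inlined frames mentioned earlier, the sketch below expands symbols that carry an inlined chain into separate frames in folded stack output ("frame1;frame2 count", as used for flame graphs). It assumes the chain is joined with "->" inside a single symbol name. That separator, the function name, and the sample stacks are assumptions for illustration, not a description of the actual filter.

```python
from collections import Counter


def expand_inlined(folded_lines, sep="->"):
    """Split symbols containing an inlined chain into separate frames.

    folded_lines: iterable of "frame1;frame2;... count" strings.
    Returns a Counter mapping the expanded stack to its total sample count.
    """
    totals = Counter()
    for line in folded_lines:
        stack, _, count = line.rpartition(" ")
        if not stack or not count.isdigit():
            continue  # not a valid folded line
        frames = []
        for frame in stack.split(";"):
            # One physical frame may stand for several logical (inlined) ones.
            frames.extend(part for part in frame.split(sep) if part)
        totals[";".join(frames)] += int(count)
    return totals


# Hypothetical folded stacks: the first has an inlined chain in one symbol.
sample = [
    "java;Foo.run->Foo.step->Bar.calc 7",
    "java;Foo.run;Foo.step;Bar.calc 3",
]
totals = expand_inlined(sample)
```

After expansion the two stacks merge, so the inlined path and the non-inlined path would show up as one tower in a flame graph.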