A hotspot patch for stack profiling (frame pointer)

Erik Helin erik.helin at oracle.com
Tue Dec 9 09:51:20 UTC 2014

On 2014-12-08 21:31, Brendan Gregg wrote:
> G'Day Staffan,
> On Mon, Dec 8, 2014 at 11:17 AM, Staffan Larsen
> <staffan.larsen at oracle.com> wrote:
>>> On 8 dec 2014, at 16:05, Maynard Johnson <maynardj at us.ibm.com> wrote:
> [...]
>>> And if the VM creates a /tmp/perf-<PID>.map file to save information about
>>> JITed methods, the perf's post-profiling tool will find it and use it to
>>> correlate sampled addresses it collected from the VM's executable anonymous
>>> memory mappings to the method names.
>> Is there a way in this .map file to express that different JITed methods
>> are located at the same address at different times? This typically happens
>> a lot when classes and their JITed methods are being unloaded from the VM.
>> That space will be reused by a different method. I’m guessing this would
>> confuse perf.
> In the .map file, no, at least not currently.
> However, consider the following perf sampled stack trace (this is from
> my patched OpenJDK 8, with frame pointers):
> # perf record -F 99 -a -g -- sleep 10
> # perf script
> [...]
> java 10532 [008] 3444046.716431: cpu-clock:
>              7fe919301c30  (/tmp/perf-10490.map)
>              7fe91934d50c  (/tmp/perf-10490.map)
>              7fe9193a43d0  (/tmp/perf-10490.map)
>              7fe9194ffcf0  (/tmp/perf-10490.map)
>              7fe9195026d0  (/tmp/perf-10490.map)
>              7fe9194ffc4c  (/tmp/perf-10490.map)
>              7fe91b6c440c  (/tmp/perf-10490.map)
>              7fe91afa9c00  (/tmp/perf-10490.map)
>              7fe91ab739f4  (/tmp/perf-10490.map)
>              7fe91df23630  (/tmp/perf-10490.map)
>              7fe91ab739f4  (/tmp/perf-10490.map)
>              7fe91acc7ea8  (/tmp/perf-10490.map)
>              7fe91c4fa014  (/tmp/perf-10490.map)
>              7fe9190072e0  (/tmp/perf-10490.map)
>              7fe9190072e0  (/tmp/perf-10490.map)
>              7fe919007325  (/tmp/perf-10490.map)
>              7fe9190004e7  (/tmp/perf-10490.map)
>              7fe92f70670e JavaCalls::call_helper(JavaValue*,
> methodHandle*, JavaCallArguments*, Thread*)
> (/mnt/openjdk8/build/linux-x86_64-normal-server-release/jdk/lib/am
>              7fe92f707a3f JavaCalls::call_virtual(JavaValue*,
> KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)
> (/mnt/openjdk8/build/linux-x86_64-normal-server-r
>              7fe92f707edf JavaCalls::call_virtual(JavaValue*, Handle,
> KlassHandle, Symbol*, Symbol*, Thread*)
> (/mnt/openjdk8/build/linux-x86_64-normal-server-release/jdk/l
>              7fe92f741668 thread_entry(JavaThread*, Thread*)
> (/mnt/openjdk8/build/linux-x86_64-normal-server-release/jdk/lib/amd64/server/libjvm.so)
>              7fe92fa555c8 JavaThread::thread_main_inner()
> (/mnt/openjdk8/build/linux-x86_64-normal-server-release/jdk/lib/amd64/server/libjvm.so)
>              7fe92fa5581c JavaThread::run()
> (/mnt/openjdk8/build/linux-x86_64-normal-server-release/jdk/lib/amd64/server/libjvm.so)
>              7fe92f916ba2 java_start(Thread*)
> (/mnt/openjdk8/build/linux-x86_64-normal-server-release/jdk/lib/amd64/server/libjvm.so)
>              7fe92ff95e9a start_thread (/lib/x86_64-linux-gnu/libpthread-2.15.so)
> perf script emits every sampled stack trace, which I normally do for
> flame graph generation. If I had Johannes's perf-map-agent loaded,
> then those hex addresses become Java method symbols.
> Note the timestamp on the cpu-clock line.
> So... It should be easy to change perf-map-agent to write a
> timestamped map file, to a different location (so perf doesn't find
> it). Use the same timestamp type as perf. Then, a little Perl wrapper
> can take "perf script" output and translate the addresses based on the
> timestamped map file. (perf itself can be enhanced to do this,
> although the time frame for a perf change propagating to Linux users
> may be years, so the Perl wrapper could be used in the meantime.)
> I haven't needed to do this yet. map file churn in production has been
> small (depends on workload). The real pain for us is switching apps
> from Oracle JDK to my patched OpenJDK, so we get the frame pointer.
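
The timestamped map idea above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not an existing tool: the
leading timestamp column in the map file is hypothetical (the stock
/tmp/perf-<PID>.map format has just start address, size and symbol name),
and the function names load_timestamped_map, resolve and translate are
invented here. Python stands in for the Perl wrapper mentioned above.

```python
import re

# Hypothetical timestamped map line: "<seconds> <start-hex> <size-hex> <symbol>".
# The first column is an assumption; the stock perf map file has only the
# last three fields.


def load_timestamped_map(lines):
    """Return a list of (timestamp, start, end, symbol), sorted by timestamp."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ts, start, size, symbol = line.split(None, 3)
        start = int(start, 16)
        entries.append((float(ts), start, start + int(size, 16), symbol))
    entries.sort(key=lambda e: e[0])
    return entries


def resolve(entries, addr, sample_ts):
    """Newest symbol registered at or before sample_ts that covers addr."""
    for ts, start, end, symbol in reversed(entries):
        if ts <= sample_ts and start <= addr < end:
            return symbol
    return None


# Sample header, e.g. "java 10532 [008] 3444046.716431: cpu-clock:"
HEADER = re.compile(r"^\S.*\s(\d+\.\d+):\s")
# Unresolved frame, e.g. "    7fe919301c30  (/tmp/perf-10490.map)"
FRAME = re.compile(r"^(\s+)([0-9a-f]+)\s+(\(.*\))\s*$")


def translate(perf_lines, entries):
    """Fill in symbols for unresolved frames in 'perf script' output lines."""
    out = []
    sample_ts = None
    for line in perf_lines:
        header = HEADER.match(line)
        if header:
            sample_ts = float(header.group(1))
            out.append(line)
            continue
        frame = FRAME.match(line)
        if frame and sample_ts is not None:
            indent, addr, dso = frame.groups()
            symbol = resolve(entries, int(addr, 16), sample_ts)
            if symbol:
                line = f"{indent}{addr} {symbol} {dso}"
        out.append(line)
    return out
```

Because the lookup is keyed on the sample timestamp, two methods that
occupied the same address at different times resolve to different names.
The sketch does not model explicit unload events, so an address in a region
that was freed but not yet reused would still resolve to the old method.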

Another approach is outlined in the paper "Fast Java profiling with 
scheduling-aware stack fragment sampling and asynchronous analysis" [0]: 
use the perf system call to get 16kB of stack, then translate that 
asynchronously using JVMTI and then finally save the "decoded" stack 
trace to disk. This technique has the advantage of reusing all the 
knowledge already in the JVM about the stack and frame layout while 
still using perf for sampling.

The drawback of this approach is that you can no longer use the perf 
tool to set up the probes, since the kernel's perf sampling is now started 
through the perf system call from a JVMTI agent. You would also have to 
translate any native part and the kernel part of the stack trace 
yourself (probably by using some DWARF library).


[0]: http://dl.acm.org/citation.cfm?doid=2647508.2647509

> Brendan

More information about the hotspot-compiler-dev mailing list