request for review (L): 7121756 Improve C1 inlining policy by using profiling at call sites
roland.westrelin at oracle.com
Thu Dec 15 06:52:37 PST 2011
Implements profile based inlining in C1.
Execution of a method starts interpreted as usual. A method transitions from interpreted to compiled in the usual way as well. When the method is compiled, the compiler identifies a number of call sites that are candidates for profiling and further inlining. At those call sites, the method is compiled so that a per call site counter is incremented and tested for overflow when the call site is used. On first call site resolution, a timestamp is also recorded. The count and timestamp are used to compute a frequency. A frequency higher than a high water mark detects a hot call site. A hot call site triggers a recompilation of the caller method in which the callee is inlined. A frequency higher than a low water mark detects a warm call site. Otherwise the call site is cold. Recompiling with the extra inlining won't bring a performance advantage for a warm or cold call site. But keeping the profiling on at a warm call site is detrimental so it is dropped. At a cold call site profiling can be kept enabled to trigger later recompilation if the call site becomes hot.
To perform profiling, the compiler identifies the candidate call sites and generates a stub similar to the static call stub in the nmethod's stub area. The profile call stub performs the following step:
1- load mdo pointer in register
2- increment counter for call site
3- branch to runtime if counter crosses threshold
4- jump to callee
On call site resolution, for a call to a compiled method, the jump (4- above) is patched with the resolved call site info (to continue to callee's code or transition stub) then the call site is patched to point to the profile call stub. Profiling can be later fully disabled for the call site (if the call site is polymorphic or if the compilation policy finds it's better to not profile the call site anymore) by reresolving the call.
The compiler also uses profile data to inline a frequent virtual method if profile data suggests a single receiver class. State changes of inline caches associated with call sites (performed in the runtime) are used to collect receiver class data. Correctness during execution is enforced with a compiled guard and a deoptimization can be triggered.
More information about the hotspot-compiler-dev