john.r.rose at oracle.com
Mon Mar 16 06:12:44 UTC 2015
On Mar 12, 2015, at 11:37 AM, Andrew Haley <aph at redhat.com> wrote:
> On 03/12/2015 05:15 PM, Peter Levart wrote:
>> ...or are JIT+CPU smart enough and there would be no difference?
> C2 always orders things based on profile counts, so there is no
> difference. Your suggestion would be better for interpreted code
> and I guess C1 also, so I agree it is worthwhile.
Profile counts can partially reorganize decision trees,
provided the counts are unambiguous. The best effect from profiling
is to prune untaken branches completely (leaving a deopt).
The main caveat here is that this breaks down when the
profile is ambiguous, which can happen when multiple
users of a library routine "pollute" the profile with
divergent behaviors. See (e.g.) slides 17-19 of:
The JVM currently addresses this mainly by combining local
profile data with type inference that crosses inline boundaries.
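To make the pollution caveat concrete, here is a minimal Java sketch. It is an illustration only: the class and method names (ProfilePollutionSketch, describe, callerA, callerB) are hypothetical and do not come from any JDK code, and the comments describe the general idea of a per-method profile rather than exact HotSpot behavior.

```java
public class ProfilePollutionSketch {
    // Hypothetical shared library routine. The type check below is a single
    // branch in this method's bytecode, so (assumption for illustration) its
    // profile counts are recorded once for the method, not once per caller.
    static int describe(Object x) {
        if (x instanceof String) {
            return ((String) x).length();
        }
        return x.hashCode() & 0xFF;
    }

    // Caller A only ever passes Strings: taken alone, the branch looks
    // one-sided and the non-String path could be pruned.
    static int callerA(String[] xs) {
        int sum = 0;
        for (String s : xs) {
            sum += describe(s);
        }
        return sum;
    }

    // Caller B only ever passes Integers: taken alone, the branch looks
    // one-sided in the opposite direction.
    static int callerB(Integer[] xs) {
        int sum = 0;
        for (Integer i : xs) {
            sum += describe(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // With both callers active, the shared counts for the type check
        // in describe() are mixed, i.e. the profile is ambiguous.
        System.out.println(callerA(new String[] {"ab", "cde"}));
        System.out.println(callerB(new Integer[] {1, 2, 3}));
    }
}
```

The arrays here are far too small to trigger compilation; the point is only the shape of the problem. If describe is inlined into each caller, the statically known argument type (String[] elements versus Integer[] elements) can settle the check per call site, which is the kind of type inference across inline boundaries mentioned above.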
The present case can perhaps be improved by type inference
or non-local profiling on bitfields, which is partially discussed in:
BTW, I like Peter's suggestion to perform localized merging of
bytes to shorts (etc.) based on exact alignment. But, I'd rather
see it done further down the pipeline, after vectorization.
More information about the hotspot-compiler-dev mailing list