Unsafe.{get,put}-X-Unaligned performance

John Rose john.r.rose at oracle.com
Mon Mar 16 06:12:44 UTC 2015

On Mar 12, 2015, at 11:37 AM, Andrew Haley <aph at redhat.com> wrote:
> On 03/12/2015 05:15 PM, Peter Levart wrote:
>> ...or are JIT+CPU smart enough and there would be no difference?
> C2 always orders things based on profile counts, so there is no
> difference.  Your suggestion would be better for interpreted code
> and I guess C1 also, so I agree it is worthwhile.

Profile counts can partially reorganize decision trees,
provided the counts are unambiguous.  The best effect from profiling
is to prune untaken branches completely (leaving a deopt).

The main caveat here is that this breaks down when the
profile is ambiguous, which can happen when multiple
users of a library routine "pollute" the profile with
divergent behaviors.  See (e.g.) slides 17-19 of:

The JVM currently addresses this mainly by combining local
profile data with type inference that crosses inline boundaries.
The present case can perhaps be improved by type inference
or non-local profiling on bitfields, which is partially discussed in:

BTW, I like Peter's suggestion to perform localized merging of
bytes to shorts (etc.) based on exact alignment.  But I'd rather
see it done further down the pipeline, after vectorization.
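
For readers following along, here is a rough sketch of what an
alignment-based decision tree for an unaligned 64-bit load looks like
when written out by hand.  It is only an illustration, not the JDK's
Unsafe code: the class name UnalignedSketch, the use of a ByteBuffer
as the backing store, and the fixed little-endian layout are all
assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not the actual Unsafe implementation.
final class UnalignedSketch {
    // Reads a little-endian long at an arbitrary offset, using the widest
    // unit that the offset's alignment allows (long, int, short, byte).
    static long getLongUnaligned(ByteBuffer buf, int offset) {
        // Work on a duplicate so the caller's byte order is left untouched.
        ByteBuffer b = buf.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        if ((offset & 7) == 0) {
            // 8-byte aligned: one load.
            return b.getLong(offset);
        } else if ((offset & 3) == 0) {
            // 4-byte aligned: merge two ints.
            return (b.getInt(offset) & 0xFFFFFFFFL)
                 | ((long) b.getInt(offset + 4) << 32);
        } else if ((offset & 1) == 0) {
            // 2-byte aligned: merge four shorts.
            return (b.getShort(offset) & 0xFFFFL)
                 | ((b.getShort(offset + 2) & 0xFFFFL) << 16)
                 | ((b.getShort(offset + 4) & 0xFFFFL) << 32)
                 | ((b.getShort(offset + 6) & 0xFFFFL) << 48);
        } else {
            // Odd offset: merge eight bytes.
            long r = 0;
            for (int i = 0; i < 8; i++) {
                r |= (b.get(offset + i) & 0xFFL) << (8 * i);
            }
            return r;
        }
    }
}
```

The order of the alignment tests is the part that matters for the
interpreter and C1, since they execute the branches as written.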

— John

