[aarch64-port-dev ] [10] RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays

Dmitrij Pochepko dmitrij.pochepko at bell-sw.com
Mon Oct 30 16:43:30 UTC 2017

On 30.10.2017 19:13, Andrew Haley wrote:
> On 30/10/17 15:42, Dmitrij Pochepko wrote:
>> Any additional numbers on other systems are welcome, as well as early
>> feedback on the code.
> I take it that the small comparisons are unaffected.  The small
> comparisons are very common, so they shouldn't be ignored.
> The patch seems unobjectionable, but it's extremely hard to test
> this stuff.
Well, I actually used a small brute-force test that generates all 
cases for array lengths from 1 to N (a parameter), because I 
couldn't find a better way.

case 0: equal arrays
case 1: arrays different in 1st symbol
case N: arrays different in (N-1)th symbol

This test passed. However, I don't think such a test should be added 
to the jtreg test base, because it takes a long time to run, so I assume 
the existing array equals test is enough.
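
For illustration, the brute-force approach described above might look 
roughly like the sketch below. It is not the test that was actually run 
for this review: the class name, the use of byte[] with 
java.util.Arrays.equals, the fill pattern and the default N are all 
assumptions.

```java
import java.util.Arrays;

// Hypothetical sketch of the brute-force idea; not the original test.
class ArrayEqualsBruteForce {

    // Runs every case for one array length and returns the number of failures.
    static int checkLength(int len) {
        int failures = 0;
        byte[] a = new byte[len];
        for (int i = 0; i < len; i++) {
            a[i] = (byte) (i * 31 + 7);   // arbitrary deterministic fill (assumption)
        }
        byte[] b = a.clone();

        // Case 0: equal arrays must compare equal.
        if (!Arrays.equals(a, b)) {
            failures++;
        }

        // Remaining cases: exactly one differing element, at each index in turn.
        for (int pos = 0; pos < len; pos++) {
            b[pos] ^= 1;                  // flip one bit so the element differs
            if (Arrays.equals(a, b)) {
                failures++;               // a difference was missed
            }
            b[pos] ^= 1;                  // restore the element for the next case
        }
        return failures;
    }

    // Runs all cases for lengths 1..maxLen and returns the total failure count.
    static int run(int maxLen) {
        int failures = 0;
        for (int len = 1; len <= maxLen; len++) {
            failures += checkLength(len);
        }
        return failures;
    }

    public static void main(String[] args) {
        int maxLen = args.length > 0 ? Integer.parseInt(args[0]) : 512;
        int failures = run(maxLen);
        System.out.println(failures == 0 ? "PASSED" : "FAILED: " + failures + " cases");
    }
}
```

The number of comparisons grows quadratically with N, which is 
consistent with the long run time mentioned above. A run meant to 
exercise the intrinsic rather than the interpreter would presumably also 
need enough iterations for the JIT to compile the call site.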

> Why is this change:
> @@ -16154,7 +16154,7 @@
>     ins_pipe(pipe_class_memory);
>   %}
> -instruct string_equalsL(iRegP_R1 str1, iRegP_R3 str2, iRegI_R4 cnt,
> +instruct string_equalsL(iRegP_R1 str1, iRegP_R3 str2, iRegP_R4 cnt,
>                           iRegI_R0 result, rFlagsReg cr)
>   %{
>     predicate(((StrEqualsNode*)n)->encoding() == StrIntrinsicNode::LL);
> It seems very odd to me.
You're right. It's a leftover from previous versions. It can be 
reverted to iRegI_R4.
> Was a vector-based implementation considered?
I've tried SIMD loads (even aligned ones, to be sure that alignment is 
not an issue). The SIMD versions were attached to JDK-8187472 as
  - v5.0 (simd loads, 16-byte address alignment, 64 bytes per 1 loop iteration)
  - v7.0 (simd loads, 16-byte alignment, 64 bytes per 1 loop iteration)
  - v9.0 (simd loads, 64-byte alignment, 128 bytes per 1 loop iteration).

I measured these on ThunderX and found that, while the best non-SIMD 
version handles 1000000-byte arrays in ~295 microseconds, the SIMD 
versions took ~355 microseconds.
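
To put the two timings side by side, the small sketch below converts 
them to throughput and a relative slowdown. The class and method names 
are made up for illustration, and the throughput counts the length of 
one array only (1000000 bytes), not the bytes read from both operands.

```java
// Hypothetical helper for comparing the two reported timings.
class ThroughputEstimate {

    // 1 byte per microsecond is 1e6 bytes per second, so dividing
    // bytes/microsecond by 1000 gives decimal gigabytes per second.
    static double gigabytesPerSecond(long bytes, double micros) {
        return bytes / micros / 1000.0;
    }

    // How much slower 'otherMicros' is than 'baselineMicros', in percent.
    static double slowdownPercent(double baselineMicros, double otherMicros) {
        return (otherMicros / baselineMicros - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        long bytes = 1_000_000L;       // array length from the measurement
        double nonSimdMicros = 295.0;  // approximate, as reported
        double simdMicros = 355.0;     // approximate, as reported

        System.out.printf("non-SIMD: %.2f GB/s%n", gigabytesPerSecond(bytes, nonSimdMicros));
        System.out.printf("SIMD:     %.2f GB/s%n", gigabytesPerSecond(bytes, simdMicros));
        System.out.printf("SIMD slowdown: %.1f%%%n", slowdownPercent(nonSimdMicros, simdMicros));
    }
}
```

With the approximate figures above this works out to roughly 3.4 GB/s 
for the non-SIMD version against 2.8 GB/s for the SIMD versions, i.e. 
the SIMD versions were about 20% slower on this machine.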

