[aarch64-port-dev ]  RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays
dmitrij.pochepko at bell-sw.com
Mon Oct 30 16:43:30 UTC 2017
On 30.10.2017 19:13, Andrew Haley wrote:
> On 30/10/17 15:42, Dmitrij Pochepko wrote:
>> Any additional numbers on other systems are welcome, as well as early
>> feedback on the code.
> I take it that the small comparisons are unaffected. The small
> comparisons are very common, so they shouldn't be ignored.
> The patch seems unobjectionable, but it's extremely hard to test
> this stuff.
Well, I actually tested it with a small brute-force test, because I
couldn't find a better way. For each array length L from 1 to N (a
parameter), it generates all of these cases:
case 0: equal arrays
case 1: arrays differ in the 1st element (index 0)
...
case L: arrays differ in the last element (index L-1)
The test passed. However, I don't think such a test should be added to the
jtreg test base, because it takes a long time to run, so I assume the
existing array equals test is enough.
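For illustration, a test of the kind described above could be sketched in Java as follows. This is a hypothetical reconstruction, not the actual test: the class and method names are made up, it assumes byte[] arrays, and it assumes that repeating the run enough times gets the loop compiled by C2 so that the intrinsic is actually exercised. Every case is checked against a plain element-by-element loop.

```java
import java.util.Arrays;

// Hypothetical brute-force check (names are illustrative, not from the patch).
// For every length len in 1..maxLen it builds:
//   case 0: two equal arrays
//   case c (1 <= c <= len): arrays that differ only at index c - 1
// and checks Arrays.equals(byte[], byte[]) against a plain reference loop.
final class ArrayEqualsBruteForce {

    // Reference result: element-by-element comparison.
    static boolean referenceEquals(byte[] a, byte[] b) {
        if (a.length != b.length) return false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    // Returns the number of cases where either implementation gives a wrong answer.
    static int run(int maxLen) {
        int failures = 0;
        for (int len = 1; len <= maxLen; len++) {
            byte[] a = new byte[len];
            for (int i = 0; i < len; i++) {
                a[i] = (byte) (i * 31 + 7); // arbitrary non-constant content
            }
            for (int c = 0; c <= len; c++) {
                byte[] b = a.clone();
                if (c > 0) {
                    b[c - 1] ^= 1; // flip one bit so the arrays differ at index c - 1
                }
                boolean expected = (c == 0);
                if (Arrays.equals(a, b) != expected || referenceEquals(a, b) != expected) {
                    failures++;
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        int maxLen = args.length > 0 ? Integer.parseInt(args[0]) : 256;
        int failures = 0;
        // Repeat so that run() is likely compiled by C2, where the
        // array_equals intrinsic is used; early passes may run in the
        // interpreter or C1.
        for (int iter = 0; iter < 20; iter++) {
            failures += run(maxLen);
        }
        System.out.println(failures == 0 ? "PASSED" : "FAILED: " + failures);
    }
}
```

It can be run with e.g. `java ArrayEqualsBruteForce.java 256` (source-file launch needs JDK 11 or later). The number of cases grows quadratically with the maximum length and each case costs time linear in the length, so the total work is roughly cubic, which fits the observation that such a test is too slow for the regular jtreg run.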
> Why is this change:
> @@ -16154,7 +16154,7 @@
> -instruct string_equalsL(iRegP_R1 str1, iRegP_R3 str2, iRegI_R4 cnt,
> +instruct string_equalsL(iRegP_R1 str1, iRegP_R3 str2, iRegP_R4 cnt,
> iRegI_R0 result, rFlagsReg cr)
> predicate(((StrEqualsNode*)n)->encoding() == StrIntrinsicNode::LL);
> It seems very odd to me.
You're right. It's a leftover from previous versions; it can be reverted
to iRegI_R4.
> Was a vector-based implementation considered?
I've tried SIMD loads (even aligned ones, to make sure that alignment is
not an issue). The SIMD versions were attached to JDK-8187472 as:
- v5.0 (SIMD loads, 16-byte address alignment, 64 bytes per loop iteration)
- v7.0 (SIMD loads, 16-byte alignment, 64 bytes per loop iteration)
- v9.0 (SIMD loads, 64-byte alignment, 128 bytes per loop iteration).
I measured them on ThunderX and found that while the best non-SIMD version
handles 1,000,000-byte arrays in ~295 microseconds, the SIMD versions took
~355 microseconds.
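To make the loop structure concrete, here is a rough Java analogue of a scalar (non-SIMD) comparison that consumes 64 bytes per loop iteration with eight 8-byte loads per array, followed by 8-byte and single-byte tails. It is only a sketch under these assumptions: the class name is hypothetical, it is not the HotSpot AArch64 stub, and it leaves out the prefetching and address alignment that the patch is about, since neither can be expressed in Java. It uses `MethodHandles.byteArrayViewVarHandle` (JDK 9+) to read possibly unaligned longs from a `byte[]`.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration only: a scalar loop comparing 64 bytes per
// iteration. No prefetch, no alignment peeling (unlike the real stub).
final class WordwiseEquals {
    // Views a byte[] as little-endian longs at arbitrary (unaligned) offsets;
    // plain get() is allowed for misaligned indices.
    private static final VarHandle LONGS =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    static boolean equals(byte[] a, byte[] b) {
        if (a == b) return true; // same array, or both null
        if (a == null || b == null || a.length != b.length) return false;
        int len = a.length;
        int i = 0;
        // Main loop: 64 bytes per iteration (8 x 8-byte loads from each array).
        // Written as len - i >= 64 to avoid int overflow near Integer.MAX_VALUE.
        for (; len - i >= 64; i += 64) {
            long diff = 0;
            for (int j = 0; j < 64; j += 8) {
                diff |= (long) LONGS.get(a, i + j) ^ (long) LONGS.get(b, i + j);
            }
            if (diff != 0) return false; // one branch per 64-byte block
        }
        // 8-byte tail.
        for (; len - i >= 8; i += 8) {
            if ((long) LONGS.get(a, i) != (long) LONGS.get(b, i)) return false;
        }
        // Remaining 0..7 bytes.
        for (; i < len; i++) {
            if (a[i] != b[i]) return false;
        }
        return true;
    }
}
```

Accumulating the XOR of all eight word pairs and testing once per iteration keeps a single branch per 64 bytes, at the cost of coarser early-exit granularity.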