Vectorized array mismatch updates

Paul Sandoz paul.sandoz at
Thu Dec 17 15:07:22 UTC 2015


The vectorized array mismatch implementation is now fully wired up to Arrays.equals/compare/mismatch in hs-comp and the intrinsic kicks in on x86 for C2.
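For readers unfamiliar with the entry points, here is a minimal sketch of the three public java.util.Arrays methods named above, using the byte[] overloads. The class name and sample data are made up for illustration. Whether a given call actually reaches the intrinsic depends on the JVM build, platform and flags.

```java
import java.util.Arrays;

public class MismatchDemo {
    public static void main(String[] args) {
        // Two arrays that differ only in the last element (index 8).
        byte[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        byte[] b = {1, 2, 3, 4, 5, 6, 7, 8, 10};

        // equals: true only if lengths and all elements match.
        System.out.println(Arrays.equals(a, b));

        // mismatch: index of the first differing element, or -1 if none.
        System.out.println(Arrays.mismatch(a, b));

        // compare: lexicographic comparison; negative here since 9 < 10.
        System.out.println(Arrays.compare(a, b));

        // Identical content yields -1 from mismatch.
        System.out.println(Arrays.mismatch(a, a.clone()));
    }
}
```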

There are a bunch of follow-up tasks that need to be done (where appropriate I will log issues):

1) wiring up the vectorizedMismatch intrinsic stub in C1 on x86;

2) implementing the vectorizedMismatch intrinsic on other platforms, such as SPARC and ARM (volunteers? the work is likely similar to that for compact string equality/comparison); and

3) using the performance data to clean up edge cases, so as to reduce or eliminate regressions.

With regard to 3), I have uploaded a JMH benchmark project and raw results for:

- two x86 platforms supporting UseAVX=1 (AVX_1) and UseAVX=2 (AVX_2) respectively (thus AVX_1 and AVX_2 results are not directly comparable)

- C2 (-XX:-UseVectorizedMismatchIntrinsic as "Unsafe", and -XX:+UseVectorizedMismatchIntrinsic as "Vectorized")

- C1 (as "Unsafe", implicitly -XX:-UseVectorizedMismatchIntrinsic since there is no intrinsic yet for C1)

- comparing byte[] and long[]

- small (1..16) and large (2^2..2^12) array lengths, where the contents of the two arrays are either the same or differ only in the last element (lastNEQ=false/true).
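As a rough illustration of the input shapes described in the list above, the following sketch builds array pairs for the small and large length ranges, with the last element optionally differing. This is not the uploaded JMH project: the class and the helper pair(...) are hypothetical, the fill values are arbitrary, and no timing is done here.

```java
import java.util.Arrays;

public class MismatchInputs {
    // Hypothetical helper: returns two byte arrays of the given length whose
    // contents are identical, or differ only in the last element if lastNEQ.
    public static byte[][] pair(int length, boolean lastNEQ) {
        byte[] a = new byte[length];
        Arrays.fill(a, (byte) 1);          // arbitrary fill value
        byte[] b = a.clone();
        if (lastNEQ && length > 0) {
            b[length - 1] = 2;             // make only the last element differ
        }
        return new byte[][] {a, b};
    }

    public static void main(String[] args) {
        // Small lengths: 1..16
        for (int len = 1; len <= 16; len++) {
            byte[][] p = pair(len, true);
            System.out.println(len + " -> " + Arrays.mismatch(p[0], p[1]));
        }
        // Large lengths: 2^2..2^12
        for (int exp = 2; exp <= 12; exp++) {
            int len = 1 << exp;
            byte[][] p = pair(len, false);
            System.out.println(len + " -> " + Arrays.equals(p[0], p[1]));
        }
    }
}
```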

Observations so far:
(Note for byte[] the vectorizedMismatch does not kick in for an array length < 8).
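To make the length threshold in the note above concrete, here is a minimal sketch of the general idea: below 8 bytes an ordinary loop is used, otherwise the arrays are compared 8 bytes at a time and the tail is finished element by element. It is an assumption-laden illustration, not the actual JDK or HotSpot code. It uses ByteBuffer rather than Unsafe, does no vectorization, and the class name and THRESHOLD constant are made up.

```java
import java.nio.ByteBuffer;

public class MismatchSketch {
    // Assumed threshold, taken from the note: fewer than 8 bytes
    // goes straight to the ordinary loop.
    static final int THRESHOLD = 8;

    // Returns the index of the first differing byte, -1 if the arrays are
    // equal, or the shorter length if one array is a proper prefix of the other.
    public static int mismatch(byte[] a, byte[] b) {
        int length = Math.min(a.length, b.length);
        int i = 0;
        if (length >= THRESHOLD) {
            // ByteBuffer is big-endian by default, so the byte at the lowest
            // index ends up in the most significant position of the long.
            ByteBuffer ba = ByteBuffer.wrap(a);
            ByteBuffer bb = ByteBuffer.wrap(b);
            for (; i + Long.BYTES <= length; i += Long.BYTES) {
                long x = ba.getLong(i) ^ bb.getLong(i);
                if (x != 0) {
                    // Leading zero bits / 8 = number of matching leading bytes.
                    return i + (Long.numberOfLeadingZeros(x) >>> 3);
                }
            }
        }
        // Ordinary loop: short arrays and any tail of fewer than 8 bytes.
        for (; i < length; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : length;
    }
}
```

The extra length check and branch in front of the wide loop is the kind of overhead that could plausibly show up as a small regression for very short arrays.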

- byte[], AVX_1, C2
  - No regressions for small arrays, good improvements for large arrays
  - For large arrays the Vectorized performance is marginally better than the Unsafe performance.
    I expect the gap to close once Roland’s fix for JDK-8145322 is pushed (which creates more
    efficient address computation for unrolled Unsafe access loops)

- long[], AVX_1, C2
  - For small arrays there are some regressions both for Vectorized and Unsafe
  - For large arrays there are some regressions both for Vectorized and Unsafe.
    For Unsafe this is due to JDK-8145322.
    For Vectorized there is some variance that might be due to unlucky alignment of quadwords.
  - Further investigation is required: e.g. introduce a threshold for when vectorizedMismatch kicks in,
    or somehow disable Unsafe and/or Vectorized for UseAVX=1, if we can surface constants for
    vectorization/register widths etc. in a platform-independent manner.

- byte[], AVX_2, C2
  - For small arrays with Unsafe a small regression is observed at lengths of 11 and 15 when the contents of the arrays are equal.
    This seems like a blip, but might be due to some odd code generation.
  - For small arrays with Vectorized there is no regression.
  - For large arrays performance is good, with Vectorized ~ 2x Unsafe once the length gets large enough (256/512 or larger).
    This translates into an ~10x improvement compared to an ordinary loop.

- long[], AVX_2, C2
  - For small arrays there are some regressions, as for AVX_1
  - For large arrays AVX_2 starts to show a 1.5x improvement.
    Again some variance is observable, perhaps due to unlucky alignment.

- byte[], AVX_1/2, C1
  (Note only Unsafe results are available)
  - For small arrays there are small regressions for lengths < 8, probably due to the length check and branch to
    the ordinary loop. Not sure if there is much that can be done about it.
  - For large arrays the performance boost is good and can be much better if made intrinsic, e.g. ~5x to 8x

- long[], AVX_1/2, C1
  (Note only Unsafe results are available)
  - For small and large arrays there are noticeable regressions. A C1 intrinsic should improve things.


More information about the hotspot-dev mailing list