Performance impact of decommissioning arrayStorageProperties on legacy code.

Sergey Kuksenko sergey.kuksenko at oracle.com
Wed Jun 10 04:52:26 UTC 2020


   Update.

   A new analysis was done with a modified benchmark that covers polymorphic 
array stores. The stores were mixed across arrays of Object, arrays of an 
interface, arrays of an abstract class and arrays of a concrete class.
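   A minimal sketch of what such a polymorphic store site could look like. This is plain Java, not the actual benchmark; the types Shape, Base and Leaf and the method names are hypothetical and only illustrate the four array kinds listed above.

```java
// Sketch only: a plain-Java stand-in for a polymorphic array store site.
// Shape, Base and Leaf are hypothetical types, not taken from the benchmark.
final class PolymorphicArrayStore {
    interface Shape {}
    abstract static class Base implements Shape {}
    static final class Leaf extends Base {}

    // A single aastore site that receives arrays of several runtime types,
    // so the array store check is exercised with more than one array class.
    static void fill(Object[] dst, Object value) {
        for (int i = 0; i < dst.length; i++) {
            dst[i] = value; // aastore with a runtime store check
        }
    }

    // Array of Object, of an interface, of an abstract class, of a concrete class.
    static Object[][] arrays(int n) {
        return new Object[][] { new Object[n], new Shape[n], new Base[n], new Leaf[n] };
    }

    public static void main(String[] args) {
        Object[][] all = arrays(16);
        Leaf value = new Leaf();
        for (int iter = 0; iter < 10_000; iter++) {
            for (Object[] a : all) {
                fill(a, value);
            }
        }
        System.out.println("stored into " + all.length + " array kinds");
    }
}
```

   A real measurement would wrap fill() in a JMH benchmark; the loop in main() is only there to make the sketch runnable.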

   Here are the performance results for polymorphic array stores:

   

                               |baseline(ns)| v-66 (ns) | v-72 (ns) | v-66/baseline | v-72/baseline | v-72/v-66
G1GC (compressedOops)          :    380     |    445    |    420    |    -17.1%     |   -10.5%      |    5.6%
G1GC (uncompressedOops)        :    300     |    400    |    390    |    -33.3%     |   -30.0%      |    2.5%
ParallelGC (compressedOops)    :    310     |    360    |    350    |    -16.1%     |   -12.9%      |    2.8%
ParallelGC (uncompressedOops)  :    284     |    330    |    300    |    -16.2%     |    -5.6%      |    9.1%
ZGC (uncompressedOops)         :    285     |    314    |    310    |    -10.2%     |    -8.8%      |    1.3%
EpsilonGC (compressedOops)     :    284     |    340    |    320    |    -19.7%     |   -12.7%      |    5.9%
EpsilonGC (uncompressedOops)   :    277     |    294    |    300    |     -6.1%     |    -8.3%      |   -2.0%



   A new column was added: the speedup of v-72 over v-66.
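   For reference, the percentage columns can be reproduced from the nanosecond columns. The formulas below are inferred from the numbers in the table, not stated anywhere in this message, and the class and method names are made up for the sketch.

```java
import java.util.Locale;

// Sketch: reproduces the percentage columns from the ns columns.
// The formulas are inferred from the table values (an assumption).
final class RelativeChange {
    // Change of a Valhalla build versus baseline; negative means slower (more ns).
    static double vsBaseline(double baselineNs, double valhallaNs) {
        return (baselineNs - valhallaNs) / baselineNs * 100.0;
    }

    // Speedup of v-72 over v-66, relative to the v-66 time; negative means v-72 is slower.
    static double speedup(double v66Ns, double v72Ns) {
        return (v66Ns - v72Ns) / v66Ns * 100.0;
    }

    public static void main(String[] args) {
        // G1GC (compressedOops) row: baseline 380, v-66 445, v-72 420
        System.out.println(String.format(Locale.ROOT, "%.1f%% %.1f%% %.1f%%",
                vsBaseline(380, 445), vsBaseline(380, 420), speedup(445, 420)));
        // prints: -17.1% -10.5% 5.6%
    }
}
```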

   For polymorphic array stores the picture is not so bright, but 
decommissioning arrayStorageProperties still gives a speedup (except in 
one case, EpsilonGC with uncompressedOops).
   With a polymorphic array store the Klass is always accessed, so 
clearing the extra bits from the klass ptr (needed in v-66) has a negative 
effect. By the way, which field of Klass has offset 0xE8?

   What is interesting is the quite large difference between baseline 
and both Valhalla versions in the case of G1GC.
   Comparing the generated code of baseline and v-72, two differences 
were found:

   1. Different layout of basic blocks (some jumps are inverted, je -> 
jne).
      But this shouldn't be the source of the regression: profiling has 
shown that the numbers of branches and branch misses are the same for 
baseline and Valhalla.

   2. Access to the layout helper and a check whether it's an array of values.

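      ...
      mov    0x8(%r10),%r8d        ; load the 32-bit layout helper
      mov    %edx,%r12d            ; register copy, not part of the check
      sar    $0x1d,%r8d            ; arithmetic shift right by 29: keep the top 3 tag bits
      cmp    $0xfffffffd,%r8d      ; compare the tag with -3
      je     0x00007fab202b2d96    ; taken if the tag matches (presumably the array-of-values path)
      ...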
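   The check in the sequence above can be written out in Java as a sketch. The shift amount and the compared constant are taken from the instructions; the class name, method name and the sample layout helper values are made up for illustration and are not real HotSpot values.

```java
// Sketch: Java equivalent of the sar/cmp pair in the listing above.
// Sample layout helper values in main() are invented for illustration.
final class LayoutHelperTag {
    static final int TAG_SHIFT = 29;               // from `sar $0x1d`
    static final int VALUE_ARRAY_TAG = 0xfffffffd; // from `cmp $0xfffffffd`, i.e. -3

    // `>>` is an arithmetic shift in Java, like sar: the sign bit is replicated,
    // so only the top 3 bits of the layout helper decide the result.
    static boolean isValueArray(int layoutHelper) {
        return (layoutHelper >> TAG_SHIFT) == VALUE_ARRAY_TAG;
    }

    public static void main(String[] args) {
        int topBits101 = 0xA0000010; // top 3 bits 101 -> shifts to -3
        int topBits110 = 0xC0000010; // top 3 bits 110 -> shifts to -2
        int positive   = 0x00000010; // non-negative value -> shifts to 0
        System.out.println(isValueArray(topBits101)); // true
        System.out.println(isValueArray(topBits110)); // false
        System.out.println(isValueArray(positive));   // false
    }
}
```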


Tobias, what do you think? Does it make sense to play with the layout 
helper? Nothing prevents us from switching to 1-bit tags with a test & 
jump and checking what we get.
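To make the 1-bit idea concrete, here is a rough sketch of the alternative encoding. The bit position is purely hypothetical (it is not a proposal for the real layout helper format), and whether this is actually faster than shift + compare is exactly the open question.

```java
// Sketch of a 1-bit tag check. VALUE_ARRAY_BIT is a hypothetical position
// chosen for illustration; it does not reflect the real layout helper encoding.
final class OneBitTag {
    static final int VALUE_ARRAY_BIT = 1 << 28;

    // Mark a layout helper value as describing an array of values.
    static int markValueArray(int layoutHelper) {
        return layoutHelper | VALUE_ARRAY_BIT;
    }

    // A single mask test; a JIT could compile this to test + conditional jump
    // instead of shift + compare + conditional jump.
    static boolean isValueArray(int layoutHelper) {
        return (layoutHelper & VALUE_ARRAY_BIT) != 0;
    }

    public static void main(String[] args) {
        int plain = 0x00000010;
        int marked = markValueArray(plain);
        System.out.println(isValueArray(plain));  // false
        System.out.println(isValueArray(marked)); // true
    }
}
```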


On 6/9/20 8:13 AM, Tobias Hartmann wrote:
> Hi Sergey,
>
> thanks again for the nice report! Comments below.
>
> On 09.06.20 06:43, Sergey Kuksenko wrote:
>>    Note: Unrolling and hoisting out of the loop happened only for ZGC, ParallelGC and EpsilonGC. It was not
>> done for G1 for an unknown reason. Maybe this needs attention.
> That's unexpected. Is it the same with mainline?
>
>>    Decommissioning arrayStorageProperties has a positive performance effect on the aastore operation under any
>> conditions. The really nice fact is that aastore has no negative performance effect at all
>> for legacy code in Valhalla.   The fact is that the klass ptr is loaded for every aastore operation and
>> checked to see if the runtime type of the array is Object[] (for this benchmark it's the simplest form of array
>> store check). In v-66 the arrayStorageProperties bits must be cleared first.
>>    In v-72 there are no Valhalla checks at all (we already checked that it's Object[], so we don't need to
>> do anything else).
> Right. This is because C2 speculates on the array being monomorphic (MonomorphicArrayCheck
> optimization) and we can then omit all inline type specific checks. Have you checked with a
> polymorphic array store? In that case you should see flat/null-free checks and these will have an
> impact on performance.
>
> Thanks,
> Tobias


More information about the valhalla-dev mailing list