RFR: 8221647: [lworld] Performance regression due to the fact that check if array is array of values is not hoisted out of the loop.

Sergey Kuksenko sergey.kuksenko at oracle.com
Thu Jun 6 16:03:09 UTC 2019


Here is a small sample of the microbenchmark performance improvements:

openjdk.valhalla.baseline.acmp.IsCmpBranch.equals000    85.01%
openjdk.valhalla.baseline.acmp.IsCmpBranch.equals025    55.22%
openjdk.valhalla.baseline.acmp.IsCmpBranch.equals050    50.52%
openjdk.valhalla.baseline.acmp.IsCmpBranch.equals075    50.64%
openjdk.valhalla.baseline.acmp.IsCmpBranch.equals100    43.98%
openjdk.valhalla.baseline.acmp.IsCmpResult.equals000    86.17%
openjdk.valhalla.baseline.acmp.IsCmpResult.equals025    56.07%
openjdk.valhalla.baseline.acmp.IsCmpResult.equals050    63.91%
openjdk.valhalla.baseline.acmp.IsCmpResult.equals075    67.37%
openjdk.valhalla.baseline.acmp.IsCmpResult.equals100    49.11%
openjdk.valhalla.baseline.acmp.IsNotNullBranch.equals000    86.96%
openjdk.valhalla.baseline.acmp.IsNotNullBranch.equals025    9.41%
openjdk.valhalla.baseline.acmp.IsNotNullBranch.equals050    10.36%
openjdk.valhalla.baseline.acmp.IsNotNullBranch.equals075    12.44%
openjdk.valhalla.baseline.arrays.Copy1.covariance-size:500    54.37%
openjdk.valhalla.baseline.arrays.Copy2.covariance-size:500    58.56%
openjdk.valhalla.baseline.arrays.Copy8.covariance-size:500    58.34%
openjdk.valhalla.baseline.invoke.Interface.reference_targets1    53.48%
openjdk.valhalla.baseline.invoke.Interface.reference_targets2    45.81%
openjdk.valhalla.baseline.invoke.Interface.reference_targets3    1.86%
openjdk.valhalla.baseline.invoke.ObjectHashCodeExplicit.reference_targets1    55.28%
openjdk.valhalla.baseline.invoke.ObjectHashCodeExplicit.reference_targets2    47.65%
openjdk.valhalla.baseline.invoke.ObjectHashCodeExplicit.reference_targets3    1.12%
openjdk.valhalla.baseline.traversal.Covariance.W0001.walk-shuffle:true    15.88%
openjdk.valhalla.baseline.traversal.Covariance.W0004.walk-shuffle:true    9.59%
openjdk.valhalla.baseline.traversal.Covariance.W0016.walk-shuffle:true    5.03%
openjdk.valhalla.baseline.traversal.Covariance.W0032.walk-shuffle:true    6.07%
openjdk.valhalla.baseline.traversal.Covariance.W0128.walk-shuffle:true    2.47%
openjdk.valhalla.baseline.traversal.Covariance.W0256.walk-shuffle:true    1.31%

On 5/29/19 1:16 PM, Sergey Kuksenko wrote:
> A quick check has shown that the fix solves the issue. It looks very
> promising. More details will be available tomorrow.
>
> On 5/29/19 6:51 AM, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~roland/8221647/webrev.00/
>>
>> This patch includes:
>>
>> - an implementation of null free array checks using the storage
>>    properties encoded on the class pointer. This uses a new node type
>>    GetNullFreePropertyNode that takes a LoadKlass or LoadNKlass as
>>    input. Masking out the property bits and extracting the null free
>>    bit are done at final graph reshaping time. Checking whether an array is
>>    flattened is not implemented with storage properties yet.
>>
>> - to guarantee known accesses to flattened arrays are not incorrectly
>>    reordered with flattened accesses hidden behind Object[] arrays, at
>>    parse time, all flattened array accesses are now on a unique new slice
>>    (for type TypeAryPtr::VALUES). When the access is on an Object[]
>>    array, membars on the TypeAryPtr::VALUES slice are added around the
>>    runtime call that performs the unknown value load/store. Once parse
>>    time is over and the compiler is aware of all accesses to flattened
>>    arrays in the compile unit, a pass is performed to move each field
>>    access for each flattened array to their own slice (the way the IR
>>    graph is currently built at parse time). The membars for Object[]
>>    arrays are duplicated, one per slice, so proper ordering between
>>    accesses is still guaranteed. This should help performance because
>>    there's no wide membar that prevents c2 from optimizing memory
>>    accesses.
>>
>> - loading the layout helper from the klass structure is now performed
>>    on immutable memory so it can be hoisted.
>>
>> - loop unswitching is extended so rather than clone loops for each
>>    flattened array check (and produce 2^n loops for n accesses), it
>>    produces 2 copies of the loop: one loop with no flattened array check
>>    for legacy array accesses and another for flattened array accesses
>>    that still contains flattened array checks if the loop has more than 1
>>    array access.
>>
>> - The control for array loads is also now always set to the null check
>>    or array bound check for the access so after unswitching, the body of
>>    the reference access loops is identical to the loop we would get when
>>    running with -EnableValhalla.
>>
>> - Not using a newly allocated value buffer for an unknown flattened load
>>    before it's fully initialized is now guaranteed with a StoreStore
>>    membar on raw memory and a CastPP that is flagged as carrying a
>>    dependency.
>>
>> - MonomorphicArrayCheck is also improved so the casted array's type is
>>    propagated during parsing.
>>
>> - a fix for replay compilation
>>
>> Upstream 8173196, which is being reviewed on hotspot-compiler-dev, is
>> also needed to recover performance.
>>
>> Roland.
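
For readers less familiar with loop unswitching, the idea behind the
hoisted flattened array check in the quoted patch description can be
sketched at the Java source level. This is only an analogy: C2 performs
the transformation on its IR graph, not on source code, and the class
UnswitchSketch, the Arr wrapper, and its "flattened" flag are all
hypothetical names made up for this illustration. The first method
tests the loop-invariant property on every iteration; the second tests
it once and runs one of two specialized loop copies.

```java
// Source-level analogy only; not the actual C2 implementation.
final class UnswitchSketch {

    // Hypothetical stand-in for an array whose layout is known only at run time.
    static final class Arr {
        final boolean flattened; // loop-invariant property of the array
        final int[] flat;        // payload when the array is "flattened"
        final Integer[] refs;    // payload when it is a reference array

        private Arr(boolean flattened, int[] flat, Integer[] refs) {
            this.flattened = flattened;
            this.flat = flat;
            this.refs = refs;
        }

        static Arr ofFlat(int[] values) { return new Arr(true, values, null); }

        static Arr ofRefs(Integer[] values) { return new Arr(false, null, values); }

        int length() { return flattened ? flat.length : refs.length; }
    }

    // Before: the layout check sits inside the loop body.
    static long sumPerIterationCheck(Arr a) {
        long sum = 0;
        int n = a.length();
        for (int i = 0; i < n; i++) {
            if (a.flattened) {
                sum += a.flat[i];
            } else {
                sum += a.refs[i];
            }
        }
        return sum;
    }

    // After: the check is done once, selecting one of two loop copies.
    static long sumUnswitched(Arr a) {
        long sum = 0;
        int n = a.length();
        if (a.flattened) {
            for (int i = 0; i < n; i++) {
                sum += a.flat[i];
            }
        } else {
            for (int i = 0; i < n; i++) {
                sum += a.refs[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Arr flat = Arr.ofFlat(new int[] {1, 2, 3});
        Arr refs = Arr.ofRefs(new Integer[] {1, 2, 3});
        // Both variants compute the same result; only the check placement differs.
        System.out.println(sumPerIterationCheck(flat) == sumUnswitched(flat));
        System.out.println(sumPerIterationCheck(refs) == sumUnswitched(refs));
    }
}
```

With a single array access the two loop copies cover every case. With
n independent accesses, naive unswitching on each check would yield 2^n
copies, which is why the patch settles for two loops and leaves the
remaining checks in the flattened copy.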

