RFR: 8221647: [lworld] Performance regression due to the fact that check if array is array of values is not hoisted out of the loop.

Tobias Hartmann tobias.hartmann at oracle.com
Thu Jun 6 15:04:17 UTC 2019

Hi Roland,

great work!

Were you able to reproduce the problem of flattened array accesses being incorrectly reordered? I.e.
do we have a regression test for that?

Some minor comments:

- line 2151: I would prefer a return here if !_has_flattened_accesses instead of wrapping all the
code in the if

- line 3401: The comment should be adjusted
- line 3405: I think you should use GraphKit::load_object_klass or LoadKlassNode::make.
- Please also re-add/adjust the comment from line 3406 in original code

Also, I see build failures on Windows:

jib > t:/workspace/open/src/hotspot/share/opto/compile.cpp(3968): error C2220: warning treated as
error - no 'object' file generated
jib > t:/workspace/open/src/hotspot/share/opto/compile.cpp(3968): warning C4293: '<<': shift count
negative or too big, undefined behavior

And a failure with TestLWorld:

 stderr: [Exception in thread "main" java.lang.RuntimeException: Graph for 'TestLWorld::test86'
contains different number of match nodes:
662	CountedLoop	===  662  714  283  [[ 630  631  634  643  650  653  662  593  669  670  671  597
600  264  318  269  262 ]] inner stride: 4 main of N662 !orig=[608],[319],[313],[99] !jvms:
TestLWorld::test86 @ bci:8
523	CountedLoop	===  523  544  527  [[ 508  512  515  523  524  525  526  532 ]] inner stride: 1
post of N319 !orig=[319],[313],[99] !jvms: TestLWorld::test86 @ bci:8
412	CountedLoop	===  412  395  427  [[ 412  414  420  429  447  449  455  460  461 ]] inner stride:
1  !orig=[319],[313],[99] !jvms: TestLWorld::test86 @ bci:8
: expected 2 to equal 3
	at jdk.test.lib.Asserts.fail(Asserts.java:594)
	at jdk.test.lib.Asserts.assertEquals(Asserts.java:205)
	at jdk.test.lib.Asserts.assertEQ(Asserts.java:178)
	at compiler.valhalla.valuetypes.ValueTypeTest.parseOutput(ValueTypeTest.java:503)
	at compiler.valhalla.valuetypes.ValueTypeTest.execute_vm(ValueTypeTest.java:406)
	at compiler.valhalla.valuetypes.ValueTypeTest.run(ValueTypeTest.java:358)
	at compiler.valhalla.valuetypes.TestLWorld.main(TestLWorld.java:76)

The test was executed with "-XX:+EnableValhalla -XX:+UseParallelGC -XX:+UseParallelOldGC
-XX:+UseNUMA -XX:+IgnoreUnrecognizedVMOptions"


On 29.05.19 15:51, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8221647/webrev.00/
> This patch includes:
> - an implementation of null free array checks using the storage
>   properties encoded on the class pointer. This uses a new node type
>   GetNullFreePropertyNode that takes a LoadKlass or LoadNKlass as
>   input. Masking out the property bits, extracting the null free bit is
>   done at final graph reshaping time. Checking whether an array is
>   flattened is not implemented with storage properties yet.
> - to guarantee known accesses to flattened arrays are not incorrectly
>   reordered with flattened accesses hidden behind Object[] arrays, at
>   parse time, all flattened array accesses are now on a unique new slice
>   (for type TypeAryPtr::VALUES). When the access is on an Object[]
>   array, membars on the TypeAryPtr::VALUES slice are added around the
>   runtime call that performs the unknown value load/store. Once parse
>   time is over and the compiler is aware of all accesses to flattened
>   arrays in the compile unit, a pass is performed to move each field
>   access for each flattened array to their own slice (the way the IR
>   graph is currently built at parse time). The membars for Object[]
>   arrays are duplicated, one per slice so proper orderin between
>   accesses is still guaranteed. This should help performance because
>   there's no wide membar that prevents c2 from optimizing memory
>   accesses.
> - loading the layout helper from the klass structure is now performed
>   on immutable memory so it can be hoisted.
> - loop unswitching is extended so rather than clone loops for each
>   flattened array check (and produce 2^n loops for n accesses), it
>   produces 2 copies of the loop: one loop with no flattened array check
>   for legacy array accesses and another for flattened array accesses
>   that still contain flattened array check if the loop has more than 1
>   array access.
> - The control for array loads is also now always set to the null check
>   or array bound check for the access so after unswitching, the body of
>   the reference access loops is identical to the loop we would get when
>   running with -EnableValhalla.
> - Not using a newly allocated value buffer for an unknown flattened load
>   before it's fully initialized is now guaranteed with a StoreStore
>   membar on raw memory and a CastPP that flagged as carrying a
>   dependency.
> - MonomorphicArrayCheck is also improved so the casted array's type is
>   propagated during parsing.
> - a fix for replay compilation
> Upstream 8173196 that's being reviewed hotspot-compiler-dev is also
> needed to recover performance.
> Roland.

More information about the valhalla-dev mailing list