RFR: 8252372: Check if cloning is required to move loads out of loops in PhaseIdealLoop::split_if_with_blocks_post()

Tobias Hartmann thartmann at openjdk.java.net
Tue Apr 27 09:15:43 UTC 2021

On Mon, 26 Apr 2021 09:46:18 GMT, Roland Westrelin <roland at openjdk.org> wrote:

> Sinking data nodes out of a loop when all uses are out of a loop has
> several issues that this attempts to fix.
> 1- Only non control uses are considered which makes little sense (why
>   not sink if the data node is an argument to a call or a returned
>   value?)
> 2- Sinking of Loads is broken because of the handling of
>   anti-dependence: the get_late_ctrl(n, n_ctrl) call returns a control
>   in the loop because it takes all uses into account.
> 3- For data nodes for which a control edge can't be set, commoning of
>   clones back in the loop is prevented with:
>   _igvn._worklist.yank(x);
>   which gives no guarantee
> This patch tries to address all issues:
> 1- it looks at all uses, not only non control uses
> 2- anti-dependences are computed for each use independently
> 3- Cast nodes are used to pin clones out of loop
> 2- requires refactoring of the PhaseIdealLoop::get_late_ctrl()
> logic. While working on this, I noticed a bug in anti-dependence
> analysis: when the use is a cfg node, the code sometimes looks at uses
> of the memory state of the cfg. The logic uses the use of the cfg
> which is a projection of adr_type identical to the cfg. It should
> instead look at the use of the memory projection.
> The existing logic for sinking loads calls clear_dom_lca_tags() for
> every load which seems like quite a waste. I added a
> _dom_lca_tags_round variable that's or'ed with the tag_node's _idx. By
> incrementing _dom_lca_tags_round, new tags that don't conflict with
> existing ones are produced and there's no need for
> clear_dom_lca_tags().
> For anti-dependence analysis to return a correct result, early control
> of the load is needed. The only way to get it at this stage, AFAICT,
> is to compute it by following the load's input until a pinned node is
> reached.
> The existing logic pins cloned nodes next to their use. The logic I
> propose pins them right out of the loop. This could possibly avoid
> some redundant clones. It also makes some special handling for corner
> cases with loop strip mining useless.
> For 3-, I added extra Cast nodes for float types. If a chain of data
> nodes are sunk, the new logic tries to keep a single Cast for the
> entire chain rather than one Cast per node.

Hi Roland,

I didn't look at this in detail yet but gave it a quick run through our testing. I'm seeing many of the following failures with the `jdk/incubator/vector/` tests:

assert(cast != __null) failed: must have added a cast to pin the node

Current CompileTask:
C2:  48324 7090    b        ShortMaxVectorTests::getShortMaxVectorTests (1916 bytes)

Stack: [0x00007f13688f8000,0x00007f13689f9000],  sp=0x00007f13689f3250,  free space=1004k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x13293b4]  PhaseIdealLoop::try_sink_out_of_loop(Node*)+0x8d4
V  [libjvm.so+0x132969c]  PhaseIdealLoop::split_if_with_blocks_post(Node*)+0x5c
V  [libjvm.so+0x132a311]  PhaseIdealLoop::split_if_with_blocks(VectorSet&, Node_Stack&)+0x201
V  [libjvm.so+0x131d2ad]  PhaseIdealLoop::build_and_optimize(LoopOptsMode)+0x121d
V  [libjvm.so+0xa4193a]  PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x2da
V  [libjvm.so+0xa3d1f6]  Compile::Optimize()+0x586
V  [libjvm.so+0xa40946]  Compile::Compile(ciEnv*, ciMethod*, int, bool, bool, bool, bool, DirectiveSet*)+0x2276
V  [libjvm.so+0x861cfa]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1ea
V  [libjvm.so+0xa50a59]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0xfb9
V  [libjvm.so+0xa517c8]  CompileBroker::compiler_thread_loop()+0x5a8
V  [libjvm.so+0x18bc4d1]  JavaThread::thread_main_inner()+0x271
V  [libjvm.so+0x18c3f10]  Thread::call_run()+0x100
V  [libjvm.so+0x159a26e]  thread_native_entry(Thread*)+0x10e

`jdk/incubator/vector/ShortMaxVectorTests.java` with `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -Xcomp`


PR: https://git.openjdk.java.net/jdk/pull/3689

More information about the hotspot-compiler-dev mailing list