aarch64 AD-file / matching rule

Benedikt Wedenik benedikt.wedenik at theobroma-systems.com
Thu Apr 30 10:06:32 UTC 2015


Thanks for your quick help!
However, I found out that the pattern I was searching for is emitted here:

This means my pattern will never match the rule in the AD-file, because it is more or less “hardcoded” :)
I wrote a small simulation program to see whether the rule would match in JIT-compiled code, and it worked.
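
For reference, a test program of that kind could look roughly like the sketch below. It is not the program I actually used: the class name, the method names and the iteration count are made up for illustration, and only the mask value is taken from the disassembly in my original mail. The first method branches on an and-with-constant compared against zero, so the AndI result has a single use. The second method uses the same and-result twice, which is the non-tree-like case Goetz describes in the quoted reply below.

```java
public class AndCmpBranchDemo {
    // Mask taken from the disassembly in the original mail; everything else is hypothetical.
    static final int MASK = 0x7ffff8;

    // The result of (v & MASK) is only compared against zero,
    // so the and-result has exactly one use.
    static int countZero(int[] values) {
        int count = 0;
        for (int v : values) {
            if ((v & MASK) == 0) {
                count++;
            }
        }
        return count;
    }

    // Contrast case: the and-result is compared AND added to the sum,
    // so it has more than one use.
    static int sumMaskedNonZero(int[] values) {
        int sum = 0;
        for (int v : values) {
            int masked = v & MASK;
            if (masked != 0) {
                sum += masked;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] values = new int[1024];
        for (int i = 0; i < values.length; i++) {
            values[i] = i;
        }
        long total = 0;
        // Call the methods often enough that the JIT is likely to compile them.
        for (int iter = 0; iter < 20_000; iter++) {
            total += countZero(values);
            total += sumMaskedNonZero(values);
        }
        // Print the result so the loops cannot be optimised away entirely.
        System.out.println(total);
    }
}
```

Running this with the same -XX:+UnlockDiagnosticVMOptions and -XX:CompileCommand='print,...' options as in my original command line (with the method pattern changed to *AndCmpBranchDemo.countZero) should show the compiled code for the method. Whether the new rule is actually selected there depends on the build and on what the optimiser does to the loop, so treat this as a starting point only.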

I’ll do some more investigation into how to optimise this pattern in the C++ code, because it occurs quite often.

Thanks again, 

On 29 Apr 2015, at 16:37, Lindenmaier, Goetz <goetz.lindenmaier at sap.com> wrote:

> Hi,
> I use PrintOptoAssembly in such cases.  This tells me what the IR looks like after
> matching.  Together with PrintAssembly you can manage to locate the block
> with the pattern.
> With PrintIdeal you can see the graph before matching.  You should find the pattern
> you described in the ad rule there.  It is hard to read, though.
> There is also the PrintIdealGraph flag, which prints a graph you can visualize.
> I didn’t use that, though.  We have instrumented the opto compiler with
> our own graph printer.
> I could imagine that the AndI node has more than one use (out edge).
> Then it’s not a tree-like subgraph, and the matcher cannot apply the rule.
> This is something you would check in the PrintIdeal output or in the last
> Ideal graph before matching.
> Best regards,
>   Goetz.
> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Benedikt Wedenik
> Sent: Wednesday, 29 April 2015 14:50
> To: hotspot-compiler-dev at openjdk.java.net
> Cc: Dr. Philipp Tomsich; Benedikt Huber
> Subject: aarch64 AD-file / matching rule
> Hi!
> I’m writing compiler optimisations for the aarch64 port at the moment and I am using SPECjbb2005 for benchmarking.
> One of the patterns I want to optimise is the following:
>   0x0000007f8c2961b4: and w2, w2, #0x7ffff8
>   0x0000007f8c2961b8: cmp w2, #0x0
>   0x0000007f8c2961bc: b.eq     0x0000007f8c2968f4
> Here I see an opportunity for ands, b.eq.
> I created a new rule in the cpu/aarch64/vm/aarch64.ad file.
> My matching looks like this:
> instruct and_cmp_branch(cmpOp cmp, immI0 zero, iRegIorL2I src1, immILog src2, label lbl, rFlagsReg cr) %{
>   match(If cmp (CmpI (AndI src1 src2) zero) );
>   effect(USE lbl);
>   ins_cost(0); // is zero at the moment to be sure the rule is triggered.
>   ins_encode %{
>     Label* L = $lbl$$label;
>     Assembler::Condition cond = (Assembler::Condition)$cmp$$cmpcode;
>     __ andsw(as_Register($src1$$reg),
>         as_Register($src1$$reg),
>         (unsigned long)($src2$$constant));
>     __ br(cond, *L);
>   %}  
>   ins_pipe(pipe_cmp_branch); //TODO but not relevant yet
> %}
> As I don’t know whether my matching rule is wrong or something else stops the rule from being applied, I wanted to find out which “and” rule is triggered for this pattern.
> I inserted some nops to locate the corresponding rule and found that most of the emitted “and”s were surrounded by nops, except for my pattern and a few others like this one:
> 0x0000007f984bf568: eor   x1, x0, x1
> 0x0000007f984bf56c: and   x1, x1, #0xffffffffffffff87
> 0x0000007f984bf570: cbz   x1, 0x0000007f984bf664
> 0x0000007f984bf574: and   xscratch1, x1, #0x7
> 0x0000007f984bf578: cbnz  xscratch1, 0x0000007f984bf5f0
> 0x0000007f984bf57c: and   xscratch1, x1, #0x300
> 0x0000007f984bf580: cbnz  xscratch1, 0x0000007f984bf5b8
> 0x0000007f984bf584: mov   xscratch1, #0x37f                   // #895
> 0x0000007f984bf588: and   x0, x0, xscratch1
> 0x0000007f984bf58c: orr   x1, x0, xthread
> 0x0000007f984bf590: ldaxr xscratch1, [x3]
> 0x0000007f984bf594: cmp   xscratch1, x0
> 0x0000007f984bf598: b.ne  0x0000007f984bf5a8
> Usually I call the program like this:
> ————
> JAVA=/root/bwedenik/jdk8/jdk8/build/linux-aarch64-normal-server-release/jdk/bin/java
> $JAVA -fullversion
> $JAVA -server -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -XX:+OptimizeStringConcat -XX:+UseBiasedLocking -XX:+UseParallelGC -XX:ParallelGCThreads=10 -XX:+UseParallelOldGC -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=15  -Xms10g -Xmx10g -Xmn4g -Xss64m -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand='print,*DeliveryTransaction.preprocess' spec.jbb.JBBmain -propfile SPECjbb.props
> ————
> I tried to find out whether this problem occurs only with C1, C2 or pure interpretation, and these are the results (calling java as usual, including the arguments given above):
> * [-Xint] : This gives me neither the inserted nops nor the pattern I am searching for (as expected, since nothing is compiled).
> * [-client -Xcomp -XX:-TieredCompilation] : Here the cmp against #0x0 occurs only about 3 times in the whole disassembly, instead of about 200 times without these flags. In addition, none of my inserted nops appear in the disassembly.
> * [-server -Xcomp -XX:-TieredCompilation] : Same as -client.
> My question now is how to find out why the rule does not match, whether the rule is correct, and how to find the actual rule that emits the code of my desired pattern.
> Thanks in advance,
> Benedikt Wedenik, Theobroma-Systems.com
