Bruno Alexandre Rosa bruno.rosa at eldorado.org.br
Thu Nov 24 20:53:29 UTC 2016

Igor forgot to post the web-rev, so I'm posting it here for him:


Bruno Rosa

Hi all,

The following rev implements an improvement suggested by Gustavo Romero.


The issue is:

Use andis. in place of the following sequence.

   845 1.2e-04 :    3fff6027e134:       lis     r18,7
36054  0.0052 :    3fff6027e138:       and     r18,r17,r18
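
To make the suggested replacement concrete, here is a minimal Python sketch of the register arithmetic involved. It is only a model: the helper names (lis, lis_and, andis, hi16) are made up for illustration, the CR0 update done by andis. is not modeled, and hi16 assumes that the ".hi" suffix in the Opto Assembly dumps denotes the upper 16 bits of the 32-bit constant.

```python
MASK64 = (1 << 64) - 1  # model 64-bit general purpose registers


def lis(si16):
    """Model of 'lis rD, SI': SI shifted left by 16, sign-extended to 64 bits."""
    si16 &= 0xFFFF
    value = si16 << 16
    if si16 & 0x8000:
        value |= 0xFFFFFFFF00000000  # sign extension of the 32-bit result
    return value & MASK64


def lis_and(rs, si16):
    """Model of the two-instruction sequence: lis rT, SI ; and rA, rS, rT."""
    return rs & lis(si16)


def andis(rs, ui16):
    """Model of 'andis. rA, rS, UI': rS AND (UI << 16), UI zero-extended.

    The CR0 update performed by the real instruction is not modeled here.
    """
    return rs & ((ui16 & 0xFFFF) << 16) & MASK64


def hi16(constant):
    """Upper halfword of a 32-bit constant (assumed meaning of '.hi')."""
    return (constant >> 16) & 0xFFFF


if __name__ == "__main__":
    r17 = 0x0000000012345678
    # The sequence from the issue: lis r18,7 ; and r18,r17,r18
    print(hex(lis_and(r17, 7)), hex(andis(r17, 7)))
    # The mask used in situations 1.1 and 1.2
    print(hex(hi16(133955584)))
```

In this model the two forms agree whenever the immediate is below 0x8000, as in the sequence above. For immediates with the top bit set, lis sign-extends while andis. zero-extends, so the results can differ in the upper 32 bits of a 64-bit register.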

This patch results in small improvements in the dumped Opto Assembly code.
See the explanation below:

Situation 1.1)

03c   B3: #     B7 B4 <- B2  Freq: 0.899982
03c     LIS     R15, #133955584.hi
040     AND     R14, R3, R15
044     CMPW    CCR6, R14, R15
048     Beq     CCR6, B7  P=0.100000 C=-1.000000

Situation 1.2)

03c   B3: #     B7 B4 <- B2  Freq: 0.899982
03c     ANDIS   R15, R3, #133955584.hi
040     LIS     R17, #133955584.hi
044     CMPW    CCR5, R15, R17
048     Beq     CCR5, B7  P=0.100000 C=-1.000000

Situation 2.1)

370   B91: #    B392 B92 <- B90  Freq: 0.000197734
370     LIS     R14, #251658240.hi
374     AND     R15, R3, R14
378     LIS     R17, #16777216.hi
37c     CMPW    CCR5, R15, R17
380     Beq     CCR5, B392  P=0.100000 C=-1.000000

Situation 2.2)

370   B91: #    B392 B92 <- B90  Freq: 0.000197734
370     ANDIS   R15, R3, #251658240.hi
374     LIS     R14, #16777216.hi
378     CMPW    CCR6, R15, R14
37c     Beq     CCR6, B392  P=0.100000 C=-1.000000

In situations 1.1 and 2.1 the patch is not applied. In 1.2 and 2.2 the patch is applied.

Comparing 2.1 and 2.2, some performance gain is seen, as one fewer instruction is needed (four instead of five).

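Comparing 1.1 and 1.2, no performance gain is seen. In 1.1, the value loaded into R15 is used by both AND and CMPW, so no reload is needed.
In 1.2, ANDIS takes the mask as an immediate, so the constant still has to be loaded with LIS for CMPW and no register is reused. Both versions need four instructions.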
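
The comparison above can be sketched as a small instruction-count model. This is only an illustration of the reasoning, not how C2 selects or schedules instructions; the function names are hypothetical, and the only rule it encodes is that without the patch the LIS result can be shared between AND and CMPW when the mask and the compare constant are equal.

```python
def sequence_without_patch(mask, cmp_const):
    """Instructions emitted when the mask is loaded with LIS and applied with AND."""
    seq = ["LIS", "AND"]
    if cmp_const != mask:
        # A different compare constant needs its own LIS (situation 2.1).
        seq.append("LIS")
    # Otherwise the register loaded for the mask is reused (situation 1.1).
    seq += ["CMPW", "Beq"]
    return seq


def sequence_with_patch(mask, cmp_const):
    """Instructions emitted when ANDIS carries the mask as an immediate."""
    # The compare constant always needs a LIS, since ANDIS leaves no
    # register holding the mask (situations 1.2 and 2.2).
    return ["ANDIS", "LIS", "CMPW", "Beq"]


if __name__ == "__main__":
    cases = {
        "situation 1": (133955584, 133955584),
        "situation 2": (251658240, 16777216),
    }
    for name, (mask, cmp_const) in cases.items():
        before = sequence_without_patch(mask, cmp_const)
        after = sequence_with_patch(mask, cmp_const)
        print(name, len(before), "->", len(after))
```

Under this model the patch only saves an instruction when the mask and the compare constant differ, which matches the two situations shown.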


Igor Nunes

