RFR: JDK-8196064: AArch64: Merging ld/st into ldp/stp in macro-assembler

Andrew Haley aph at redhat.com
Thu Feb 1 15:36:34 UTC 2018

On 30/01/18 03:30, Zhongwei Yao wrote:
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8196064
> Webrev:
> http://cr.openjdk.java.net/~zyao/8196064/webrev.00
> This patch merges adjacent load/store into ldp/stp in macro-assembler
> as discussed in previous thread:
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-December/027957.html
> My micro-benchmark case gives about 8% performance improvement with this patch.
> And this patch is based on commit be48be2 due to recent build failure
> on jdk/hs master, and it passes all jtreg tests.
> Please help to review it.

Very nice.  You've done a good job.  This merging opportunity happens
a lot, and it reduces not only the size of compiled code but also the
size of the interpreter and the generated stubs.  It's definitely
worth doing.

before: ArrObj.testArrObjCopy  avgt   20  3.075 ? 0.002  us/op
 after: ArrObj.testArrObjCopy  avgt   20  2.779 ? 0.004  us/op

So, some improvement.  One thing I noticed:

Compiled method (c2)    6812  627       4       org.linaro.benchmarks.ArrObj::testArrObjCopy (32 bytes)

 ;; B44: #	B45 <- B43  Freq: 40.1247

  0x000003ff94b0db50: ldr	x17, [sp]
  0x000003ff94b0db54: ldp	x18, x1, [sp,#8]
  0x000003ff94b0db58: ldp	x10, x3, [sp,#24]
  0x000003ff94b0db5c: ldp	x4, x11, [sp,#40]
  0x000003ff94b0db60: ldr	x6, [sp,#56]

It seems to me like there could be four ldp instructions here.  There
aren't (as far as I can see) because they're in an unfortunate order,
so the MacroAssembler doesn't see the opportunity:

 ;; B44: #	B45 <- B43  Freq: 39.2717

  0x000003ffad17bbf8: ldr	x17, [sp]
  0x000003ffad17bbfc: ldr	x1, [sp,#16]
  0x000003ffad17bc00: ldr	x18, [sp,#8]
  0x000003ffad17bc04: ldr	x10, [sp,#24]
  0x000003ffad17bc08: ldr	x3, [sp,#32]
  0x000003ffad17bc0c: ldr	x4, [sp,#40]
  0x000003ffad17bc10: ldr	x11, [sp,#48]
  0x000003ffad17bc14: ldr	x6, [sp,#56]

This is OK: I don't think we want to make MacroAssembler even more
complicated in order to handle occasional out-of-order spills.

Also, last_membar and last_ldst seem to be doing the same thing.  I
think that you could simply have a single last_insn field rather than
both last_membar and last_ldst, and then inspect the instruction to
see if it's a membar or a ld/st.

Finally, please add BLOCK_COMMENTs in merge_ldst, like this:

  if (!is_store) {
    BLOCK_COMMENT("merged ldr pair");
    if (sz == 8) {
      ldp(rt_low, rt_high, adr_p);
    } else {
      ldpw(rt_low, rt_high, adr_p);
  } else {
    BLOCK_COMMENT("merged str pair");
    if (sz == 8) {
      stp(rt_low, rt_high, adr_p);
    } else {
      stpw(rt_low, rt_high, adr_p);


Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

More information about the hotspot-compiler-dev mailing list