8217368: AArch64: C2 recursive stack locking optimisation not triggered
derekw at marvell.com
Fri Jan 18 18:14:18 UTC 2019
Your changes look good to me.
Once again some cleanup suggestions to pre-existing code:
"// Handle existing monitor" -> "// Check for existing monitor"
Line 3471: "// Handle existing monitor."
Move to line 3473.
Lines 3437, 3445, 3468, 3485, 3493:
Add comment to lines: "// sets result"
This set contains actual code changes, but should be clearer code:
Lines 3483, 3485:
"disp_hdr" -> "zr"
cmp(disp_hdr, rscratch1) -> cmp(rscratch1, zr)
Note that having the "sets result" comment here is important, because it's so tempting to merge CMP+BNE -> CBNZ. But that doesn't set the condition flags.
Line 3480: delete mov.
> -----Original Message-----
> From: aarch64-port-dev <aarch64-port-dev-bounces at openjdk.java.net> On
> Behalf Of Nick Gasson (Arm Technology China)
> Sent: Friday, January 18, 2019 3:40 AM
> To: hotspot-compiler-dev at openjdk.java.net compiler <hotspot-compiler-
> dev at openjdk.java.net>
> Cc: nd <nd at arm.com>; aarch64-port-dev at openjdk.java.net
> Subject: [EXT] [aarch64-port-dev ] RFR: 8217368: AArch64: C2 recursive stack
> locking optimisation not triggered
> External Email
> While I was cleaning up the patch for 8216350 I noticed an issue in the
> implementation of recursive locking in aarch64_enc_fast_lock:
> Bug: https://bugs.openjdk.java.net/browse/JDK-8217368
> Webrev: http://cr.openjdk.java.net/~ngasson/8217368/webrev.0/
> First we load the markOop of the object we want to lock and OR it with
> markOopDesc::unlocked_value (1). Then we do a CAS to exchange the
> address of the box on our thread's stack with the object's header word iff it's
> equal to the (markOop | 1) we just computed. If this fails, then we should
> check for a recursive lock by comparing
> (~(page size - 1) | 3) & (markOop - SP) == 0
> Where "markOop" is the current object header word loaded by the failed
> CAS. This checks that the lock bits are zero (locked) and the stack address of
> the displaced header is within one page of the current SP.
> But on AArch64 we actually do this:
> (~(page size - 1) | 3) & ((old markOop | 1) - SP) == 0
> Where "old markOop | 1" is the compare-to value used for the CAS. This is
> always false as the result has at least bit #0 set. This only affects C2, the
> C1_MacroAssembler version has the correct test.
> The diff looks big but all it does is swap the usage of registers `tmp'
> and `disp_hdr' in the first section so the markOop loaded by the CAS ends up
> in disp_hdr and tmp holds the (markOop | 1) compare-to value.
> Ran jtreg, plus jcstress with -XX:+UseLSE and -XX:-UseLSE. Also added
> another microbenchmark to
> micro/org/openjdk/bench/vm/lang/LockUnlock.java as I couldn't find an
> existing JMH case that triggered this.
> Without patch:
> 510.781 ?(99.9%) 1.196 ns/op [Average]
> (min, avg, max) = (508.769, 510.781, 513.854), stdev = 1.597
> CI (99.9%): [509.585, 511.977] (assumes normal distribution)
> With patch:
> 197.038 ?(99.9%) 0.096 ns/op [Average]
> (min, avg, max) = (196.886, 197.038, 197.296), stdev = 0.128
> CI (99.9%): [196.942, 197.134] (assumes normal distribution)
> Two other minor things:
> * Does anyone know what the comment "// Load Compare Value application
> register." means? It's present in the PPC and S390 ports too.
> * The x86 port #ifdef LP64 uses "7 - os::vm_page_size()" as the mask in the
> recursive lock test. I think the "7" here is markOopDesc::biased_lock_mask
> and is presumably there to prevent a silent mutual exclusion failure if a
> markOop with the bias locking bits set ends up the fast_lock path (although
> this should never happen).
> Should we change markOopDesc::lock_mask_in_place to
> markOopDesc::biased_lock_mask_in_place in the AArch64 port too?
More information about the hotspot-compiler-dev