RFR(s): 8013171: G1: C1 x86_64 barriers use 32-bit accesses to 64-bit PtrQueue::_index
per.liden at oracle.com
Wed Apr 29 12:00:12 UTC 2015
On 2015-04-27 18:20, Christian Thalinger wrote:
>> On Apr 23, 2015, at 9:40 AM, Per Liden <per.liden at oracle.com> wrote:
>> Hi Thomas,
>>> On 23 Apr 2015, at 13:16, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
>>> On Thu, 2015-04-23 at 10:52 +0200, Per Liden wrote:
>>>> (This change affects G1, but it's touching code in C1 so I'd like to ask
>>>> someone from the compiler team to also reviewed this)
>>>> Summary: The G1 barriers loads and updates the PrtQueue::_index field.
>>>> This field is a size_t but the C1 version of these barriers aren't
>>>> 64-bit clean. The bug has more details.
>>>> In addition I've massaged the code a little bit, so that the 32-bit and
>>>> 64-bit sections look more similar (and as a bonus I think we avoid an
>>>> extra memory load on 32-bit).
>>>> Webrev: http://cr.openjdk.java.net/~pliden/8013171/webrev.0/
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8013171
>>>> * gc-test-suite on both 32 and 64-bit builds (with -XX:+UseG1GC
>>>> -XX:+TieredCompilation -XX:TieredStopAtLevel=3 -XX:+VerifyAfterGC)
>>>> * Passes jprt
>>> Looks good, with the following caveats which should be decided by
>>> somebody else if they are important as they are micro-opts:
>>> - instead of using cmp to compare against zero in a register, it would
>>> be better to use the test instruction (e.g. __ testX(tmp, tmp)) as it saves
>>> a byte of encoding per instruction with the same effect.
> Tighter code is always better. For barriers it might be important in tight loops to better fit in the cache.
I'll make it a testprt().
>>> - post barrier stub: I would prefer if the 64 bit code did not
>>> push/pop the rdx register to free tmp. There are explicit rscratch1/2
>>> registers for temporaries available on that platform. At least rscratch1
>>> (=r8) seems to be used without save/restore in the original code already.
>>> This would also remove the need for 64 bit code to push/pop any register it
>>> seems to me.
> Sounds like a good suggestion if it doesn’t complicate the code too much.
I'd like to avoid reintroducing different code paths for 32 and 64-bit,
which I think complicates the code. However, I can defer the pushing of
tmp until it's actually needed, which essentially gets us to the same
situation as before this change in terms of register usage for 64-bit.
Updated webrev: http://cr.openjdk.java.net/~pliden/8013171/webrev.2/
More information about the hotspot-dev