RFR(s): 8013171: G1: C1 x86_64 barriers use 32-bit accesses to 64-bit PtrQueue::_index
christian.thalinger at oracle.com
Mon Apr 27 16:20:52 UTC 2015
> On Apr 23, 2015, at 9:40 AM, Per Liden <per.liden at oracle.com> wrote:
> Hi Thomas,
>> On 23 Apr 2015, at 13:16, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
>> On Thu, 2015-04-23 at 10:52 +0200, Per Liden wrote:
>>> (This change affects G1, but it's touching code in C1 so I'd like to ask
>>> someone from the compiler team to also reviewed this)
>>> Summary: The G1 barriers loads and updates the PrtQueue::_index field.
>>> This field is a size_t but the C1 version of these barriers aren't
>>> 64-bit clean. The bug has more details.
>>> In addition I've massaged the code a little bit, so that the 32-bit and
>>> 64-bit sections look more similar (and as a bonus I think we avoid an
>>> extra memory load on 32-bit).
>>> Webrev: http://cr.openjdk.java.net/~pliden/8013171/webrev.0/
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8013171
>>> * gc-test-suite on both 32 and 64-bit builds (with -XX:+UseG1GC
>>> -XX:+TieredCompilation -XX:TieredStopAtLevel=3 -XX:+VerifyAfterGC)
>>> * Passes jprt
>> Looks good, with the following caveats which should be decided by
>> somebody else if they are important as they are micro-opts:
>> - instead of using cmp to compare against zero in a register, it would
>> be better to use the test instruction (e.g. __ testX(tmp, tmp)) as it saves
>> a byte of encoding per instruction with the same effect.
Tighter code is always better. For barriers it might be important in tight loops to better fit in the cache.
>> - post barrier stub: I would prefer if the 64 bit code did not
>> push/pop the rdx register to free tmp. There are explicit rscratch1/2
>> registers for temporaries available on that platform. At least rscratch1
>> (=r8) seems to be used without save/restore in the original code already.
>> This would also remove the need for 64 bit code to push/pop any register it
>> seems to me.
Sounds like a good suggestion if it doesn’t complicate the code too much.
>> - the original code only pushed/popped rbx when there was need to. Now
>> the generated code pushes/pops rdx always.
>> In general, the new code is easier to follow (and unifies 32/64 bit code
>> paths), but seems slightly worse in execution time to me (without testing,
>> just gut feeling). It probably won't matter at the end of the day.
> Thanks for looking at the patch!
> I don’t think these optimizations will make a difference given the nature of C1, but let’s see if someone has a different opinion.
More information about the hotspot-dev