RFR(M) 8150353: PPC64LE: Support RTM on linux

Doerr, Martin martin.doerr at sap.com
Mon Mar 7 10:29:13 UTC 2016

Hi Vladimir,

thank you very much for the detailed analysis.
I hope an #ifdef PPC64 is ok in the shared code?

I had written to Gustavo about the performance problem we see with RTM in SPECjbb2005:

> The following issue is important for performance work:
> RTM does not work together with BiasedLocking. BiasedLocking gets switched off when RTM is activated, which has a large performance impact (especially in jbb2005).
> I would disable BiasedLocking for a reference measurement:
> -XX:-UseBiasedLocking
> Unfortunately, RTM was slower than BiasedLocking but faster than the reference (with neither), which tells me that there's room for improvement.
> There are basically 3 classes of locks:
> 1. no contention
> 2. contention on lock, low contention on data
> 3. high contention on data
> I believe the optimal treatment for the cases would be:
> 1. Biased Locking
> 2. Transactional Memory
> 3. classical locking with lock inflating
> I think it would be good if the JVM could optimize for all these cases in the future. But that would add additional complexity and code size.

Do you think this is something which should be improved in the future?
We could try, e.g., the following approach:
- try biased
- deoptimize if it doesn't work well, try transactional
- deoptimize if it doesn't work well, use classical locking (with inflating)
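The escalation proposed above (biased, then transactional, then classical inflated locking) could be modeled as a per-lock-site state machine that moves to the next, more conservative strategy once a failure ratio is exceeded; each move would correspond to a deoptimization and recompilation. This is a hypothetical sketch, not HotSpot code: LockStrategy, LockSiteProfile and both thresholds (loosely modeled on RTMAbortThreshold and RTMAbortRatio) are assumptions, and counting bias revocations and RTM aborts as the same kind of "failure" is a simplification.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical locking strategies, ordered from cheapest to most conservative.
enum class LockStrategy { Biased, Transactional, Inflated };

// Hypothetical per-lock-site profile driving the escalation.
struct LockSiteProfile {
  LockStrategy strategy = LockStrategy::Biased;
  uint32_t attempts = 0;  // lock attempts since the last strategy change
  uint32_t failures = 0;  // bias revocations or RTM aborts since the last change

  // Assumed tuning knobs (cf. RTMAbortThreshold / RTMAbortRatio).
  static constexpr uint32_t kMinSamples = 1000;
  static constexpr uint32_t kFailurePercent = 50;

  // Records one lock attempt. Returns true when the site should be
  // "deoptimized", i.e. recompiled with the next strategy.
  bool record(bool failed) {
    ++attempts;
    if (failed) ++failures;
    if (strategy == LockStrategy::Inflated || attempts < kMinSamples) {
      return false;  // nothing left to escalate to, or not enough samples yet
    }
    if (failures * 100 >= attempts * kFailurePercent) {
      strategy = (strategy == LockStrategy::Biased) ? LockStrategy::Transactional
                                                    : LockStrategy::Inflated;
      attempts = 0;
      failures = 0;
      return true;
    }
    return false;
  }
};
```

A site with no contention never leaves the biased state; a site whose transactions keep aborting ends up inflated after two escalations.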

Best regards,

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Freitag, 26. Februar 2016 03:24
To: Doerr, Martin <martin.doerr at sap.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; hotspot-dev at openjdk.java.net
Cc: brenohl at br.ibm.com
Subject: Re: RFR(M) 8150353: PPC64LE: Support RTM on linux

The problem with increasing the ScratchBufferBlob size is that with Tiered
compilation we scale the number of compiler threads based on the CPU count
and increase the space in the CodeCache accordingly:

   code_buffers_size += c2_count * C2Compiler::initial_code_buffer_size();
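Because one code buffer is reserved per C2 compiler thread, any growth of the per-buffer size is multiplied by c2_count. The following hypothetical model illustrates that arithmetic; the section breakdown, the names C2BufferLayout and c2_code_buffers_size, and all sizes are assumptions for illustration, not the real C2Compiler::initial_code_buffer_size().

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout of one C2 code buffer; all sizes are caller-supplied.
struct C2BufferLayout {
  std::size_t inst_size;   // instruction section (cf. MAX_inst_size)
  std::size_t stubs_size;  // stub section
  std::size_t const_size;  // constant section
  std::size_t locs_size;   // relocation info
  std::size_t slop;        // padding between sections
};

// Size of a single buffer: the sum of its sections.
constexpr std::size_t initial_code_buffer_size(const C2BufferLayout& l) {
  return l.inst_size + l.stubs_size + l.const_size + l.locs_size + l.slop;
}

// One buffer per C2 thread, so the CodeCache reservation grows linearly
// with c2_count (which itself grows with the CPU count).
constexpr std::size_t c2_code_buffers_size(std::size_t c2_count,
                                           const C2BufferLayout& l) {
  return c2_count * initial_code_buffer_size(l);
}
```

With 12 C2 threads, growing the instruction section by 1024 bytes reserves 12 * 1024 = 12288 extra bytes of CodeCache, and that overhead keeps rising on machines with more CPUs.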

I ran an experiment on Intel, turning on all RTM flags that can increase
the size of the lock code:

$ java -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions \
    -XX:+UseRTMLocking -XX:+UseRTMDeopt -XX:+UseRTMForStackLocks \
    -XX:+PrintPreciseRTMLockingStatistics -XX:+PrintFlagsFinal -version | grep RTM
Java HotSpot(TM) 64-Bit Server VM warning: UseRTMLocking is only available as experimental option on this platform.
      bool PrintPreciseRTMLockingStatistics         := true    {C2 diagnostic}
      intx RTMAbortRatio                             = 50      {ARCH experimental}
      intx RTMAbortThreshold                         = 1000    {ARCH experimental}
      intx RTMLockingCalculationDelay                = 0       {ARCH experimental}
      intx RTMLockingThreshold                       = 10000   {ARCH experimental}
     uintx RTMRetryCount                             = 5       {ARCH product}
      intx RTMSpinLoopCount                          = 100     {ARCH experimental}
      intx RTMTotalCountIncrRate                     = 64      {ARCH experimental}
      bool UseRTMDeopt                              := true    {ARCH product}
      bool UseRTMForStackLocks                      := true    {ARCH experimental}
      bool UseRTMLocking                            := true    {ARCH product}
      bool UseRTMXendForLockBusy                     = true    {ARCH experimental}

I added the following lines to the end of the Compile::scratch_emit_size() method:

   if (n->is_Mach() && n->as_Mach()->ideal_Opcode() == Op_FastLock) {
     tty->print_cr("======== FastLock size:  %d  ==========", ...);
   }
   if (n->is_Mach() && n->as_Mach()->ideal_Opcode() == Op_FastUnlock) {
     tty->print_cr("======== FastUnlock size:  %d  ==========", ...);
   }

and got:

======== FastLock size:  657  ==========
======== FastUnlock size:  175  ==========
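Making MAX_inst_size platform-specific amounts to checking emitted lock code against a per-platform instruction budget, selected with an #ifdef PPC64 of the kind asked about at the top of this mail. A minimal sketch, assuming a hypothetical ScratchLimits enum; the values 2048 and 1024 are illustrative assumptions, not the committed change.

```cpp
#include <cassert>

// Hypothetical per-platform budget for one node's emitted instructions.
struct ScratchLimits {
  enum {
#ifdef PPC64
    MAX_inst_size = 2048  // assumed larger budget for PPC64 RTM lock code
#else
    MAX_inst_size = 1024  // assumed default budget
#endif
  };
};

// Does a single emitted node fit into the scratch instruction section?
constexpr bool fits_in_scratch(int emitted_size) {
  return emitted_size <= ScratchLimits::MAX_inst_size;
}

// Bytes left over after emitting a node of the given size.
constexpr int scratch_headroom(int emitted_size) {
  return ScratchLimits::MAX_inst_size - emitted_size;
}
```

Under the default budget assumed here, both sizes measured on x86 (657 and 175 bytes) fit, so only platforms with larger RTM lock sequences would need the bigger value.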


On 2/25/16 3:43 AM, Doerr, Martin wrote:
> Hi Vladimir,
> thanks for taking a look.
> About version values:
> We are using a similar scheme for version checks on AIX where we know that the version values are less than 256.
> It makes comparisons much more convenient.
> But I agree that we should double-check if it is guaranteed for linux as well (and possibly add an assertion).
> About scratch buffer size:
> We only noticed that the scratch buffer was too small when we enabled all RTM features:
> -XX:+UnlockExperimentalVMOptions -XX:+UseRTMLocking -XX:+UseRTMForStackLocks -XX:+UseRTMDeopt
> We have only tried on PPC64, but I wonder if the current size is sufficient for x86. I currently don't have access to a Skylake machine.
> I think adding 1024 bytes to the scratch buffer doesn't hurt.
> (It may also lead to larger CodeBuffers in output.cpp but I don't think this is problematic as long as the real content gets copied to nmethods.)
> Would you agree?
> Best regards,
> Martin
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Donnerstag, 25. Februar 2016 00:54
> To: Gustavo Romero <gromero at linux.vnet.ibm.com>; Doerr, Martin <martin.doerr at sap.com>; hotspot-dev at openjdk.java.net
> Cc: brenohl at br.ibm.com
> Subject: Re: RFR(M) 8150353: PPC64LE: Support RTM on linux
> My concern (but I am not an expert) is the Linux version encoding. Is it true
> that each value in x.y.z is less than 256? Why not keep them as separate
> int values?
> I also thought we had OS version checks in the make files, but we only check
> the gcc version there.
> Do you have the problem with ScratchBufferBlob only on PPC, or on other
> platforms too? Maybe we should make MAX_inst_size a platform-specific
> value.
> Thanks,
> Vladimir
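The two version-check options discussed here, one packed integer guarded by an assertion versus separate int fields, can be compared side by side. This is a hypothetical sketch: encode_version, parse_kernel_version and KernelVersion are illustrative names, the release string would typically come from the release field filled in by uname(2), and the sample string used below is made up.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <tuple>

// Packed form: (x << 16) | (y << 8) | z. Comparing packed values with < is
// only correct if every component is below 256, hence the assertion.
inline uint32_t encode_version(uint32_t major, uint32_t minor, uint32_t patch) {
  assert(major < 256 && minor < 256 && patch < 256 && "component exceeds 8 bits");
  return (major << 16) | (minor << 8) | patch;
}

// Parses a release string such as "4.2.0-27-generic"; missing components
// default to 0. Returns false if not even the major number can be read.
inline bool parse_kernel_version(const char* release, uint32_t* out) {
  unsigned major = 0, minor = 0, patch = 0;
  int n = std::sscanf(release, "%u.%u.%u", &major, &minor, &patch);
  if (n < 1) return false;
  *out = encode_version(major, minor, patch);
  return true;
}

// Separate-field form: no range limit, compared lexicographically.
struct KernelVersion { unsigned major, minor, patch; };

inline bool operator<(const KernelVersion& a, const KernelVersion& b) {
  return std::tie(a.major, a.minor, a.patch) < std::tie(b.major, b.minor, b.patch);
}
```

The packed form keeps comparisons to a single integer compare, as Martin describes for AIX; the struct form avoids the 256 limit entirely at the cost of a slightly longer comparison.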
> On 2/24/16 11:50 AM, Gustavo Romero wrote:
>> Hi Martin,
>> Both little- and big-endian Linux kernels contain the syscall change, so
>> I did not include:
>> #if defined(COMPILER2) && (defined(AIX) || defined(VM_LITTLE_ENDIAN))
>> in globalDefinitions_ppc.hpp.
>> Please, could you review the following change?
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8150353
>> Webrev (hotspot): http://81.de.7a9f.ip4.static.sl-reverse.com/webrev/
>> Summary:
>> * Enable RTM support for Linux on PPC64 (LE and BE).
>> * Fix C2 compiler buffer size issue.
>> Thank you.
>> Regards,
>> Gustavo
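Dropping the endianness condition means the RTM guard depends only on the compiler and the OS. A sketch of what such a guard could look like, assuming a LINUX build macro next to AIX and assuming INCLUDE_RTM_OPT as the feature macro; the actual contents of globalDefinitions_ppc.hpp in the webrev are not shown in this thread.

```cpp
#include <cassert>

// Assumed guard: RTM lock elision is built for C2 on AIX and on Linux,
// with no VM_LITTLE_ENDIAN condition, so both LE and BE Linux qualify.
#if defined(COMPILER2) && (defined(AIX) || defined(LINUX))
#define INCLUDE_RTM_OPT 1
#else
#define INCLUDE_RTM_OPT 0
#endif

// Convenience accessor for code that prefers a constant over #if.
constexpr bool rtm_opt_included() { return INCLUDE_RTM_OPT != 0; }
```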
