RFR (XL) 8031320: Use Intel RTM instructions for locks

Christian Thalinger christian.thalinger at oracle.com
Wed Mar 19 01:07:36 UTC 2014


+       warning("RTMTotalCountIncrRate must be a power of 2, reseting it to 64");
+       warning("RTMAbortRatio must be in the range 0 to 100, reseting it to 50");

Typo: “resetting”
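
For reference, the two checks behind these warnings could look roughly like the sketch below. The helper names (is_power_of_2_sketch, checked_total_count_incr_rate, checked_abort_ratio) are made up for illustration and are not taken from the webrev; only the flag names, the limits, and the reset values 64 and 50 come from the warning texts above.

```cpp
#include <cstdio>

// Hypothetical helper: true if x is a positive power of two.
static bool is_power_of_2_sketch(int x) {
  return x > 0 && (x & (x - 1)) == 0;
}

// Hypothetical stand-in for the RTMTotalCountIncrRate check:
// falls back to 64 when the value is not a power of 2.
static int checked_total_count_incr_rate(int value) {
  if (!is_power_of_2_sketch(value)) {
    std::fprintf(stderr,
        "warning: RTMTotalCountIncrRate must be a power of 2, resetting it to 64\n");
    return 64;
  }
  return value;
}

// Hypothetical stand-in for the RTMAbortRatio check:
// falls back to 50 when the value is outside 0..100.
static int checked_abort_ratio(int value) {
  if (value < 0 || value > 100) {
    std::fprintf(stderr,
        "warning: RTMAbortRatio must be in the range 0 to 100, resetting it to 50\n");
    return 50;
  }
  return value;
}
```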


+// States of Restricted Transactional Memory usage.
+enum RTMState {
+  NoRTM      = 0x2, // Don't use RTM
+  UseRTM     = 0x1, // Use RTM
+  ProfileRTM = 0x0  // Use RTM with abort ratio calculation

RTM states are defined as an enum but we are using them as ints:

+   // State of RTM code generation during compilation of the method
+   int               _rtm_state;
+ #endif

+   int rtm_state() const {
+     return _rtm_state;
+   }
+   void set_rtm_state(int rstate) {
+     _rtm_state = rstate;
+   }
+   void atomic_set_rtm_state(int rstate) {
+     Atomic::store(rstate, &_rtm_state);
+   }
+   static int rtm_state_offset_in_bytes() {
+     return offset_of(MethodData, _rtm_state);
+   }
+ #endif

If we don’t trust the C++ compilers to use an int as the storage type, at least make the methods take the enum type and cast it to an int.
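
Something along these lines is what I have in mind. This is only a sketch: MethodDataSketch is a stand-in for MethodData, the constructor argument and the assert are my additions, and the atomic setter and offset accessor are left out because they depend on HotSpot's Atomic and offset_of.

```cpp
#include <cassert>

// States of Restricted Transactional Memory usage (values as in the patch).
enum RTMState {
  NoRTM      = 0x2, // Don't use RTM
  UseRTM     = 0x1, // Use RTM
  ProfileRTM = 0x0  // Use RTM with abort ratio calculation
};

// Hypothetical stand-in for MethodData; only the RTM state field is shown.
class MethodDataSketch {
  int _rtm_state; // kept as int so the field width does not depend on the compiler

 public:
  explicit MethodDataSketch(RTMState initial)
    : _rtm_state(static_cast<int>(initial)) {}

  // Callers see the enum type; the cast stays inside the accessors.
  RTMState rtm_state() const {
    return static_cast<RTMState>(_rtm_state);
  }
  void set_rtm_state(RTMState rstate) {
    assert(rstate == NoRTM || rstate == UseRTM || rstate == ProfileRTM);
    _rtm_state = static_cast<int>(rstate);
  }
};
```

With that, a call like set_rtm_state(3) no longer compiles, while the generated code that reads the field through the offset still sees a plain int.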


+ //------------------------------Opaque3Node------------------------------------
+ // A node to prevent unwanted optimizations. Will be optimized only during
+ // macro nodes expansion.
+ class Opaque3Node : public Opaque2Node {

I know we already have two opaque nodes but that doesn’t mean we have to add a third one :-)  Can we give it a better name?  Something meaningful?


+   RTMLockingCounters* _rtmcounters;      // RTM lock counters for inflated locks
+   RTMLockingCounters* _stackrtmcounters; // RTM lock counters for stack locks

These names are used a couple of times in various places.  I would like to see them with underscores:

+   RTMLockingCounters* _rtm_counters;      // RTM lock counters for inflated locks
+   RTMLockingCounters* _stack_rtm_counters; // RTM lock counters for stack locks

On Mar 17, 2014, at 12:11 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:

> https://bugs.openjdk.java.net/browse/JDK-8031320
> http://cr.openjdk.java.net/~kvn/8031320_9/webrev/
> The Intel architecture codenamed Haswell has support for the RTM (Restricted Transactional Memory) instructions xbegin, xabort, xend and xtest as part of Intel Transactional Synchronization Extensions (TSX). The xbegin and xend instructions enclose a set of instructions to be executed as a transaction. If no conflict is found during execution of the transaction, the memory and register modifications are committed together at xend. The xabort instruction can be used for an explicit abort of a transaction and xtest to check whether we are in a transaction.
> RTM is useful for highly contended locks with low conflict in the critical region. The highly contended locks don't scale well otherwise but with RTM they show good scaling. RTM allows using coarse grain locking for applications. Also for lightly contended locks which are used by different threads RTM can reduce cache line ping pong and thereby show performance improvement too.
> Implementation:
> Generate RTM locking code for all inflated locks when the "UseRTMLocking" option is on, with normal locking as the fallback mechanism. On abort or lock busy the lock will be retried a fixed number of times as specified by the "RTMRetryCount" option.  Locks which abort too often can be auto-tuned or manually tuned.
> Auto-tuning can be done using the "UseRTMDeopt" flag, which adds abort ratio calculation code for each lock. The abort ratio will be calculated after "RTMAbortThreshold" aborts are encountered.
> With "UseRTMDeopt", if the abort ratio reaches "RTMAbortRatio" the nmethod containing the lock will be deoptimized and recompiled with all locks as normal (stack) locks. If the abort ratio continues to remain low after "RTMLockingThreshold" attempted locks, then the method will be deoptimized and recompiled with all locks as RTM locks without abort ratio calculation code. The abort ratio calculation can be delayed by specifying the -XX:RTMLockingCalculationDelay=<millisec> flag.
> Deoptimization of the nmethod is done by adding an uncommon trap at the beginning of the code which checks the rtm state field in the MDO, which is modified by the abort calculation code.
> For manual tuning the abort statistics for each lock can be provided to a user with the "PrintPreciseRTMLockingStatistics" diagnostic flag. Based on the abort statistics users can create a .hotspot_compiler file or use the -XX:CompileCommand=<option> flag to specify the methods for which to disable RTM locking, using <option> "NoRTMLockEliding", or to always enable RTM locking, using <option> "UseRTMLockEliding".
> The abort calculation and statistics collection are done using RTMLockingCounters wrapped into RTMLockingNamedCounter counters, which are generated for each lock. To reduce the burden on the cache line, the RTM lock total counter is updated randomly at the RTMTotalCountIncrRate rate.
> Note, both automatic and manual tuning are done for the whole method. There is no mechanism to tune an individual lock.
> RTM locking can be used for normal (stack) locks by specifying the "UseRTMForStackLocks" flag.
> RTM locking code requires that biased locking is switched off because the two conflict. RTM locking is most useful when there is high lock contention and low data contention.  With high lock contention the lock is usually inflated and biased locking is not suitable for that case anyway.
> It was requested that this code not affect other platforms. For that reason most of the code is put under #if INCLUDE_RTM_OPT, which is defined only for X86 and C2 and not EMBEDDED.
> All new RTM flags are declared as experimental and require specifying the "UnlockExperimentalVMOptions" flag.
> SQE did full testing on these changes. Additional tests were developed.
> Thanks,
> Vladimir
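
To check my reading of the auto-tuning description quoted above, here is the decision as I understand it, written as a small standalone sketch. The struct, the function name and the percentage formula (aborts * 100 / total locks) are my assumptions and not the code in the webrev; only the flag names and the three states come from the patch and its description.

```cpp
#include <cstdint>

enum RTMState {
  NoRTM      = 0x2, // Don't use RTM
  UseRTM     = 0x1, // Use RTM
  ProfileRTM = 0x0  // Use RTM with abort ratio calculation
};

// Hypothetical bundle of the tuning flags named in the description.
struct RTMTuningSketch {
  uint64_t abort_threshold;   // stands in for RTMAbortThreshold
  uint64_t locking_threshold; // stands in for RTMLockingThreshold
  uint64_t abort_ratio;       // stands in for RTMAbortRatio, in percent
};

// Hypothetical decision function: which state the method should be
// recompiled with, given the counters collected while profiling.
static RTMState next_rtm_state(const RTMTuningSketch& t,
                               uint64_t total_locks, uint64_t aborts) {
  // The ratio is only looked at once enough aborts have been seen.
  if (aborts >= t.abort_threshold && total_locks > 0 &&
      aborts * 100 >= total_locks * t.abort_ratio) {
    return NoRTM;      // too many aborts: recompile with normal locks
  }
  if (total_locks >= t.locking_threshold) {
    return UseRTM;     // ratio stayed low: drop the profiling code
  }
  return ProfileRTM;   // keep profiling
}
```

If that matches the intent, the uncommon trap at method entry only has to compare the MDO field against the state the code was compiled with.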
