RFR (XL) 8031320: Use Intel RTM instructions for locks

Igor Veresov igor.veresov at oracle.com
Thu Mar 20 03:48:17 UTC 2014

Thanks for renaming the registers! But I noticed some inconsistencies in the comments that resulted from it.

1334 // Perform abort ratio calculation, set no_rtm bit if high ratio
1335 // input:  rtm_counters_Reg (RTMLockingCounters* address)
1336 // tmpReg, scrReg and flags as scratch
1337 void MacroAssembler::rtm_abort_ratio_calculation(Register tmpReg,
1338                                                  Register rtm_counters_Reg,
1339                                                  RTMLockingCounters* rtm_counters,
1340                                                  Metadata* method_data) {

It should probably say that rtm_counters_Reg is killed; also, there’s no scrReg in the parameters anymore.

1392 // Update counters and perform abort ratio calculation
1393 // input:  boxReg (object monitor address)
1394 //         abort_status_Reg
1395 // rtm_counters_Reg, flags as scratch
1396 void MacroAssembler::rtm_profiling(Register abort_status_Reg,
1397                                    Register rtm_counters_Reg,
1398                                    RTMLockingCounters* rtm_counters,
1399                                    Metadata* method_data,
1400                                    bool profile_rtm) {

There doesn’t seem to be a boxReg here.

1412     // Perform abort ratio calculation, set dontelide bit and rtm_state
1413     // input:  boxReg (object monitor address)
1414     //      :  rtm_counters_Reg
1415     // tmpReg, scrReg, flags as scratch
1416     assert(rtm_counters != NULL, "should not be NULL when profiling RTM");
1417     rtm_abort_ratio_calculation(abort_status_Reg, rtm_counters_Reg, rtm_counters, method_data);

This mentions boxReg, which is no longer there.

1426 // Retry on abort if abort's status is 0x6: can retry (0x2) | memory conflict (0x4)
1427 // inputs: boxReg (monitor address)
1428 //       : retry_count
1429 //       : abort_status
1430 // output: retry_count decremented by 1
1431 // flags as scratch
1432 void MacroAssembler::rtm_retry_lock_on_abort(Register retry_count, Register box, Register abort_status, Label& retryLabel) {

Maybe add the .*_Reg or .*Reg suffix to these guys and update the comments?
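The "0x6" comment above can be read as a small predicate. What follows is a hypothetical C++ model, not HotSpot code. It assumes "status is 0x6" means "either the retry bit (0x2) or the memory-conflict bit (0x4) is set". The name retry_on_abort is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not HotSpot code. Bit values follow Intel's RTM abort
// status layout (EAX after an abort): 0x2 = the transaction may succeed on
// retry, 0x4 = another thread conflicted on memory the transaction touched.
constexpr uint32_t kAbortRetry    = 0x2;
constexpr uint32_t kAbortConflict = 0x4;

// Returns true, consuming one retry, when either bit of the 0x6 mask is set
// and retries remain; otherwise returns false and leaves retry_count as is.
bool retry_on_abort(uint32_t abort_status, int& retry_count) {
  assert(retry_count >= 0);
  if ((abort_status & (kAbortRetry | kAbortConflict)) == 0) {
    return false;  // e.g. 0x8 (buffer overflow): retrying is unlikely to help
  }
  if (retry_count == 0) {
    return false;  // retry budget exhausted
  }
  --retry_count;   // "output: retry_count decremented by 1"
  return true;
}
```

Under this reading, a capacity abort falls straight through to the non-RTM path without spending a retry.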

1448 // Spin and retry if lock is busy,
1449 // inputs: box (monitor address)
1450 //       : retry_count
1451 // output: retry_count decremented by 1
1452 //       : clear z flag if retry count exceeded
1453 // scr as scratch
1454 void MacroAssembler::rtm_retry_lock_on_busy(Register retry_count, Register box, Register tmp, Register scr, Label& retryLabel) {

.*_Reg and/or .*Reg?

1542 // Use RTM for inflating locks
1543 // Inputes: objReg (object to lock)
1544 //          boxReg (on-stack box address (displaced header location) - KILLED)
1545 //          tmpReg (ObjectMonitor address + 2(monitor_value))
1546 void MacroAssembler::rtm_inflated_locking(Register objReg, Register boxReg, Register tmpReg,
1547                                           Register scrReg, Register retry_on_busy_count_Reg,
1548                                           Register retry_on_abort_count_Reg,
1549                                           RTMLockingCounters* rtm_counters,
1550                                           Metadata* method_data, bool profile_rtm,
1551                                           Label& DONE_LABEL) {

Typo in “Inputes”.

1596     // retry on lock abort if abort status is one of 0xD
1597     // inputs: boxReg (monitor address)
1598     //       : retry_on_abort_count_Reg
1599     //       : abort_status_Reg
1600     // output: tmpReg set to boxReg, cx2Reg decremented by 1
1601     rtm_retry_lock_on_abort(retry_on_abort_count_Reg, boxReg, abort_status_Reg, L_rtm_retry);

The output section of the comment mentions registers (tmpReg, cx2Reg) that are not passed.


On Mar 19, 2014, at 7:47 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:

> I updated the changes based on the reviews.
> http://cr.openjdk.java.net/~kvn/8031320_9/webrev.01/
> Main changes are in macroAssembler_x86.cpp. I moved the RTM code from the fast_lock() method into separate methods: rtm_stack_locking() and rtm_inflated_locking(), plus rtm_profiling() for the common RTM code.
> I renamed registers in local scopes to reflect the values they contain.
> I removed some asm instructions whose results are not used (leftovers from experimental code).
> Three flags were converted to product flags: UseRTMLocking, UseRTMDeopt, RTMRetryCount.
> In phase1.cpp I used TypeMetadataPtr for the MDO pointer instead of RawPtr. This hit a bug in TypeMetadataPtr::xmeet(), which I fixed.
> Thanks,
> Vladimir
> On 3/17/14 12:11 PM, Vladimir Kozlov wrote:
>> https://bugs.openjdk.java.net/browse/JDK-8031320
>> http://cr.openjdk.java.net/~kvn/8031320_9/webrev/
>> The Intel architecture codenamed Haswell supports the RTM
>> (Restricted Transactional Memory) instructions xbegin, xabort, xend and
>> xtest as part of Intel Transactional Synchronization Extensions (TSX).
>> The xbegin and xend instructions enclose a set of instructions to be
>> executed as a transaction. If no conflict is found during execution of
>> the transaction, the memory and register modifications are committed
>> together at xend. The xabort instruction can be used to abort a
>> transaction explicitly, and xtest to check whether we are in a
>> transaction.
>> RTM is useful for highly contended locks with low conflict in the
>> critical region. Highly contended locks don't scale well otherwise,
>> but with RTM they show good scaling. RTM allows applications to use
>> coarse-grained locking. For lightly contended locks that are used by
>> different threads, RTM can also reduce cache-line ping-pong and thereby
>> improve performance.
>> Implementation:
>> RTM locking code is generated for all inflated locks when the
>> "UseRTMLocking" option is on, with normal locking as the fallback
>> mechanism. On abort or when the lock is busy, the lock is retried a
>> fixed number of times, as specified by the "RTMRetryCount" option.
>> Locks that abort too often can be tuned automatically or manually.
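The retry-then-fallback flow described above can be modeled in a few lines. This is a hypothetical C++ sketch, not the generated code. try_rtm and fallback_lock are invented stand-ins for the xbegin path and the normal locking path. As a simplification, one retry budget covers both the abort and the busy cases.

```cpp
#include <cassert>
#include <functional>

// Hypothetical model (not HotSpot code): try the RTM path, retry it up to
// retry_count more times, then fall back to the normal lock.
// try_rtm returns true when the critical section committed transactionally.
// Returns the number of RTM attempts made.
int lock_with_rtm_retries(const std::function<bool()>& try_rtm,
                          const std::function<void()>& fallback_lock,
                          int retry_count) {
  assert(retry_count >= 0);
  int attempts = 0;
  for (;;) {
    ++attempts;
    if (try_rtm()) {
      return attempts;  // transaction committed, no lock acquired
    }
    if (retry_count == 0) {
      break;            // retry budget exhausted
    }
    --retry_count;
  }
  fallback_lock();      // normal locking as the fallback mechanism
  return attempts;
}
```

With a budget of 5, an always-aborting lock costs six transactional attempts before falling back to normal locking.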
>> Auto-tuning can be done using the "UseRTMDeopt" flag, which adds abort
>> ratio calculation code for each lock. The abort ratio is calculated
>> after "RTMAbortThreshold" aborts are encountered.
>> With "UseRTMDeopt", if the abort ratio reaches "RTMAbortRatio", the
>> nmethod containing the lock is deoptimized and recompiled with all
>> locks as normal (stack) locks. If the abort ratio remains low after
>> "RTMLockingThreshold" attempted locks, the method is deoptimized and
>> recompiled with all locks as RTM locks without abort ratio calculation
>> code. The abort ratio calculation can be delayed by specifying the
>> -XX:RTMLockingCalculationDelay=<millisec> flag.
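The auto-tuning rule above can be sketched as a pure function. This is a hypothetical C++ model built only from the description, not the generated assembly. It assumes the ratio is aborts divided by attempted locks, compared as a percentage. The numbers in the usage check are illustrative, not the flags' defaults.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the UseRTMDeopt decision. Field names mirror the
// flags in the description; the arithmetic is an assumed reading of it.
enum class RtmDecision { KeepProfiling, NoRtm, UseRtm };

struct RtmTuning {
  uint64_t abort_threshold;    // RTMAbortThreshold: aborts before ratio check
  uint64_t abort_ratio_pct;    // RTMAbortRatio, as a percentage
  uint64_t locking_threshold;  // RTMLockingThreshold: attempted locks
};

RtmDecision rtm_tuning_decision(uint64_t aborts, uint64_t total_locks,
                                const RtmTuning& t) {
  assert(t.abort_ratio_pct <= 100);
  if (aborts >= t.abort_threshold &&
      // aborts / total >= ratio%, rewritten to avoid division.
      aborts * 100 >= total_locks * t.abort_ratio_pct) {
    return RtmDecision::NoRtm;   // recompile with normal (stack) locks
  }
  if (total_locks >= t.locking_threshold) {
    return RtmDecision::UseRtm;  // ratio stayed low: RTM without ratio code
  }
  return RtmDecision::KeepProfiling;
}
```

Comparing `aborts * 100` against `total * ratio` keeps the check to integer multiplies, which is also the natural form for a few assembly instructions.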
>> The nmethod is deoptimized by an uncommon trap at the beginning of
>> the code, which checks the rtm state field in the MDO; that field is
>> modified by the abort calculation code.
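The entry check described above amounts to comparing a shared state field against the state the code was compiled for. The following is a hypothetical C++ model only. MethodDataModel, CompiledMethodModel and the three state values are invented names and an assumed encoding, not HotSpot's.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model. The state values are an assumed encoding, not
// necessarily the ones HotSpot uses.
enum RtmStateModel : int { kProfileRtm = 0, kUseRtm = 1, kNoRtm = 2 };

// Stands in for the rtm state field in the MDO, written by the abort
// ratio calculation code.
struct MethodDataModel {
  std::atomic<int> rtm_state{kProfileRtm};
};

// Stands in for an nmethod compiled for one particular state.
struct CompiledMethodModel {
  const MethodDataModel* mdo;
  int compiled_for;

  // Returns false where the real code would take the uncommon trap at
  // method entry (deoptimize and recompile) instead of running the body.
  bool entry_check() const {
    assert(mdo != nullptr);
    return mdo->rtm_state.load(std::memory_order_relaxed) == compiled_for;
  }
};
```

Because the profiling code only writes the field and the compiled code only reads it on entry, a state change takes effect the next time the method is entered.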
>> For manual tuning, the abort statistics for each lock can be provided
>> to the user with the "PrintPreciseRTMLockingStatistics" diagnostic flag.
>> Based on these statistics, users can create a .hotspot_compiler file
>> or use the -XX:CompileCommand=<option> flag to specify the methods for
>> which RTM locking is disabled (<option> "NoRTMLockEliding") or always
>> enabled (<option> "UseRTMLockEliding").
>> The abort calculation and statistics collection are done using
>> RTMLockingCounters wrapped into RTMLockingNamedCounter counters, which
>> are generated for each lock. To reduce the burden on the cache line, the
>> RTM lock total counter is updated randomly at the RTMTotalCountIncrRate
>> rate.
>> Note that both auto and manual tuning are done for the whole method.
>> There is no mechanism to tune an individual lock.
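The randomly updated total counter can be modeled as a sampled counter. Only about one in `rate` lock attempts writes the shared counter, and readers scale the count back up. This is a hypothetical C++ sketch. The description does not say where the real code gets its randomness, so the seeded std::mt19937 is an assumption made for reproducibility.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical model of a sampled counter: increment with probability
// 1/rate, then multiply by rate when reading the estimate.
class SampledCounter {
 public:
  explicit SampledCounter(uint32_t rate, uint32_t seed = 42)
      : rate_(rate), rng_(seed), dist_(0, rate - 1) {
    assert(rate > 0);
  }

  void on_lock_attempt() {
    if (dist_(rng_) == 0) {
      ++count_;  // only about 1 in rate_ attempts touches the shared line
    }
  }

  uint64_t raw() const { return count_; }
  uint64_t estimate() const { return count_ * rate_; }

 private:
  uint32_t rate_;
  std::mt19937 rng_;
  std::uniform_int_distribution<uint32_t> dist_;
  uint64_t count_ = 0;
};
```

The trade-off is precision for traffic: the estimate is only statistically accurate. That is acceptable for comparing against thresholds such as RTMLockingThreshold.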
>> RTM locking can be used for normal (stack) locks by specifying the
>> "UseRTMForStackLocks" flag.
>> RTM locking code requires biased locking to be switched off because
>> the two conflict. RTM locking is most useful when there is high lock
>> contention and low data contention. With high lock contention the lock
>> is usually inflated, and biased locking is not suitable for that case
>> anyway.
>> It was requested that this code not affect other platforms. For that
>> reason, most of the code is put under #if INCLUDE_RTM_OPT, which is
>> defined only for X86 with C2 and not for EMBEDDED builds.
>> All new RTM flags are declared as experimental and require specifying
>> the "UnlockExperimentalVMOptions" flag.
>> SQE did full testing on these changes. Additional tests were developed.
>> Thanks,
>> Vladimir

More information about the hotspot-dev mailing list