RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot instrinsic

Ivan Krylov ivan at azulsystems.com
Mon Feb 1 16:21:07 UTC 2016

I am afraid I am still confused.

 > As Vladimir suggested, it’s better to check 
Matcher::match_rule_supported() in c2compiler.cpp in 

The -XX:DisableIntrinsic flags comes with the built in mechanism of 
verification, right?
The call_generator() calls findIntrinsic with it's lazy initialization 
and a check against the exclusion list. Seems to work fine with my 
little test, but I am fine with conforming to whatever is a custom here, 
I am still new to c2 code.

(gdb) p token
$1 = 0x7ffff0276680 "_onSpinWait"
(gdb) bt
#0  DirectiveSet::is_intrinsic_disabled (this=0x7ffff019b2e0, method=...)
#1  0x00007ffff655f6fd in Compile::make_vm_intrinsic 
(this=0x7fff923b2ba0, m=0x7fff740336b0, is_virtual=false)
#2  0x00007ffff60917d1 in Compile::find_intrinsic (this=0x7fff923b2ba0, 
m=0x7fff740336b0, is_virtual=false)
#3  0x00007ffff6206c06 in Compile::call_generator (this=0x7fff923b2ba0, 
callee=0x7fff740336b0, vtable_index=-4, call_does_dispatch=false, 
     allow_inline=true, prof_factor=1, speculative_receiver_type=0x0, 
allow_intrinsics=true, delayed_forbidden=false)
#4  0x00007ffff6208506 in Parse::do_call (this=0x7fff923b1500) at 
#5  0x00007ffff676e305 in Parse::do_one_bytecode (this=0x7fff923b1500)

 > we don’t support CPUs with SSE < 2
We as openjdk or the commercial jdk? But doesn't matter at the end as 
the pause instruction is harmless even for prior-to-SSE2 CPUs.

 >Is there any reason this can’t be moved to generic x86.ad
I will adopt this.

To Vladimir's points:

 > Why you check intrinsic in inline_native_Class_query() ?

My improper following the code of other intrinsics. Will fix that.

 > (Various test suggestions)

I will adopt those

 > I think you should consider to implement this for C1 and Interpreter 
since Tiered Compilation is on by default.

I can see C1 benefiting from this but (although never implemented or 
measured) but how the Interpreter can possibly benefit in any measurable 



On 29/01/2016 04:48, Igor Veresov wrote:
>> On Jan 28, 2016, at 12:41 PM, Igor Veresov <igor.veresov at oracle.com> wrote:
>> x86.ad:
>> f
>> It seems that the comment here is off:
>> 1714     case Op_OnSpinWait:
>> 1715       if (UseSSE < 2) // requires at least SSE4
>> 1716         ret_value = false;
>> 1717       break;
>> Also we don’t support CPUs with SSE < 2, so you don’t have to make these changes to x86.ad. It’s enough that has_match_rule(), that is called by  Matcher::match_rule_supported(), will return true for Op_OnSpinWait.
>> x86_64.ad:
>> +instruct onspinwait()
>> +%{
>> +  match(OnSpinWait);
>> +  ins_cost(200);
>> ...
>> Is there any reason this can’t be moved to generic x86.ad ? It can be easily supported on 32bit as well, right (we do still support 32bit mode on linux)? The encoding is the same for both 32 and 64 bit modes, so that should be trivial.
>> library_call.cpp:
>> I think you forgot to actually call Matcher::match_rule_supported(). I think it should be something like:
>> bool LibraryCallKit::inline_onspinwait() {
>>   if (Matcher::match_rule_supported(Op_OnSpinWait) {
>>      insert_mem_bar(Op_OnSpinWait);
>>      return true;
>>   }
>>   return false;
>> }
> As Vladimir suggested, it’s better to check Matcher::match_rule_supported() in c2compiler.cpp in is_intrinsic_supported(). Sorry about the confusion. I stand by the other comments though.
> igor
>> igor
>>> On Jan 28, 2016, at 6:51 AM, Ivan Krylov <ivan at azulsystems.com> wrote:
>>> Hi Igor,
>>> Following Vladimir's suggestion I eliminated the UseOnSpinWaitIntrinsic flag altogether. I have adopted the Matcher::match_rule_supported() logic - seems to work on intel, but I don't have any non-intel box to test.
>>> Anyway, the new webrev:
>>> http://cr.openjdk.java.net/~ikrylov/8147844.hs.01/
>>> Igor, Vladimir, thanks,
>>> Ivan
>>> On 27/01/2016 22:03, Igor Veresov wrote:
>>>> Actually, I’d rather use Matcher::match_rule_supported() to test if it’s supported on the platform, rather than fixing all vm_version_*.* to check for the flag validity, that’s tedious (you forgot x86-32 and there’s going to be more platforms to fix for you sponsor). Something like UseOnSpinWaitIntrinsic && Matcher::match_rule_supported(Op_OnSpinWait) to decide whether or not to inline the intrinsic. Also, why are you not turning it on by default?
>>>> igor
>>>>> On Jan 27, 2016, at 4:48 AM, Ivan Krylov <ivan at azulsystems.com> wrote:
>>>>> Looks like there was some good discussion while I was peacefully sleeping.
>>>>> I don't have much to add. This patch was somewhat inspired by JEP-171 changes.
>>>>> Perhaps,there are other ways to achieve the same semantics.
>>>>> So, if we can consider this reviewed - I will wait for the actual JEP to become targeted to 9 and then seek a sponsor to do the push.
>>>>> Thanks,
>>>>> Ivan
>>>>> On 27/01/2016 09:12, Igor Veresov wrote:
>>>>>> I realize it’s not a big deal. I was just wondering if there was any specific reason control alone is not enough.
>>>>>> Anyways, looks ok for the first cut.
>>>>>> igor
>>>>>>> On Jan 26, 2016, at 9:24 PM, Gil Tene <gil at azul.com> wrote:
>>>>>>> Since a sensical loop that calls onSpinWait() would include at least a volatile load on every iteration (and possibly a volatile store), the new node does not create significant extra move restrictions that are not already there. Modeling this with a memory effect is one simple way to prevent it from being re-ordered out of the loop. There are probably other ways to achieve this, but this one doesn't really have a performance downside…
>>>>>>> — Gil.
>>>>>>>> On Jan 26, 2016, at 4:44 PM, Igor Veresov <igor.veresov at oracle.com> wrote:
>>>>>>>> So, why does the new node have a memory effect? That would seem to prevent any movement of the subsequent loads in your loop, right? If that’s intentional I wonder why is that?
>>>>>>>> igor
>>>>>>>>> On Jan 26, 2016, at 2:59 AM, Ivan Krylov <ivan at azulsystems.com> wrote:
>>>>>>>>> Hello,
>>>>>>>>> Some of you may have a seen a few e-mails on the core-libs alias about a proposed “spin wait hint”. The JEP is forming up nicely at  https://bugs.openjdk.java.net/browse/JDK-8147832. There seems to be a consensus on the API side. It is now in a draft state and I hope this JEP will get targeted for java 9 shortly.  The upcoming API changes can be seen at the webrev:
>>>>>>>>> http://cr.openjdk.java.net/~ikrylov/8147844.jdk.00/
>>>>>>>>> At this time I would like to ask for a review of the hs-comp changes. The plan is push changes into class libraries and hotspot synchronously but that may happen after the JEP gets targeted.
>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8147844
>>>>>>>>> Webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.00/
>>>>>>>>> The idea of the fix is pretty simple: hotspot replaces a call to java.lang.Runtime.onSpinWait() with an intrinsic that is effectively a 'pause' instruction on x86.  This intrinsic is guarded by the -XX:±UseOnSpinWaitIntrinsic flag. For non-x86 platforms there is a verification code that makes sure the flag is off, VM will just execute at empty method java.lang.Runtime.onSpinWait() – effectively a no-op. According the [1] the 'pause' instruction is functional since SSE2, but even on CPUs prior to SSE2 the  'pause' instruction is a no-op and hence harmless, there seems to be no need to add guarding code for older generations of Intel CPUs.
>>>>>>>>> The proposed patch includes a simple regression test that simply makes sure that method java.lang.Runtime.onSpinWait() gets intrinsified.  There are several other producer-consumer-like performance tests ready that the authors of this JEP would be happy to make available under JEP-230 but I am uncertain about the process.
>>>>>>>>> Thanks,
>>>>>>>>> Ivan
>>>>>>>>> [1]  - https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops

More information about the hotspot-compiler-dev mailing list