RFR 8134802 - LCM register pressure scheduling

Vladimir Kozlov vladimir.kozlov at oracle.com
Sat Sep 12 03:58:23 UTC 2015

Looks good.

I looked at the performance data, and for scimark.lu.large C2 time increases 
significantly (~39%) while the score does not improve (0.18%).
I can accept a compilation time regression if it gives a performance 
improvement, as with crypto.aes. But otherwise we need to investigate why that 

Can you rerun this sub-benchmark to see whether the result repeats?

Also, please do a performance run for nashorn, as Aleksey suggested.

The RA code at the beginning of gcm.cpp is not guarded by OptoRegScheduling. 
I think you can put a guard around all of that new code, including:
_regalloc = &regalloc;

Also JPRT reported build failures:

hotspot/src/share/vm/opto/lcm.cpp:999:9: error: 'UseAVX' was not 
declared in this scope

      if (UseAVX > 2) {
        float_pressure *= 2;

UseAVX is x86 platform-specific.  Why do you need to increase 
float_pressure? If you really need it, you can check:

  if (Matcher::max_vector_size(T_DOUBLE) > 4)
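
To illustrate why the vector-size check avoids the build failure, here is a minimal standalone sketch. It is not HotSpot code: max_vector_elems_double and adjusted_float_pressure are hypothetical stand-ins, and the vector width is passed in as a parameter instead of being queried from the platform. The point is only that the doubling is keyed on how many doubles fit in a vector, which shared code can ask on any platform, rather than on an x86-only flag.

```cpp
#include <cassert>

// Hypothetical stand-in for a platform query such as
// Matcher::max_vector_size(T_DOUBLE): the number of doubles that fit
// in the widest vector register. Not part of HotSpot.
static int max_vector_elems_double(int vector_width_in_bytes) {
  return vector_width_in_bytes / static_cast<int>(sizeof(double));
}

// Double the float pressure threshold only when a vector holds more
// than 4 doubles. No x86-specific flag is referenced.
static int adjusted_float_pressure(int float_pressure,
                                   int vector_width_in_bytes) {
  if (max_vector_elems_double(vector_width_in_bytes) > 4) {
    float_pressure *= 2;
  }
  return float_pressure;
}
```

With 8-byte doubles, a 32-byte vector holds 4 doubles and leaves the threshold unchanged, while a 64-byte vector holds 8 and doubles it.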


On 9/11/15 10:43 AM, Berg, Michael C wrote:
> Vladimir, please see the latest update at:
> http://cr.openjdk.java.net/~mcberg/8134802/webrev.02/
> I have made the node change from below to share flag definitions (reduction/scheduling).
> I also added code to screen out methods with only small blocks for live range analysis and register pressure scheduling.
> For methods which have some larger blocks we now screen out the small blocks as well.  This means overhead
> is by and large not an issue: I see x64 and x86 C2 time unaffected by my algorithm, with any scheduling cost being offset by time not spent in register allocation.
> Thanks,
> Michael
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, September 10, 2015 6:04 PM
> To: Berg, Michael C; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR 8134802 - LCM register pressure scheduling
> On 9/10/15 12:11 PM, Berg, Michael C wrote:
>> Ok, I can make is_reduction and is_scheduled have the same value.  Since I'm clearing it during init processing that will work quite well.  Nobody downstream processes reductions.
>> Problem:
>> The C++ standard implements enum as int sized, we should union _flags with NodeFlags and increase NodeFlags to juint. We would actually decrease the amount of storage in node by doing so since right now storage for NodeFlags is additive with _flags.  We would get 16 more flag slots and make node smaller.
> NodeFlags is a type; there is no field in the Node class with NodeFlags type.  NodeFlags is only used to define flag values which are used to set bits in _flags. So I am not sure what you are proposing.
> Thanks,
> Vladimir
>> Michael
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Wednesday, September 09, 2015 8:29 PM
>> To: Berg, Michael C; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR 8134802 - LCM register pressure scheduling
>> We only have 3 bits left since total is 16:
>> jushort _flags;
>> You have Flag_is_reduction which is used only in loop opts/superword. So you can overlap these flags.
>> We need to clean this up (not you, Michael). We have flags which are used only by Ideal nodes (Flag_is_macro, Flag_is_expensive), and flags used by Mach nodes (5 flags). We may try to overlap them.
>> Vladimir
>> On 9/9/15 7:34 PM, Berg, Michael C wrote:
>>> All, please see the link:
>>> https://bugs.openjdk.java.net/browse/JDK-8134802
>>> I have uploaded a performance report for data collected with and without register pressure scheduling. I would like to keep the node flag in place: we have room for 15 more flags after this one is added, and this is a formal phase of C2 and so a good use of one of the flags.  The addition of VectorSet would incrementally raise the overhead of the algorithm. Please have a look and comment as needed.
>>> Thanks,
>>> Michael
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Friday, September 04, 2015 6:42 PM
>>> To: Berg, Michael C; hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: RFR 8134802 - LCM register pressure scheduling
>>> Impressive work. Thank you for reusing current RA functionality.
>>> "is very minimal" - how minimal? 2% or 10%?
>>> Did it give any performance improvement? The changes are significant and should be justified.
>>> The changes look reasonable. I only noticed one thing:
>>> Flag bits in Node are too precious to use for tracking a node's state. Why not use VectorSet?
>>> Thanks,
>>> Vladimir
>>> On 9/4/15 1:33 PM, Berg, Michael C wrote:
>>>> Hi Folks,
>>>> I would like to contribute LCM register pressure scheduling. I need
>>>> two reviewers to examine this patch and comment as needed:
>>>> Bug-id: https://bugs.openjdk.java.net/browse/JDK-8134802
>>>> webrev:
>>>> http://cr.openjdk.java.net/~mcberg/8134802/webrev.01/
>>>> These changes calculate register pressure at the entry of a basic
>>>> block, at the end and incrementally while we are scheduling. It uses
>>>> an efficient algorithm for recalculating register pressure on an
>>>> as-needed basis. The algorithm uses heuristics to switch to a pressure
>>>> based algorithm to reduce spills for int and float registers using
>>>> thresholds for each. It also uses weights which count on a per
>>>> register class basis to dope ready list candidate choice while
>>>> scheduling so that we reduce register pressure when possible. Once
>>>> we exceed either threshold, we start trying to mitigate pressure
>>>> upon the affected class of registers which are over the limit. This
>>>> happens on both register classes and/or separately for each. We
>>>> switch back to latency scheduling when pressure is alleviated. As
>>>> before we obey hard artifacts such as barriers, fences and such.
>>>> Overhead for constructing and providing liveness information and the
>>>> additional algorithmic usage is very minimal, so as to affect compile time minimally.
>>>> Thanks,
>>>> Michael
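
The switch between latency scheduling and pressure scheduling described in the original proposal can be sketched roughly as below. This is a simplified standalone model, not the patch itself: Candidate, Pressure, and pick_candidate are hypothetical names, the per-class weights are reduced to a net change in live registers, and the real ready-list logic in lcm.cpp has more inputs. While neither register class is over its threshold, the candidate with the highest latency wins. Once a class is over, candidates are ranked by how much they relieve that class, with latency as the tie-breaker.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of a ready-list candidate; not a HotSpot type.
struct Candidate {
  int latency;      // higher means more urgent under latency scheduling
  int int_delta;    // net change in live int registers if scheduled
  int float_delta;  // net change in live float registers if scheduled
};

// Hypothetical snapshot of current pressure and per-class thresholds.
struct Pressure {
  int int_live;
  int float_live;
  int int_limit;
  int float_limit;
};

// Returns the index of the chosen candidate, or -1 for an empty list.
static int pick_candidate(const std::vector<Candidate>& ready,
                          const Pressure& p) {
  const bool int_over = p.int_live > p.int_limit;
  const bool float_over = p.float_live > p.float_limit;
  int best = -1;
  int best_cost = 0;
  for (std::size_t i = 0; i < ready.size(); ++i) {
    // Only classes that are over their limit contribute to the cost,
    // so with no class over the limit every cost is 0 and latency decides.
    int cost = 0;
    if (int_over) cost += ready[i].int_delta;
    if (float_over) cost += ready[i].float_delta;
    const bool better =
        best < 0 || cost < best_cost ||
        (cost == best_cost && ready[i].latency > ready[best].latency);
    if (better) {
      best = static_cast<int>(i);
      best_cost = cost;
    }
  }
  return best;
}
```

Because only the over-limit classes feed the cost, the same comparison falls back to pure latency ordering as soon as pressure drops below both thresholds, which mirrors the "switch back to latency scheduling" behaviour in the description.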
