Optimizing arithmetic operations on processors with AVX2 support

Paul Hohensee paul.hohensee at oracle.com
Thu Dec 8 10:59:55 PST 2011

cc'ing hotspot-dev.


On 12/8/11 1:41 PM, John Platts wrote:
> I actually do agree with what Shankar said regarding the current Java memory model. However, the point that I was trying to illustrate is that the JIT compiler of a JVM can use the AVX2, SSE2, SSE3, and SSE4.1 instructions instead of the ordinary arithmetic instructions to optimize the performance of integer arithmetic operations on x86 processors with the AVX2 instruction set. I was also intending to show an example of how arithmetic operations can be re-ordered by a JVM implementation without violating the Java Memory Model.
> ----------------------------------------
>> Date: Thu, 8 Dec 2011 10:08:12 -0800
>> From: shankar at vmware.com
>> To: john_platts at hotmail.com
>> CC: jdk8-dev at openjdk.java.net
>> Subject: Re: Optimizing arithmetic operations on processors with AVX2 support
>> Such an assumption (in-order execution of statements) would be invalid even with the current memory model. There's nothing to stop the compilers from re-ordering the adds and multiplies so that they fill each other's pipeline delays.
>> So I don't think AVX2 brings anything new to the table in terms of perturbing the memory model.
>> ----- Original Message -----
>> From: "John Platts"<john_platts at hotmail.com>
>> To: jdk8-dev at openjdk.java.net
>> Sent: Thursday, December 8, 2011 9:25:41 AM
>> Subject: Optimizing arithmetic operations on processors with AVX2 support
>> Here is an example of a class with an operation that can be optimized on a processor with AVX2 support:
>>
>> class ExampleClass {
>>     public void ExampleOperation(ExampleClass y) {
>>         a += y.a;
>>         b *= y.b;
>>         c += y.c;
>>         d += y.d;
>>         e += y.e;
>>         f *= y.f;
>>         g *= y.g;
>>         h *= y.h;
>>     }
>>
>>     private int a;
>>     private int b;
>>     private int c;
>>     private int d;
>>     private int e;
>>     private int f;
>>     private int g;
>>     private int h;
>> }
>> The AVX2 instruction set includes gather instructions that can be used to read from primitive fields that are not contiguous to each other. The AVX2 instruction set will be implemented on the Intel Haswell microarchitecture processors.
>> In the example above, a JVM running on a processor with the AVX2 instruction set can optimize the ExampleOperation method as follows:
>> - Reading the a, c, d, and e fields of both this and y using the VPGATHERDD instruction.
>> - Performing the 4 addition operations simultaneously using the PADDD instruction.
>> - Storing the results of the addition operations in a, c, d, and e using the PEXTRD instruction.
>> - Reading the b, f, g, and h fields of both this and y using the VPGATHERDD instruction.
>> - Performing the 4 multiplication operations simultaneously using the PMULLD instruction.
>> - Storing the results of the multiplication operations in b, f, g, and h using the PEXTRD instruction.
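
For readers who want to see the regrouping in source form, here is a rough scalar sketch of the steps listed above. It is not what HotSpot emits and it uses no vector instructions; the int[4] arrays only stand in for the four 32-bit lanes of a vector register. Fields8, sourceOrder, groupedOrder and toArray are names made up for this illustration. The regrouping gives the same result as the source order because each statement reads and writes only its own field, and that holds even when y aliases this.

```java
// Hypothetical illustration only: Fields8 and its methods are not part of
// the original example and do not model real JIT output.
final class Fields8 {
    int a, b, c, d, e, f, g, h;

    Fields8(int a, int b, int c, int d, int e, int f, int g, int h) {
        this.a = a; this.b = b; this.c = c; this.d = d;
        this.e = e; this.f = f; this.g = g; this.h = h;
    }

    // Statements in the order they appear in the quoted ExampleOperation.
    void sourceOrder(Fields8 y) {
        a += y.a;
        b *= y.b;
        c += y.c;
        d += y.d;
        e += y.e;
        f *= y.f;
        g *= y.g;
        h *= y.h;
    }

    // Same work, regrouped: gather, four adds, four multiplies, scatter.
    void groupedOrder(Fields8 y) {
        // "Gather" the operands into four-element lanes.
        int[] addLeft  = {a, c, d, e};
        int[] addRight = {y.a, y.c, y.d, y.e};
        int[] mulLeft  = {b, f, g, h};
        int[] mulRight = {y.b, y.f, y.g, y.h};

        // Lane-wise addition, then lane-wise multiplication.
        for (int i = 0; i < 4; i++) {
            addLeft[i] += addRight[i];
        }
        for (int i = 0; i < 4; i++) {
            mulLeft[i] *= mulRight[i];
        }

        // "Scatter" the results back to the individual fields.
        a = addLeft[0]; c = addLeft[1]; d = addLeft[2]; e = addLeft[3];
        b = mulLeft[0]; f = mulLeft[1]; g = mulLeft[2]; h = mulLeft[3];
    }

    int[] toArray() {
        return new int[] {a, b, c, d, e, f, g, h};
    }
}
```

Calling sourceOrder on one instance and groupedOrder on an identically initialized instance, with the same argument, should leave both with equal toArray() contents. A single thread cannot tell the two orders apart; only an unsynchronized observer on another thread could.
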
>> This optimization is perfectly legal under the Java Memory Model, since there are no volatile reads or volatile writes. However, this optimization would be illegal if a, b, c, d, e, f, g, or h were declared as volatile fields. This optimization must also respect constraints imposed by synchronized blocks, volatile reads, volatile writes, method calls, data dependencies, and strictfp semantics. This optimization would also need to be disabled if the method is being debugged by a Java debugger, as the Java debugger can step through each operation individually.
>> The point I am trying to illustrate is that Java programmers should not assume that the arithmetic operations performed by the ExampleOperation method are guaranteed to execute in the sequence shown in the source code. This example also illustrates the importance of proper synchronization. Will this optimization get implemented in the HotSpot VM in the future?
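
As a small illustration of the synchronization point, here is a minimal sketch, assuming a program needs other threads to see two fields change together. LockstepFields, SyncDemo and their methods are hypothetical names, not anything from the original example. Both additions run under the object's monitor, so a reader using the same monitor never sees one field updated without the other, however the JIT orders or groups the additions inside the method.

```java
// Hypothetical illustration only: these classes are not part of the
// original example.
final class LockstepFields {
    private int a;
    private int c;

    // Both additions happen while holding this object's monitor, so no
    // other synchronized method on this object can run between them.
    synchronized void addToBoth(int delta) {
        a += delta;
        c += delta;
    }

    // Reads both fields under the same monitor.
    synchronized boolean isConsistent() {
        return a == c;
    }

    synchronized int sum() {
        return a + c;
    }
}

final class SyncDemo {
    // Runs one writer and one reader thread and returns how many times the
    // reader saw a != c. With the synchronized methods this stays at zero.
    static int countInconsistentSnapshots(int iterations) {
        LockstepFields fields = new LockstepFields();
        int[] inconsistent = {0}; // written only by the reader thread

        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                fields.addToBoth(1);
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                if (!fields.isConsistent()) {
                    inconsistent[0]++;
                }
            }
        });

        writer.start();
        reader.start();
        try {
            // join() makes the reader's writes visible to this thread.
            writer.join();
            reader.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return inconsistent[0];
    }
}
```

If the synchronized keywords are removed, the Java Memory Model no longer guarantees that the reader sees a and c change together, whether or not the JIT vectorizes anything.
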
