review request: add intrinsics to use AES instructions

Deneau, Tom tom.deneau at
Mon Jul 16 16:03:56 PDT 2012

Vladimir --

OK I see now that the stubroutines_x86_xxx are bitness-dependent.
And are you saying that you would prefer that the intrinsics actually
be supported on 32-bit, not just that it builds and runs without support on 32-bit?

-- Tom

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at] 
Sent: Monday, July 16, 2012 2:08 PM
To: Deneau, Tom
Cc: hotspot-compiler-dev at
Subject: Re: review request: add intrinsics to use AES instructions

You can't run a 32-bit VM without the 32-bit changes: the flags are set but the 
stubs are missing. And, yes, the 32-bit VM is still used.


Deneau, Tom wrote:
> Vladimir --
> Right, I didn't include 32-bit changes, thinking that the majority
> of users of AES encryption/decryption would be 64-bit servers.
> But there is no technical reason why 32-bit couldn't be added.
> Do you feel 32-bit support is important?
> -- Tom
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at] 
> Sent: Monday, July 16, 2012 12:40 PM
> To: Deneau, Tom
> Cc: hotspot-compiler-dev at
> Subject: Re: review request: add intrinsics to use AES instructions
> Thank you, Tom
> I created the following RFE and will sponsor the changes. But I don't see the 32-bit changes.
> 7184394: add intrinsics to use AES instructions
> Vladimir
> Deneau, Tom wrote:
>> Please review the following webrev which adds intrinsic support to
>> allow some of the com/sun/crypto/provider methods to use AES
>> instructions when a processor supports such instructions.
>> I do not have a bug number for this change but a description would be
>> something like the following:
>>    Modern x86 processors have AES instructions to accelerate AES
>>    encryption and decryption but Hotspot does not have a way to
>>    generate such instructions. There is a way to hook in a native
>>    crypto library using PKCS11 and there are a few native libraries
>>    that support hardware AES instructions. However, these native
>>    PKCS11 libraries
>>       * do not scale well with multiple threads
>>       * are not supported on all platforms, for instance Hotspot does
>>         not have PKCS11 support on 64-bit Windows.
>>       * can be confusing to configure. 
>> Since this webrev adds intrinsic support for the default
>> com/sun/crypto/provider classes, they are supported on all platforms
>> and there is no additional configuration required. Measurements have
>> shown that they scale very well with multiple threads.
>> The rest of this mail describes the scope of the intrinsics and
>> summarizes the source file changes.
>> -- Tom Deneau
>> Scope of the Intrinsics
>> -----------------------
>> When creating a cipher the application specifies a "transformation"
>> consisting of "algorithm/mode/padding". For more details see
>>    * These intrinsics kick in only when the algorithm part is "AES". A
>>      single block in AES is always 16 bytes and there are intrinsics
>>      for encrypting or decrypting a single block. These single-block
>>      intrinsics can work with any mode that uses AES and with any of
>>      the three AES key sizes (128, 192 or 256 bit).
>>    * A more optimized multi-block intrinsic can kick in if the
>>      algorithm/mode is "AES/CBC" (Cipher Block Chaining). Again all
>>      three AES key sizes are supported. There is no technical reason
>>      why we couldn't do multi-block intrinsics for the other modes
>>      (eg, ECB) but I want to get some feedback from the reviewers on
>>      the implementation before charging off on this path.
>>    * The padding part is handled by java routines outside of these
>>      intrinsics.
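For readers unfamiliar with the "algorithm/mode/padding" string, here is a minimal sketch of the application side, using only the standard javax.crypto API. The class name AesCbcRoundTrip, the all-zero key and IV, and the 40-byte input are illustrative assumptions, not part of the webrev. Whether the intrinsics are actually used for this code depends on the JVM build, the hardware and the flags discussed in this thread.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class: encrypts and decrypts with the "AES/CBC/PKCS5Padding" transformation.
class AesCbcRoundTrip {
    // mode is Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE.
    static byte[] run(int mode, byte[] key, byte[] iv, byte[] input) {
        try {
            // algorithm = AES, mode = CBC, padding = PKCS5Padding
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16];    // 128-bit key; all zeros for demonstration only
        byte[] iv = new byte[16];     // CBC needs one 16-byte IV; zeros for demonstration only
        byte[] plain = new byte[40];  // two full blocks plus 8 bytes, so padding is needed
        Arrays.fill(plain, (byte) 'x');

        byte[] encrypted = run(Cipher.ENCRYPT_MODE, key, iv, plain);
        byte[] decrypted = run(Cipher.DECRYPT_MODE, key, iv, encrypted);

        System.out.println("ciphertext length: " + encrypted.length);
        System.out.println("round trip ok: " + Arrays.equals(plain, decrypted));
    }
}
```

The padding step rounds the 40 input bytes up to three 16-byte blocks, which is the part the mail says stays in Java code outside the intrinsics.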
>> Summary of Changes
>> ------------------
>> src/cpu/x86/vm/assembler_x86.cpp, hpp
>>    Defined the aes instructions which are used by the stub routines.
>> src/cpu/x86/vm/stubGenerator_x86_64.cpp
>>    Actual stub code for the aes intrinsics. As described earlier there
>>    are both single-block and multi-block intrinsic stubs.
>>    Note that the stubs make use of the "expanded key" which gets
>>    created each time the key changes. The expanded key is used by both
>>    the java code and the intrinsic AES instructions.
>>    The java code stores the "expanded key" in big-endian 32-bit
>>    integers. The x86 AES instructions require the expanded key to be
>>    in little-endian 128-bit words. Hence the pshufb instructions to
>>    get the key into the little-endian format.
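The byte-order mismatch described above can be illustrated in plain Java. This is only a sketch: the mail does not show the actual pshufb mask, so the code assumes the shuffle amounts to reversing the bytes inside each 32-bit word. The class ExpandedKeyLayout and its methods are hypothetical names used for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration of the expanded-key byte layout issue.
class ExpandedKeyLayout {
    // Returns the bytes that memory would hold if the ints were stored in the given byte order.
    static byte[] layout(int[] words, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(words.length * 4).order(order);
        for (int w : words) {
            buf.putInt(w);
        }
        return buf.array();
    }

    // Assumed effect of the shuffle: reverse the four bytes inside each 32-bit word.
    static int[] swapBytesPerWord(int[] words) {
        int[] out = new int[words.length];
        for (int i = 0; i < words.length; i++) {
            out[i] = Integer.reverseBytes(words[i]);
        }
        return out;
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // One 128-bit round key held as four ints, each packing key bytes big-endian.
        int[] roundKey = {0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f};

        // How a little-endian machine such as x86 lays those ints out in memory.
        System.out.println("in memory:     " + hex(layout(roundKey, ByteOrder.LITTLE_ENDIAN)));
        // After the per-word byte swap the key bytes appear in their natural order.
        System.out.println("after shuffle: "
                + hex(layout(swapBytesPerWord(roundKey), ByteOrder.LITTLE_ENDIAN)));
    }
}
```

Under that assumption, swapping bytes per word and storing little-endian yields the same byte sequence as storing the original ints big-endian.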
>> src/cpu/x86/vm/vm_version_x86.cpp, hpp
>>    Detect and store the aes capability bit in cpuid. A global boolean
>>    command line flag UseAES can be used to turn off AES even if the
>>    hardware supports it.
>> src/share/vm/classfile/vmSymbols.hpp
>> src/share/vm/opto/runtime.cpp, hpp
>>    The usual definitions of class names, method names and signatures
>>    for the java methods that are being intrinsified and the signatures
>>    for the stubs.
>> src/share/vm/oops/methodOop.cpp
>>    Up until now, every intrinsic was replacing a routine that was
>>    loaded by the "default" (NULL) class loader.
>>    com/sun/crypto/provider is not loaded by the default class
>>    loader so we had to add a check here.
>> src/share/vm/opto/escape.cpp
>>    Escape analysis knows about certain stubs, but if it sees a leaf
>>    stub it also checks against a predefined list. So the new intrinsic
>>    names were added to the list.
>> src/share/vm/opto/library_call.cpp
>> src/share/vm/opto/callGenerator.cpp
>> src/share/vm/opto/doCall.cpp
>>    The main logic for building up the calls to the stubs at compile
>>    time, assuming the platform has a stub and the global flags have
>>    not turned these intrinsics off.
>>    A new helper routine to load a field from an object was added since
>>    we ended up loading fields in a few places.
>>    For best performance, we wanted to hook into the multi-block
>>    encrypt and decrypt methods such as in
>>    This code is not AES-specific but handles CBC mode for any
>>    algorithm. (The algorithm part is handled by the enclosed
>>    "embeddedCipher" object).
>>    Thus at runtime we want to do the equivalent of an instanceof check
>>    on embeddedCipher and either call the stub (if it is AESCrypt) or
>>    call the original java code (if it is some other algorithm
>>    type). For the CipherBlockChaining.decrypt there is a further
>>    runtime check that the source and destination are not the same
>>    array, which, because of the way CBC works, would require cloning the
>>    source (cipher).
>>    Vladimir added some infrastructure to generate predicated
>>    intrinsics to solve the above problem. A particular intrinsic need
>>    only specify that it is predicated, and generate the particular
>>    guard node which, if false, will take the Java path. This
>>    infrastructure can be used for future intrinsics that have to make
>>    such a runtime choice. These changes from Vladimir are in
>>    callGenerator.cpp, doCall.cpp, and a small bit in library_call.cpp.
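The runtime choice described above can be sketched at the Java level. This is not the compiler code from the webrev, which builds guard nodes in the C2 graph. It only mirrors the decision those guards encode. The types BlockCipher, AesCipher and OtherCipher and the method names are hypothetical stand-ins for the embedded cipher and AESCrypt.

```java
// Hypothetical Java-level model of the predicate that selects the stub or the Java path.
class PredicatedCbcSketch {
    interface BlockCipher {}                                  // stand-in for the embedded cipher type
    static final class AesCipher implements BlockCipher {}    // stand-in for AESCrypt
    static final class OtherCipher implements BlockCipher {}  // any non-AES algorithm

    // Encrypt guard: the stub applies only when the embedded cipher is AES.
    static boolean stubAllowedForEncrypt(BlockCipher embedded) {
        return embedded instanceof AesCipher;
    }

    // Decrypt guard: additionally, source and destination must be different arrays,
    // because in-place CBC decryption would require cloning the source first.
    static boolean stubAllowedForDecrypt(BlockCipher embedded, byte[] src, byte[] dst) {
        return embedded instanceof AesCipher && src != dst;
    }

    // Names the path a call would take; a false predicate falls back to the original Java code.
    static String decryptPath(BlockCipher embedded, byte[] src, byte[] dst) {
        return stubAllowedForDecrypt(embedded, src, dst) ? "stub" : "java";
    }

    public static void main(String[] args) {
        byte[] a = new byte[32];
        byte[] b = new byte[32];
        System.out.println(decryptPath(new AesCipher(), a, b));   // different arrays, AES
        System.out.println(decryptPath(new AesCipher(), a, a));   // same array
        System.out.println(decryptPath(new OtherCipher(), a, b)); // not AES
    }
}
```

Keeping the predicate separate from the fast path is what lets the same mechanism serve other intrinsics that need a runtime choice.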
>> src/share/vm/runtime/globals.hpp
>>    Global flags were added to
>>       * turn off either AES encryption or AES decryption intrinsics separately
>>       * turn off the multi-block CBC/AES intrinsics. 
>>    By default all of the above are on. These are really there for
>>    testing, for example one could encrypt using Java and decrypt using
>>    the intrinsics.
>>    Also, a UseAES flag to ignore the hardware capability as described above.
