review request: add intrinsics to use AES instructions

Deneau, Tom tom.deneau at
Mon Jul 16 16:03:56 PDT 2012

Vladimir --

OK, I see now that the stubroutines_x86_xxx are bitness-dependent.
And are you saying that you would prefer that the intrinsics actually
be supported on 32-bit, not just that the VM builds and runs without them on 32-bit?

-- Tom

You can't execute a 32-bit VM with the 32-bit changes missing, because the flags
are set but the stubs are missing. And, yes, the 32-bit VM is still used.


Deneau, Tom wrote:
> Vladimir --
> Right, I didn't include 32-bit changes, thinking that the majority
> of users of AES encryption/decryption would be 64-bit servers.
> But there is no technical reason why 32-bit couldn't be added.
> Do you feel 32-bit support is important?
> -- Tom
> Thank you, Tom
> I created the following RFE and will sponsor the changes. But I don't see the 32-bit changes.
> 7184394: add intrinsics to use AES instructions
> Vladimir
>> Please review the following webrev which adds intrinsic support to
>> allow some of the com/sun/crypto/provider methods to use AES
>> instructions when a processor supports such instructions.
>> I do not have a bug number for this change but a description would be
>> something like the following:
>>    Modern x86 processors have AES instructions to accelerate AES
>>    encryption and decryption but Hotspot does not have a way to
>>    generate such instructions. There is a way to hook in a native
>>    crypto library using PKCS11 and there are a few native libraries
>>    that support hardware AES instructions. However, these native
>>    PKCS11 libraries
>>       * do not scale well with multiple threads
>>       * are not supported on all platforms, for instance Hotspot does
>>         not have PKCS11 support on 64-bit Windows.
>>       * can be confusing to configure. 
>> Since this webrev adds intrinsic support for the default
>> com/sun/crypto/provider classes, they are supported on all platforms
>> and there is no additional configuration required. Measurements have
>> shown that they scale very well with multiple threads.
>> The rest of this mail describes the scope of the intrinsics and
>> summarizes the source file changes.
>> -- Tom Deneau
>> Scope of the Intrinsics
>> -----------------------
>> When creating a cipher the application specifies a "transformation"
>> consisting of "algorithm/mode/padding". For more details see
>>    * These intrinsics kick in only when the algorithm part is "AES". A
>>      single block in AES is always 16 bytes and there are intrinsics
>>      for encrypting or decrypting a single block. These single-block
>>      intrinsics can work with any mode that uses AES and with any of
>>      the three AES key sizes (128, 192 or 256 bit).
>>    * A more optimized multi-block intrinsic can kick in if the
>>      algorithm/mode is "AES/CBC" (Cipher Block Chaining). Again all
>>      three AES key sizes are supported. There is no technical reason
>>      why we couldn't do multi-block intrinsics for the other modes
>>      (e.g., ECB) but I want to get some feedback from the reviewers on
>>      the implementation before charging off on this path.
>>    * The padding part is handled by java routines outside of these
>>      intrinsics.
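
For illustration, here is a minimal sketch of how an application selects the "algorithm/mode/padding" transformation described above, using the standard javax.crypto API. The class name AesCbcRoundTrip, the all-zero key and IV, and the choice of PKCS5Padding are assumptions made for the example and are not taken from the webrev. Whether the intrinsics are actually used for a call like this depends on the VM and the hardware.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class AesCbcRoundTrip {
    // Runs one encrypt or decrypt pass with the "AES/CBC/PKCS5Padding"
    // transformation: algorithm = AES, mode = CBC, padding = PKCS5Padding.
    static byte[] run(int mode, byte[] key, byte[] iv, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // all-zero 128-bit key, for illustration only
        byte[] iv = new byte[16];  // all-zero IV, for illustration only
        byte[] plain = "sixteen byte msg plus a few more".getBytes(StandardCharsets.UTF_8);

        byte[] enc = run(Cipher.ENCRYPT_MODE, key, iv, plain);
        byte[] dec = run(Cipher.DECRYPT_MODE, key, iv, enc);

        System.out.println("ciphertext length: " + enc.length);
        System.out.println("round trip ok: " + Arrays.equals(plain, dec));
    }
}
```

The padding step happens inside Cipher.doFinal in ordinary Java code, which matches the note above that padding is handled outside the intrinsics.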
>> Summary of Changes
>> ------------------
>> src/cpu/x86/vm/assembler_x86.cpp, hpp
>>    Defined the AES instructions which are used by the stub routines.
>> src/cpu/x86/vm/stubGenerator_x86_64.cpp,
>>    Actual stub code for the AES intrinsics. As described earlier there
>>    are both single-block and multi-block intrinsic stubs.
>>    Note that the stubs make use of the "expanded key" which gets
>>    created each time the key changes. The expanded key is used by both
>>    the java code and the intrinsic AES instructions.
>>    The java code stores the "expanded key" in big-endian 32-bit
>>    integers. The x86 AES instructions require the expanded key to be
>>    in little-endian 128-bit words. Hence the pshufb instructions to
>>    get the key into the little-endian format.
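
The byte-order point above can be sketched in plain Java. This is only a model under a stated assumption: that the shuffle amounts to reversing the bytes inside each 32-bit word of a 128-bit round key. The class KeyWordShuffle, its method names, and the sample key words are hypothetical and do not come from the stub code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class KeyWordShuffle {
    // Returns the bytes of an int[] in the order a little-endian CPU
    // such as x86 would hold them in memory.
    static byte[] memoryImage(int[] words) {
        ByteBuffer buf = ByteBuffer.allocate(words.length * 4).order(ByteOrder.LITTLE_ENDIAN);
        for (int w : words) {
            buf.putInt(w);
        }
        return buf.array();
    }

    // Assumed effect of the shuffle: reverse the four bytes inside each 32-bit word.
    static int[] swapEachWord(int[] words) {
        int[] out = new int[words.length];
        for (int i = 0; i < words.length; i++) {
            out[i] = Integer.reverseBytes(words[i]);
        }
        return out;
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // One hypothetical 128-bit round key, held as four ints whose most
        // significant byte is the first key byte (the "big-endian" layout).
        int[] roundKey = {0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f};

        System.out.println("as stored:     " + hex(memoryImage(roundKey)));
        System.out.println("after shuffle: " + hex(memoryImage(swapEachWord(roundKey))));
    }
}
```

In this model the first line shows each group of four key bytes reversed in memory, and the second shows them back in key order, which is the layout a 128-bit load would need.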
>> src/cpu/x86/vm/vm_version_x86.cpp, hpp
>>    Detect and store the AES capability bit in cpuid. A global boolean
>>    command line flag UseAES can be used to turn off AES even if the
>>    hardware supports it.
>> src/share/vm/classfile/vmSymbols.hpp
>> src/share/vm/opto/runtime.cpp, hpp
>>    The usual definitions of class names, method names and signatures
>>    for the java methods that are being intrinsified and the signatures
>>    for the stubs.
>> src/share/vm/oops/methodOop.cpp
>>    Up until now, every intrinsic was replacing a routine that was
>>    loaded by the "default" (NULL) class loader.
>>    com/sun/crypto/provider is not loaded by the default class
>>    loader so we had to add a check here.
>> src/share/vm/opto/escape.cpp
>>    Escape analysis knows about certain stubs, but if it sees a leaf
>>    stub it also checks against a predefined list. So the new intrinsic
>>    names were added to the list.
>> src/share/vm/opto/library_call.cpp
>> src/share/vm/opto/callGenerator.cpp
>> src/share/vm/opto/doCall.cpp
>>    The main logic for building up the calls to the stubs at compile
>>    time, assuming the platform has a stub and the global flags have
>>    not turned these intrinsics off.
>>    A new helper routine to load a field from an object was added since
>>    we ended up loading fields in a few places.
>>    For best performance, we wanted to hook into the multi-block
>>    encrypt and decrypt methods such as in
>>    This code is not AES-specific but handles CBC mode for any
>>    algorithm. (The algorithm part is handled by the enclosed
>>    "embeddedCipher" object).
>>    Thus at runtime we want to do the equivalent of an instanceof check
>>    on embeddedCipher and either call the stub (if it is AESCrypt) or
>>    call the original java code (if it is some other algorithm
>>    type). For the CipherBlockChaining.decrypt there is a further
>>    runtime check that the source and destination are not the same
>>    array which, because of the way CBC works, would require cloning the
>>    source (cipher).
>>    Vladimir added some infrastructure to generate predicated
>>    intrinsics to solve the above problem. A particular intrinsic need
>>    only specify that it is predicated, and generate the particular
>>    guard node which if false will take the Java path. This
>>    infrastructure can be used for future intrinsics that have to make
>>    such a runtime choice. These changes from Vladimir are in
>>    callGenerator.cpp, doCall.cpp, and a small bit in library_call.cpp.
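
The runtime choice described above can be modeled in plain Java. This is a rough sketch of the guard conditions only, not of the compiler infrastructure: the class PredicatedDispatch, the BlockCipher interface, and the AesLike and OtherCipher types are hypothetical stand-ins (AesLike plays the role of AESCrypt). In the real change the guard is a node generated at compile time rather than Java source.

```java
class PredicatedDispatch {
    interface BlockCipher {}

    // Hypothetical stand-in for AESCrypt.
    static final class AesLike implements BlockCipher {}

    // Hypothetical stand-in for any other embedded cipher.
    static final class OtherCipher implements BlockCipher {}

    // Encrypt guard: take the stub path only when the embedded cipher is AES.
    static boolean useStubForEncrypt(BlockCipher embeddedCipher) {
        return embeddedCipher instanceof AesLike;
    }

    // Decrypt guard: additionally require distinct source and destination
    // arrays, since in-place CBC decryption would need a copy of the source.
    static boolean useStubForDecrypt(BlockCipher embeddedCipher, byte[] src, byte[] dst) {
        return embeddedCipher instanceof AesLike && src != dst;
    }

    public static void main(String[] args) {
        byte[] a = new byte[32];
        byte[] b = new byte[32];

        System.out.println("encrypt, AES:          " + useStubForEncrypt(new AesLike()));
        System.out.println("encrypt, other cipher: " + useStubForEncrypt(new OtherCipher()));
        System.out.println("decrypt, AES, a -> b:  " + useStubForDecrypt(new AesLike(), a, b));
        System.out.println("decrypt, AES, a -> a:  " + useStubForDecrypt(new AesLike(), a, a));
    }
}
```

When a guard returns false, the call falls back to the original Java code, which mirrors the "if false will take the Java path" behavior described above.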
>> src/share/vm/runtime/globals.hpp
>>    Global flags were added to
>>       * turn off either AES encryption or AES decryption intrinsics separately
>>       * turn off the multi-block CBC/AES intrinsics. 
>>    By default all of the above are on. These are really there for
>>    testing, for example one could encrypt using Java and decrypt using
>>    the intrinsics.
>>    Also, a UseAES flag to ignore the hardware capability as described above.

