RFR(M): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding

Corey Ashford cjashfor at linux.ibm.com
Tue Aug 25 01:21:59 UTC 2020

Here's a revised webrev which includes a JMH benchmark for the decode 


The added benchmark tries to be "fair" in that it doesn't prefer a large 
buffer size, which would favor the intrinsic.  It pseudo-randomly (but 
reproducibly) chooses a buffer size between 8 and 20k+8 bytes, and fills 
it with random data to encode and decode.  As part of the TearDown of an 
invocation, it also checks the decoded output data for correctness.

Example runs on the Power9-based machine I use for development shows a 
3X average improvement across these random buffer sizes.  Here's an 
excerpt of the output when run with -XX:-UseBASE64Intrinsics :

Iteration   1: 70795.623 ops/s
Iteration   2: 71070.607 ops/s
Iteration   3: 70867.544 ops/s
Iteration   4: 71107.992 ops/s
Iteration   5: 71048.281 ops/s

And here's the output with the intrinsic enabled:

Iteration   1: 208794.022 ops/s
Iteration   2: 208630.904 ops/s
Iteration   3: 208238.822 ops/s
Iteration   4: 208714.967 ops/s
Iteration   5: 209060.894 ops/s

Taking the best of the two runs: 209060/71048 = 2.94

 From other experiments where the benchmark uses a fixed-size, larger 
buffer, the performance ratio rises to about 4.0.

Power10 should have a slightly higher ratio due to several factors, but 
I have not yet benchmarked on Power10.

Other arches ought to be able to do at least this well, if not better, 
because of wider vector registers (> 128 bits) being available.  Only a 
Power9/10 implementation is included in this webrev, however.


- Corey

On 8/19/20 11:20 AM, Roger Riggs wrote:
> Hi Corey,
> For changes obviously performance motivated, it is conventional to run a 
> JMH perf test to demonstate
> the improvement and prove it is worthwhile to add code complexity.
> I don't see any existing Base64 JMH tests but they would be in the repo 
> below or near:
>      test/micro/org/openjdk/bench/java/util/
> Please contribute a JMH test and results to show the difference.
> Regards, Roger
> On 8/19/20 2:10 PM, Corey Ashford wrote:
>> Michihiro Horie posted up a new iteration of this webrev for me.  This 
>> time the webrev includes a complete implementation of the intrinsic 
>> for Power9 and Power10.
>> You can find it here: 
>> http://cr.openjdk.java.net/~mhorie/8248188/webrev.02/
>> Changes in webrev.02 vs. webrev.01:
>>   * The method header for the intrinsic in the Base64 code has been 
>> rewritten using the Javadoc style.  The clarity of the comments has 
>> been improved and some verbosity has been removed. There are no 
>> additional functional changes to Base64.java.
>>   * The code needed to martial and check the intrinsic parameters has 
>> been added, using the base64 encodeBlock intrinsic as a guideline.
>>   * A complete intrinsic implementation for Power9 and Power10 is 
>> included.
>>   * Adds some Power9 and Power10 assembler instructions needed by the 
>> intrinsic which hadn't been defined before.
>> The intrinsic implementation in this patch accelerates the decoding of 
>> large blocks of base64 data by a factor of about 3.5X on Power9.
>> I'm attaching two Java test cases I am using for testing and 
>> benchmarking.  The TestBase64_VB encodes and decodes randomly-sized 
>> buffers of random data and checks that original data matches the 
>> encoded-then-decoded data.  TestBase64Errors encodes a 48K block of 
>> random bytes, then corrupts each byte of the encoded data, one at a 
>> time, checking to see if the decoder catches the illegal byte.
>> Any comments/suggestions would be appreciated.
>> Thanks,
>> - Corey
>> On 7/27/20 6:49 PM, Corey Ashford wrote:
>>> Michihiro Horie uploaded a new revision of the Base64 decodeBlock 
>>> intrinsic API for me:
>>> http://cr.openjdk.java.net/~mhorie/8248188/webrev.01/
>>> It has the following changes with respect to the original one posted:
>>>   * In the event of encountering a non-base64 character, instead of 
>>> having a separate error code of -1, the intrinsic can now just return 
>>> either 0, or the number of data bytes produced up to the point where 
>>> the illegal base64 character was encountered. This reduces the number 
>>> of special cases, and also provides a way to speed up the process of 
>>> finding the bad character by the slower, pure-Java algorithm.
>>>   * The isMIME boolean is removed from the API for two reasons:
>>>     - The current API is not sufficient to handle the isMIME case, 
>>> because there isn't a strict relationship between the number of input 
>>> bytes and the number of output bytes, because there can be an 
>>> arbitrary number of non-base64 characters in the source.
>>>     - If an intrinsic only implements the (isMIME == false) case as 
>>> ours does, it will always return 0 bytes processed, which will 
>>> slightly slow down the normal path of processing an (isMIME == true) 
>>> instantiation.
>>>     - We considered adding a separate hotspot candidate for the 
>>> (isMIME == true) case, but since we don't have an intrinsic 
>>> implementation to test that, we decided to leave it as a future 
>>> optimization.
>>> Comments and suggestions are welcome.  Thanks for your consideration.
>>> - Corey
>>> On 6/23/20 6:23 PM, Michihiro Horie wrote:
>>>> Hi Corey,
>>>> Following is the issue I created.
>>>> https://bugs.openjdk.java.net/browse/JDK-8248188
>>>> I will upload a webrev when you're ready as we talked in private.
>>>> Best regards,
>>>> Michihiro
>>>> Inactive hide details for "Corey Ashford" ---2020/06/24 
>>>> 09:40:10---Currently in java.util.Base64, there is a 
>>>> HotSpotIntrinsicCa"Corey Ashford" ---2020/06/24 09:40:10---Currently 
>>>> in java.util.Base64, there is a HotSpotIntrinsicCandidate and API 
>>>> for encodeBlock, but no
>>>> From: "Corey Ashford" <cjashfor at linux.ibm.com>
>>>> To: "hotspot-compiler-dev at openjdk.java.net" 
>>>> <hotspot-compiler-dev at openjdk.java.net>, 
>>>> "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net>
>>>> Cc: Michihiro Horie/Japan/IBM at IBMJP, Kazunori Ogata/Japan/IBM at IBMJP, 
>>>> joserz at br.ibm.com
>>>> Date: 2020/06/24 09:40
>>>> Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for 
>>>> Base64 decoding
>>>> ------------------------------------------------------------------------ 
>>>> Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and
>>>> API for encodeBlock, but none for decoding.  This means that only
>>>> encoding gets acceleration from the underlying CPU's vector hardware.
>>>> I'd like to propose adding a new intrinsic for decodeBlock.  The
>>>> considerations I have for this new intrinsic's API:
>>>>   * Don't make any assumptions about the underlying capability of the
>>>> hardware.  For example, do not impose any specific block size 
>>>> granularity.
>>>>   * Don't assume the underlying intrinsic can handle isMIME or isURL
>>>> modes, but also let them decide if they will process the data 
>>>> regardless
>>>> of the settings of the two booleans.
>>>>   * Any remaining data that is not processed by the intrinsic will be
>>>> processed by the pure Java implementation.  This allows the 
>>>> intrinsic to
>>>> process whatever block sizes it's good at without the complexity of
>>>> handling the end fragments.
>>>>   * If any illegal character is discovered in the decoding process, the
>>>> intrinsic will simply return -1, instead of requiring it to throw a
>>>> proper exception from the context of the intrinsic.  In the event of
>>>> getting a -1 returned from the intrinsic, the Java Base64 library code
>>>> simply calls the pure Java implementation to have it find the error and
>>>> properly throw an exception.  This is a performance trade-off in the
>>>> case of an error (which I expect to be very rare).
>>>>   * One thought I have for a further optimization (not implemented in
>>>> the current patch), is that when the intrinsic decides not to process a
>>>> block because of some combination of isURL and isMIME settings it
>>>> doesn't handle, it could return extra bits in the return code, encoded
>>>> as a negative number.  For example:
>>>> Illegal_Base64_char   = 0b001;
>>>> isMIME_unsupported    = 0b010;
>>>> isURL_unsupported     = 0b100;
>>>> These can be OR'd together as needed and then negated (flip the sign).
>>>> The Base64 library code could then cache these flags, so it will know
>>>> not to call the intrinsic again when another decodeBlock is requested
>>>> but with an unsupported mode.  This will save the performance hit of
>>>> calling the intrinsic when it is guaranteed to fail.
>>>> I've tested the attached patch with an actual intrinsic coded up for
>>>> Power9/Power10, but those runtime intrinsics and arch-specific patches
>>>> aren't attached today.  I want to get some consensus on the
>>>> library-level intrinsic API first.
>>>> Also attached is a simple test case to test that the new intrinsic API
>>>> doesn't break anything.
>>>> I'm open to any comments about this.
>>>> Thanks for your consideration,
>>>> - Corey
>>>> Corey Ashford
>>>> IBM Systems, Linux Technology Center, OpenJDK team
>>>> cjashfor at us dot ibm dot com
>>>> [attachment "decodeBlock_api-20200623.patch" deleted by Michihiro 
>>>> Horie/Japan/IBM] [attachment "TestBase64.java" deleted by Michihiro 
>>>> Horie/Japan/IBM]

More information about the core-libs-dev mailing list