RFR JDK-8141491: Unaligned memory access in Bits.c

David Holmes david.holmes at oracle.com
Wed Feb 10 04:38:39 UTC 2016

On 10/02/2016 1:42 PM, Mikael Vidstedt wrote:
> Can I please get a quick review of these updated webrevs:

In terms of the incremental changes this looks fine. If you consider it 
all reviewed then nothing in the increments should change that.

But looking at the JDK code I have some follow up suggestions for 
- document all parameters with @param and describe constraints on 
- specify @throws for all IllegalArgumentException and 
NullPointerException conditions
- add a descriptive error message when throwing IllegalArgumentException
- not sure NullPointerException is correct, rather than 
IllegalArgumentException. for null base with zero offset cases


> hotspot:
> http://cr.openjdk.java.net/~mikael/webrevs/8141491/webrev.05/hotspot/webrev/
> jdk:
> http://cr.openjdk.java.net/~mikael/webrevs/8141491/webrev.05/jdk/webrev/
> incremental webrevs:
> hotspot:
> http://cr.openjdk.java.net/~mikael/webrevs/8141491/webrev.05.incr/hotspot/webrev/
> jdk:
> http://cr.openjdk.java.net/~mikael/webrevs/8141491/webrev.05.incr/jdk/webrev/
> Changes:
> * Added asserts in copy.cpp/conjoint_swap
> * Correctness: Moved offset sign checks to only be performed if
> corresponding base object is null, and added corresponding tests
> I'm about to make additional changes in this same area, so unless this
> last change is horribly broken I'm planning on pushing this and doing
> any additional cleanup in the upcoming change(s).
> Cheers,
> Mikael
> On 2016-02-07 16:14, David Holmes wrote:
>> On 6/02/2016 8:21 AM, Mikael Vidstedt wrote:
>>> I fully agree that moving the arguments checking up to Java makes more
>>> sense, and I've prepared new webrevs which do exactly that, including
>>> changes to address the other feedback from David, John and others:
>> Shouldn't the lowest-level do_conjoint_swap routines at least check
>> preconditions with asserts to catch the cases where the calling Java
>> code has failed to do the right thing? The other Copy methods seem to
>> do this.
>> David
>> -----
>>> hotspot:
>>> http://cr.openjdk.java.net/~mikael/webrevs/8141491/webrev.04/hotspot/webrev/
>>> jdk:
>>> http://cr.openjdk.java.net/~mikael/webrevs/8141491/webrev.04/jdk/webrev/
>>> Incremental webrevs for your convenience:
>>> hotspot:
>>> http://cr.openjdk.java.net/~mikael/webrevs/8141491/webrev.04.incr/hotspot/webrev/
>>> jdk:
>>> http://cr.openjdk.java.net/~mikael/webrevs/8141491/webrev.04.incr/jdk/webrev/
>>> I have done some benchmarking of this code and for large copies (16MB+)
>>> this outperforms the old Bits.c implementation by *30-100%* depending on
>>> platform and exact element sizes! For smaller copies the additional
>>> checks which are now performed hurt performance on client VMs (80-90% of
>>> old impl), but with the server VMs I see performance on par with, or in
>>> most cases 5-10% better than the old implementation. There's a
>>> potentially statistically significant regression of ~3-4% for
>>> elemSize=2, but for now I'm going to declare success. There's certainly
>>> room for further improvements here, but this should at least do for
>>> addressing the original problem.
>>> I filed https://bugs.openjdk.java.net/browse/JDK-8149159 for moving the
>>> checks for Unsafe.copyMemory to Java, and will work on that next. I also
>>> filed https://bugs.openjdk.java.net/browse/JDK-8149162 to cover the
>>> potential renaming of the Bits methods to have more informative names.
>>> Finally, I filed https://bugs.openjdk.java.net/browse/JDK-8149163 to
>>> look at improving the behavior of Unsafe.addressSize(), after having
>>> spent too much time trying to understand why the performance of the new
>>> U.copySwapMemory Java checks wasn't quite living up to my expectations
>>> (spoiler alert: Unsafe.addressSize() is not intrinsified, so will always
>>> result in a call into the VM/unsafe.cpp).
>>> Finally, I - too - would like to see the copy-swap logic moved into
>>> Java, and as I mentioned I played around with that first before I
>>> decided to do the native implementation to address the immediate
>>> problem. Looking forward to what you find Paul!
>>> Cheers,
>>> Mikael
>>> On 2016-02-05 05:00, Paul Sandoz wrote:
>>>> Hi,
>>>> Nice use of C++ templates :-)
>>>> Overall looks good.
>>>> I too would prefer if we could move the argument checking out, perhaps
>>>> even to the point of requiring callers do that rather than providing
>>>> another method, for example for Buffer i think the arguments are known
>>>> to be valid? I think in either case it is important to improve the
>>>> documentation on the method stating the constraints on arguments,
>>>> atomicity guarantees etc.
>>>> I have a hunch that for the particular case of copying-with-swap for
>>>> buffers i could get this to work work efficiently using Unsafe (three
>>>> separate methods for each unit type of 2, 4 and 8 bytes), since IIUC
>>>> the range is bounded to be less than Integer.MAX_VALUE so an int loop
>>>> rather than a long loop can be used and therefore safe points checks
>>>> will not be placed within the loop.
>>>> However, i think what you have done is more generally applicable and
>>>> could be made intrinsic. It would be a nice at some future point if it
>>>> could be made a pure Java implementation and intrinsified where
>>>> appropriate.
>>>> John, regarding array mismatch there were issues with the efficiency
>>>> of the unrolled loops with Unsafe access. (Since the loops were int
>>>> bases there were no issues with safe point checks.) Roland recently
>>>> fixed that so now code is generated that is competitive with direct
>>>> array accesses. We drop into the stub intrinsic and leverage 128bits
>>>> or 256bits where supported. Interestingly it seems the unrolled loop
>>>> using Unsafe is now slightly faster than the stub using 128bit
>>>> registers. I don’t know if that is due to unluckly alignment, and/or
>>>> the stub needs to do some manual unrolling. In terms of code-cache
>>>> efficiency the intrinsic is better.
>>>> Paul.
>>>>> On 4 Feb 2016, at 06:27, John Rose <john.r.rose at oracle.com> wrote:
>>>>> On Feb 2, 2016, at 11:25 AM, Mikael Vidstedt
>>>>> <mikael.vidstedt at oracle.com> wrote:
>>>>>> Please review this change which introduces a Copy::conjoint_swap and
>>>>>> an Unsafe.copySwapMemory method to call it from Java, along with the
>>>>>> necessary changes to have java.nio.Bits call it instead of the
>>>>>> Bits.c code.
>>>>>> http://cr.openjdk.java.net/~mikael/webrevs/8141491/webrev.03/hotspot/webrev/
>>>>>> http://cr.openjdk.java.net/~mikael/webrevs/8141491/webrev.03/jdk/webrev/
>>>>> This is very good.
>>>>> I have some nit-picks:
>>>>> These days, when we introduce a new intrinsic (@HSIntrCand),
>>>>> we write the argument checking code separately in a non-intrinsic
>>>>> bytecode method.  In this case, we don't (yet) have an intrinsic
>>>>> binding for U.copy*, but we might in the future.  (C intrinsifies
>>>>> memcpy, which is a precedent.)  In any case, I would prefer
>>>>> if we could structure the argument checking code in a similar
>>>>> way, with Unsafe.java containing both copySwapMemory
>>>>> and a private copySwapMemory0.  Then we can JIT-optimize
>>>>> the safety checks.
>>>>> You might as well extend the same treatment to the pre-existing
>>>>> copyMemory call.  The most important check (and the only one
>>>>> in U.copyMemory) is to ensure that the size_t operand has not
>>>>> wrapped around from a Java negative value to a crazy-large
>>>>> size_t value.  That's the low-hanging fruit.  Checking the pointers
>>>>> (for null or oob) is more problematic, of course.  Checking
>>>>> consistency
>>>>> around elemSize is cheap and easy, so I agree that the U.copySM
>>>>> should do that work also.  Basically, Unsafe can do very basic
>>>>> checks if there is a tricky user model to enforce, but it mustn't
>>>>> "sign up" to guard the user against all errors.
>>>>> Rule of thumb:  Unsafe calls don't throw NPEs, they just SEGV.
>>>>> And the rare bit that *does* throw (IAE usually) should be placed
>>>>> into Unsafe.java, not unsafe.cpp.  (The best-practice rule for putting
>>>>> argument checking code outside of the intrinsic is a newer one,
>>>>> so Unsafe code might not always do this.)
>>>>> The comment "Generalizing it would be reasonable, but requires
>>>>> card marking" is bogus, since we never byte-swap managed pointers.
>>>>> The test logic will flow a little smoother if your GenericPointer guy,
>>>>> the onHeap version, stores the appropriate array base offset in his
>>>>> offset field.
>>>>> You won't have to mention p.isOnHeap nearly so much, and the code will
>>>>> set a slightly better example.
>>>>> The VM_ENTRY_BASE_FROM_LEAF macro is really cool.
>>>>> The C++ template code is cool also.  It reminds me of the kind
>>>>> of work Gosling's "Ace" processor could do, but now it's mainstreamed
>>>>> for all to use in C++.  We're going to get some of that goodness
>>>>> in Project Valhalla with specialization logic.
>>>>> I find it amazing that the right way to code this in C is to
>>>>> use memcpy for unaligned accesses and byte peek/poke
>>>>> into registers for byte-swapping operators.  I'm glad we
>>>>> can write this code *once* for the JVM and JDK.
>>>>> Possible future work:  If we can get a better handle on
>>>>> writing vectorizable loops from Java, including Unsafe-based
>>>>> ones, we can move some of the C code back up to Java.
>>>>> Perhaps U.copy* calls for very short lengths deserved to
>>>>> be broken out into small loops of U.get/put* (with alignment).
>>>>> I think you experimented with this, and there were problems
>>>>> with the JIT putting fail-safe memory barriers between
>>>>> U.get/put* calls.  Paul's work on Array.mismatch ran into
>>>>> similar issues, with the right answer being to write manual
>>>>> vector code in assembly.
>>>>> Anyway, you can count me as a reviewer.
>>>>> Thanks,
>>>>> — John

More information about the core-libs-dev mailing list