RFR JDK-8141491: Unaligned memory access in Bits.c
paul.sandoz at oracle.com
Fri Feb 5 13:00:42 UTC 2016
Nice use of C++ templates :-)
Overall looks good.
I too would prefer if we could move the argument checking out, perhaps even to the point of requiring callers do that rather than providing another method, for example for Buffer i think the arguments are known to be valid? I think in either case it is important to improve the documentation on the method stating the constraints on arguments, atomicity guarantees etc.
I have a hunch that for the particular case of copying-with-swap for buffers i could get this to work work efficiently using Unsafe (three separate methods for each unit type of 2, 4 and 8 bytes), since IIUC the range is bounded to be less than Integer.MAX_VALUE so an int loop rather than a long loop can be used and therefore safe points checks will not be placed within the loop.
However, i think what you have done is more generally applicable and could be made intrinsic. It would be a nice at some future point if it could be made a pure Java implementation and intrinsified where appropriate.
John, regarding array mismatch there were issues with the efficiency of the unrolled loops with Unsafe access. (Since the loops were int bases there were no issues with safe point checks.) Roland recently fixed that so now code is generated that is competitive with direct array accesses. We drop into the stub intrinsic and leverage 128bits or 256bits where supported. Interestingly it seems the unrolled loop using Unsafe is now slightly faster than the stub using 128bit registers. I don’t know if that is due to unluckly alignment, and/or the stub needs to do some manual unrolling. In terms of code-cache efficiency the intrinsic is better.
> On 4 Feb 2016, at 06:27, John Rose <john.r.rose at oracle.com> wrote:
> On Feb 2, 2016, at 11:25 AM, Mikael Vidstedt <mikael.vidstedt at oracle.com> wrote:
>> Please review this change which introduces a Copy::conjoint_swap and an Unsafe.copySwapMemory method to call it from Java, along with the necessary changes to have java.nio.Bits call it instead of the Bits.c code.
> This is very good.
> I have some nit-picks:
> These days, when we introduce a new intrinsic (@HSIntrCand),
> we write the argument checking code separately in a non-intrinsic
> bytecode method. In this case, we don't (yet) have an intrinsic
> binding for U.copy*, but we might in the future. (C intrinsifies
> memcpy, which is a precedent.) In any case, I would prefer
> if we could structure the argument checking code in a similar
> way, with Unsafe.java containing both copySwapMemory
> and a private copySwapMemory0. Then we can JIT-optimize
> the safety checks.
> You might as well extend the same treatment to the pre-existing
> copyMemory call. The most important check (and the only one
> in U.copyMemory) is to ensure that the size_t operand has not
> wrapped around from a Java negative value to a crazy-large
> size_t value. That's the low-hanging fruit. Checking the pointers
> (for null or oob) is more problematic, of course. Checking consistency
> around elemSize is cheap and easy, so I agree that the U.copySM
> should do that work also. Basically, Unsafe can do very basic
> checks if there is a tricky user model to enforce, but it mustn't
> "sign up" to guard the user against all errors.
> Rule of thumb: Unsafe calls don't throw NPEs, they just SEGV.
> And the rare bit that *does* throw (IAE usually) should be placed
> into Unsafe.java, not unsafe.cpp. (The best-practice rule for putting
> argument checking code outside of the intrinsic is a newer one,
> so Unsafe code might not always do this.)
> The comment "Generalizing it would be reasonable, but requires
> card marking" is bogus, since we never byte-swap managed pointers.
> The test logic will flow a little smoother if your GenericPointer guy,
> the onHeap version, stores the appropriate array base offset in his offset field.
> You won't have to mention p.isOnHeap nearly so much, and the code will
> set a slightly better example.
> The VM_ENTRY_BASE_FROM_LEAF macro is really cool.
> The C++ template code is cool also. It reminds me of the kind
> of work Gosling's "Ace" processor could do, but now it's mainstreamed
> for all to use in C++. We're going to get some of that goodness
> in Project Valhalla with specialization logic.
> I find it amazing that the right way to code this in C is to
> use memcpy for unaligned accesses and byte peek/poke
> into registers for byte-swapping operators. I'm glad we
> can write this code *once* for the JVM and JDK.
> Possible future work: If we can get a better handle on
> writing vectorizable loops from Java, including Unsafe-based
> ones, we can move some of the C code back up to Java.
> Perhaps U.copy* calls for very short lengths deserved to
> be broken out into small loops of U.get/put* (with alignment).
> I think you experimented with this, and there were problems
> with the JIT putting fail-safe memory barriers between
> U.get/put* calls. Paul's work on Array.mismatch ran into
> similar issues, with the right answer being to write manual
> vector code in assembly.
> Anyway, you can count me as a reviewer.
> — John
More information about the core-libs-dev