GetPrimitiveArrayCritical vs GetByteArrayRegion: 140x slow-down using -Xcheck:jni and java.util.zip.DeflaterOutputStream
paul.sandoz at oracle.com
Fri Mar 16 16:21:32 UTC 2018
Hi Ian, Thomas,
Some background on the bulk copying for byte buffers after talking with Mikael who worked on these changes a few years ago.
Please correct the following if needed as our memory is hazy :-)
IIRC at the time we felt this was a reasonable thing to do because C2 did not strip mine the loops for the bulk copying of large Java arrays i.e. the issue was there anyway for more common cases. However, i believe that may no longer be the so in some cases after Roland implemented loop strip mining in C2 . So we should go back and revisit/check the current support in buffers and Java arrays (System.arraycopy etc).
(This is also something we need to consider if we modify buffers to support capacities larger than Integer.MAX_VALUE. Also connects with Project Panama.)
If Thomas has not done so or does not plan to i can log an issue for you.
 https://bugs.openjdk.java.net/browse/JDK-8186027 <https://bugs.openjdk.java.net/browse/JDK-8186027>
> On Mar 15, 2018, at 10:49 AM, Ian Rogers <irogers at google.com> wrote:
> On Thu, Mar 15, 2018 at 1:25 AM Thomas Schatzl <thomas.schatzl at oracle.com>
>> On Thu, 2018-03-15 at 01:00 +0000, Ian Rogers wrote:
>>> An old data point on how large a critical region should be comes from
>>> java.nio.Bits. In JDK 9 the code migrated into unsafe, but in JDK 8
>>> the copies within a critical region were bound at most copying 1MB:
>>> native/java/nio/Bits.c#l88 This is inconsistent with Deflater and
>>> ObjectOutputStream which both allow unlimited arrays and thereby
>>> critical region sizes.
>>> In JDK 9 the copies starve the garbage collector in nio Bits too as
>>> there is no 1MB limit to the copy sizes:
>>> which came from:
>>> Perhaps this is a regression not demonstrated due to the testing
>>> It doesn't seem unreasonable to have the loops for the copies occur
>>> in 1MB chunks but JDK-8149596 lost this and so I'm confused on what
>>> the HotSpot stand point is.
>> Please file a bug (seems to be a core-libs/java.nio regression?),
>> preferably with some kind of regression test. Also file enhancements (I
>> would guess) for the other cases allowing unlimited arrays.
> I don't have perms to file bugs there's some catch-22 scenario in getting
> the permissions. Happy to have a bug filed or to file were that not an
> issue. Happy to create a test case but can't see any others for TTSP
> issues. This feels like a potential use case for jmh, perhaps run the
> benchmark well having a separate thread run GC bench.
> Should there be a bug to add, in debug mode, a TTSP watcher thread whose
> job it is to bring "random" threads into safepoints and report on tardy
> Should there be a bug to warn on being in a JNI critical for more than just
> a short period?
> Seems like there should be a bug on Unsafe.copyMemory and
> Unsafe.copySwapMemory having TTSP issues.
> Seems like there should be a bug on all uses of critical that don't chunk
> their critical region work based on some bound (like 1MB chunks for nio
> Bits)? How are these bounds set? A past reference that I've lost is in
> having the bound be the equivalent of 65535 bytecodes due to the
> expectation of GC work at least once in a method or on a loop backedge - I
> thought this was in a spec somewhere but now I can't find it. The bytecode
> size feels as arbitrary as 1MB, a time period would be better but that can
> depend on the kind of GC you want as delays with concurrent GC mean more
> than non-concurrent. Clearly the chunk size shouldn't just be 0, but this
> appears to currently be the norm in the JDK.
> The original reason for coming here was a 140x slow down in -Xcheck:jni in
> Deflater.deflate There are a few options there that its useful to enumerate:
> 1) rewrite in Java but there are correctness and open source related issues
> 2) remove underflow/overflow protection from critical arrays (revert
> or perhaps bound protection to arrays of a particular small size) - this
> removes checking and doesn't deal with TTSP
> 3) add a critical array slice API to JNI so that copies with -Xcheck:jni
> aren't unbounded (martinrb@ proposed this) - keeps checking but doesn't
> deal with TTSP
> 4) rewrite primitive array criticals with GetArrayRegion as O(n) beats the
> "silent killer" TTSP (effectively deprecate the critical APIs)
> In general (ie not just the deflate case) I think (1) is the most
> preferable. (2) and (3) both have TTSP issues. (4) isn't great performance
> wise, which motivates more use of approach (1), but I think deprecating
> criticals may just be the easiest and sanest way forward. I think that
> discussion is worth having on an e-mail thread rather than a bug.
>> Long TTSP is a performance bug as any other.
>>> In a way criticals are better than unsafe as they may
>>> pin the memory and not starve GC, which shenandoah does.
>> (Region based) Object pinning has its own share of problems:
>> - only (relatively) easily implemented in region based collectors
>> - may slow down pause a bit in presence of pinned regions/objects (for
>> non-concurrent copying collectors)
>> - excessive use of pinning may cause OOME and VM exit probably earlier
>> than the gc locker. GC locker seems to provide a more gradual
>> degradation. E.g. pinning regions typically makes these regions
>> unavailable for allocation.
>> I.e. you still should not use it for many, very long living objects.
>> Of course this somewhat depends on the sophistication of the
>> I think region based pinning would be a good addition to other
>> collectors than Shenandoah too. It has been on our minds for a long
>> time, but there are so many other more important issues :), so of
>> course we are eager to see contributions in this area. ;)
>> If you are interested on working on this, please ping us on hotspot-gc-
>> dev for implementation ideas to get you jump-started.
> I'd rather deprecate criticals than build upon the complexity, but I'm very
> glad this is a concern.
More information about the hotspot-dev