RFR(XXS): 8235931: add OM_CACHE_LINE_SIZE and use smaller size on SPARCv9 and X64
Daniel D. Daugherty
daniel.daugherty at oracle.com
Fri Jan 17 18:06:24 UTC 2020
Thanks for chiming in on this review thread.
For the purposes of this fix (8235931), I'm going to stick with
OM_CACHE_LINE_SIZE and we'll have to hash out a broader definition
of "cache line size" via a different bug ID.
I hope you don't mind that I have no interest in taking on the broader
naming difficulties via 8235931...
On 12/20/19 9:24 AM, Doerr, Martin wrote:
> Just a comment on
>> CACHE_LINE_SIZE 64
>> DOUBLE_CACHE_LINE_SIZE 128
> I think CACHE_LINE_SIZE is too unspecific.
> There are L1I, L1D, L2, L3, ... caches, and they can have different cache line sizes.
> In addition, they can change between processor models.
> So we should have a better name than just "CACHE_LINE_SIZE".
> PPC and s390 already have large cache lines (which are real, not faked for performance considerations).
> I guess we don't want to use even more padding on these platforms.
> So we will probably want to set both constants to the same value (needs to get evaluated).
> Best regards,
> Martin
>> -----Original Message-----
>> From: hotspot-runtime-dev <hotspot-runtime-dev-
>> bounces at openjdk.java.net> On Behalf Of Daniel D. Daugherty
>> Sent: Thursday, 19 December 2019 18:51
>> To: Claes Redestad <claes.redestad at oracle.com>; David Holmes
>> <david.holmes at oracle.com>; hotspot-runtime-dev at openjdk.java.net
>> Subject: Re: RFR(XXS): 8235931: add OM_CACHE_LINE_SIZE and use smaller
>> size on SPARCv9 and X64
>> Hi David and Claes,
>> Thanks for the fast reviews! Sorry for my delay in responding. I've been
>> working on the Async Monitor Deflation wiki updates...
>> Since Claes was kind enough to reply to David's post and included all
>> of David's comments, I'm going to reply to both here.
>> On 12/18/19 10:03 AM, Claes Redestad wrote:
>>> On 2019-12-18 06:50, David Holmes wrote:
>>>> Hi Dan,
>>>> On 18/12/2019 7:34 am, Daniel D. Daugherty wrote:
>>>>> I'm extracting another standalone fix from the Async Monitor Deflation
>>>>> project (JDK-8153224) and sending it out for review (and testing)
>>>>> separately. This is an XXS sized fix, but since it involves cache
>>>>> line sizes I don't think a trivial review is appropriate.
>> First, David's part:
>>>> Can you elaborate on why a smaller cache-line-size assumption is
>>>> beneficial here?
>> I originally heard about switching from 128 byte to 64 byte cache-line-size
>> at the June 2019 Runtime offsite. I don't remember who brought it up, but
>> they were saying that Intel no longer recommends using 128 byte
>> (and hasn't recommended that for some time). IIRC, the person also said
>> their Intel contact was a bit surprised that HotSpot was still using a
>> 128 byte cache-line-size.
>> When I talked about Async Monitor Deflation in June I mentioned that I
>> was going to try out 64 byte cache-line-size for ObjectMonitors and so
>> far I haven't seen any performance issues.
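>> To make the space tradeoff concrete, here is a hypothetical sketch
>> (not the actual ObjectMonitor layout) of what the padding costs:
>>
>>   // Hypothetical monitor-like struct: '_owner' gets its own cache
>>   // line so writes to it don't invalidate neighboring hot fields.
>>   struct Example {
>>     volatile intptr_t _owner;
>>     // 56 pad bytes when OM_CACHE_LINE_SIZE is 64; 120 when it is 128.
>>     char _pad[OM_CACHE_LINE_SIZE - sizeof(intptr_t)];
>>     volatile intptr_t _next_hot_field;
>>   };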
>>>> Do you have performance numbers for the bug report?
>> I've been doing Aurora SPECjbb2015 runs (tuned G1, 10 iterations) between
>> my baseline (jdk-14+26) and the various fixes. For this fix (8235931)
>> relative to my first baseline run:
>> criticalJOPS -0.40% (Non-significant)
>> maxJOPS +0.23% (Non-significant)
>> I reran the baseline a second time using the same bits; for this fix
>> (8235931) relative to the second baseline run:
>> criticalJOPS +6.96% (Non-significant)
>> maxJOPS +9.96% (Non-significant)
>> When I compare the two baseline runs:
>> criticalJOPS -6.65% (Non-significant)
>> maxJOPS -8.86% (Non-significant)
>> So even with 10 iterations, these SPECjbb2015 runs are all over the
>> place. And Aurora is agreeing with that by flagging the results as
>> "Non-significant" because the 'p' values are too large.
>> I'm doing another set of Aurora SPECjbb2015 runs and I'm bumping the
>> iterations from 10 to 15. In the recent past, Per told me I might
>> have to do 20 runs of SPECjbb2015 to shake out the variations.
>> We'll see what turns up with this latest set of runs...
>> Second, Claes' part:
>>> I did numerous experiments on this during work on JEP-143 and saw no
>>> improvement on either "real" benchmarks or stress tests that try to
>>> explicitly provoke cross-monitor false sharing on various fields in
>>> ObjectMonitors. I've not re-run those experiments on new hardware, so
>>> YMMV. Should probably check if they're still usable...
>> As always, I'm interested in your ObjectMonitor results.
>>> Specifically, for padding values below 64 we could provoke false sharing
>>> effects on stress tests, with negligible movement on "real" benchmarks.
>>> For values above 64 we just waste space.
>> Yup. You've said that a few times and that's one of the reasons that
>> I've included this change in the Async Monitor Deflation project.
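>> For readers who haven't seen such a stress test, a minimal sketch
>> (this is not Claes' JEP-143 benchmark, just the general shape) is:
>>
>>   #include <atomic>
>>   #include <thread>
>>
>>   // Two threads hammer counters PAD bytes apart. With PAD below the
>>   // physical line size, both counters share a cache line and the
>>   // resulting coherence traffic cuts throughput dramatically.
>>   template <size_t PAD>
>>   void stress() {
>>     struct { std::atomic<long> a; char gap[PAD]; std::atomic<long> b; } s{};
>>     std::thread t1([&s] { for (long i = 0; i < 100000000; i++) s.a++; });
>>     std::thread t2([&s] { for (long i = 0; i < 100000000; i++) s.b++; });
>>     t1.join();
>>     t2.join();
>>   }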
>>> There exists a legend that you need double the physical cache line
>>> padding to not provoke the wrath of false sharing in the presence of
>>> aggressive prefetching, which is the reason why the
>>> "DEFAULT_CACHE_LINE_SIZE" is set to twice the typical physical cache
>>> line size in bytes. If we explicitly think we need to pad two lines,
>>> then the constant DEFAULT_CACHE_LINE_SIZE should be named accordingly.
>> I wouldn't call it "a legend". When we made this change years ago, Dice
>> had very specific advice from someone at Intel to make this change. He
>> could probably still dig up the details...
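>> For context, the convention Claes is describing looks like this in
>> sketch form (illustrative values, not a quote from any platform file):
>>
>>   // Typical physical line is 64 bytes; the default padding constant
>>   // is doubled to also defeat adjacent-line hardware prefetching.
>>   #define DEFAULT_CACHE_LINE_SIZE (2 * 64)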
>>> The patch in question looks good to me,
>>> although I'd prefer something like:
>>> CACHE_LINE_SIZE 64
>>> DOUBLE_CACHE_LINE_SIZE 128
>> I think we should make that change on a more HotSpot wide basis with
>> a different bug ID. Also, do you see those values (64 and 128) being
>> appropriate for all the CPUs/platforms? Or do you foresee those values
>> being defined per-CPU/platform?
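>> (To illustrate the per-CPU/platform option with values already
>> mentioned in this thread -- placement and names are hypothetical:)
>>
>>   // globalDefinitions_x86.hpp: 64 byte lines; pad double for prefetch.
>>   #define CACHE_LINE_SIZE        64
>>   #define DOUBLE_CACHE_LINE_SIZE 128
>>
>>   // globalDefinitions_ppc.hpp: lines are really 128 bytes, and per
>>   // Martin we probably don't want even more padding on top of that.
>>   #define CACHE_LINE_SIZE        128
>>   #define DOUBLE_CACHE_LINE_SIZE 128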
>>> ... and then use whichever is deemed more appropriate on a case-by-case
>>> basis. (Hint: it's probably CACHE_LINE_SIZE)
>> I accept that hint which is why I'm going with OM_CACHE_LINE_SIZE
>> so that ObjectMonitors can have their own case... :-)
>> And back to David:
>>>> Otherwise the mechanics of the change look fine, except I would
>>>> suggest that this:
>>>> #ifndef OM_CACHE_LINE_SIZE
>>>> // Use DEFAULT_CACHE_LINE_SIZE if not already specified for
>>>> // the current build platform.
>>>> #define OM_CACHE_LINE_SIZE DEFAULT_CACHE_LINE_SIZE
>>>> #endif
>>>> should go in the shared globalDefinitions.hpp file.
>> I intentionally didn't want to expose OM_CACHE_LINE_SIZE in
>> globalDefinitions.hpp. I'm trying really hard to stay out of that file.
>> Also, is globalDefinitions.hpp processed before or after the CPU/platform
>> specific files?
>> Obviously I need to specify OM_CACHE_LINE_SIZE in the two CPU files
>> (SPARCv9 and X64), and both halves of the ObjectMonitor subsystem need
>> the above logic, but we don't need to expose OM_CACHE_LINE_SIZE
>> elsewhere. Of course, if we end up adding CACHE_LINE_SIZE and
>> DOUBLE_CACHE_LINE_SIZE to the system *and* if we can get all the
>> CPUs/platforms to agree to use CACHE_LINE_SIZE for ObjectMonitors,
>> then OM_CACHE_LINE_SIZE can be retired.
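>> In other words, each half of the ObjectMonitor subsystem carries a
>> small guarded fallback like this (sketch; the webrev has the exact
>> files and placement):
>>
>>   #ifndef OM_CACHE_LINE_SIZE
>>   // Use DEFAULT_CACHE_LINE_SIZE if a CPU-specific header didn't
>>   // already pick a value for this build platform.
>>   #define OM_CACHE_LINE_SIZE DEFAULT_CACHE_LINE_SIZE
>>   #endif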
>> Thanks again for the fast reviews!
>>>>> JDK-8235931 add OM_CACHE_LINE_SIZE and use smaller size on
>>>>> SPARCv9 and X64
>>>>> Here's the webrev URL:
>>>>> Folks that have reviewed JDK-8153224 will recognize these changes as
>>>>> a subset of the cache line changes from the Async Monitor Deflation
>>>>> project. It's a subset because there are more fields that need to be
>>>>> on separate cache lines in the Async Monitor Deflation project.
>>>>> These changes are being tested in a Mach5 Tier[1-3] run that's still
>>>>> running (JDK14 ATR has priority). I've also run these changes thru
>>>>> a set of 10 SPECjbb2015 runs; I'm waiting for my various SPECjbb2015
>>>>> runs to finish so I can do the analysis.
>>>>> Thanks, in advance, for comments, questions or suggestions.