RFR: 8187033: [PPC] Imporve performance of ObjectStreamClass.getClassDataLayout()
peter.levart at gmail.com
Wed Sep 20 08:51:23 UTC 2017
Do you have a source for your benchmark to share? I might be able to
squeeze a little more from this approach but it would be easier if I
could test variants right away and not bother you...
On 09/20/17 10:14, Kazunori Ogata wrote:
> Hi Peter,
> The performance improvement was +2.9%. It is faster than the version that
> uses an extra dereference (+2.2%).
> Although it's slower than the full-fence variant, I think I
> understand Hans's concern and I agree your fix is the right answer.
> I thought DATA_LAYOUT_GUESS in your example was fetched from memory at
> some arbitrary place and time, but I now understand that
> "prefetch dataLayout" means calculating the value of dataLayout without
> accessing memory. I'm not sure how to calculate it, but I noticed that
> even picking a random value has a non-zero probability of passing the
> check at line 1204.5.
> I agree that loading slots can happen before the full fence is executed if
> the value of dataLayout does not come from memory and there is no data
> dependence between the write to dataLayout and the read from dataLayout. I
> appreciate your comments.
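
As an illustration of the reasoning in the paragraph above, the following
Java sketch models by hand what a value-speculating compiler or processor
might effectively do to a reader of dataLayout. It is not code from the patch
or from Hans's example: the class name, the int[] stand-in for the slots
array, and the fallback value -1 are assumptions made for this sketch. Only
dataLayout, DATA_LAYOUT_GUESS and tmp echo names used in this thread.

```java
// Hand-written model of value speculation; illustrative only.
class SpeculationSketch {
    // Shared, plain (non-volatile) field, standing in for the cached layout.
    static int[] dataLayout;

    // Hypothetical guessed reference, obtained WITHOUT reading dataLayout.
    static final int[] DATA_LAYOUT_GUESS = new int[] {7};

    // The reader as written at source level: load the reference, then the slot.
    static int readAsWritten() {
        int[] l = dataLayout;
        return (l != null) ? l[0] : -1;
    }

    // The reader as a speculating implementation might effectively run it:
    // the slot load is performed first, through the guessed reference,
    // and the real load of dataLayout only confirms the guess afterwards.
    static int readSpeculated() {
        int[] guess = DATA_LAYOUT_GUESS; // no memory access to dataLayout yet
        int tmp = guess[0];              // slot load happens early
        int[] l = dataLayout;            // the actual load of the field
        if (l == guess) {
            return tmp;                  // confirmation passed: early value is used
        }
        return (l != null) ? l[0] : -1;  // mis-speculation: redo the load normally
    }

    public static void main(String[] args) {
        dataLayout = DATA_LAYOUT_GUESS;
        System.out.println(readAsWritten());
        System.out.println(readSpeculated());
    }
}
```

In a single thread both methods return the same result. The point of the
sketch is only the order of the two loads in readSpeculated(): nothing on the
reader side forces the slot load to happen after the load of dataLayout.
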
> From: Hans Boehm <hboehm at google.com>
> To: Kazunori Ogata <OGATAK at jp.ibm.com>
> Cc: Peter Levart <peter.levart at gmail.com>, core-libs-dev
> <core-libs-dev at openjdk.java.net>
> Date: 2017/09/19 05:47
> Subject: Re: RFR: 8187033: [PPC] Imporve performance of
> ObjectStreamClass.getClassDataLayout()
> On Mon, Sep 18, 2017 at 10:52 AM, Kazunori Ogata <OGATAK at jp.ibm.com> wrote:
>> Hi Peter,
>> Peter Levart <peter.levart at gmail.com> wrote on 2017/09/18 22:05:43:
>>> On 09/18/2017 12:28 PM, Kazunori Ogata wrote:
>>>> Hi Hans and Peter,
>>>> Thank you for your comments.
>>>> Regarding the code Hans showed, I don't yet understand what the
>>>> problem is. Since the load at 1204b is a speculative one, reading
>>>> slots should not raise any exception. If the confirmation
>>>> succeeds, the value of tmp must also be correct because we put a full
>>>> fence and we see a non-NULL reference that was stored after the full
>>>> fence.
>>> I don't know much, but I can imagine that a speculative read may see the
>>> value and guess it correctly based on, let's say, some CPU state of a
>>> half-processed write instruction in the pipeline, which is established
>>> even before the fence instruction flushes writes to array slots. So I
>>> can accept that such an outcome is possible and doesn't violate the JMM.
>> It seems to me that this would mean the processor/platform can't implement
>> a full fence correctly. I think it is the platform's (processor's and
>> compiler's) responsibility to support full fences; otherwise the platform
>> can't implement the whole Java API, including VarHandle.fullFence().
> As Peter said, my concern is not with exceptions, but with seeing
> data for slots.
> The semantics of "full fences" are tricky, but basically they don't
> prevent reordering in other threads, only in the thread that executed the
> fence. The thread with the problematic reordering here is the one that saw
> a non-null dataLayout value, and hence did not execute a fence.
> Hence fences generally have to be paired with either another fence in the
> other thread, or some other ordering mechanism. That other ordering
> mechanism is missing here, though many implementations will ensure correct
> ordering due to hardware dependence-based ordering guarantees. But the JMM
> does not promise that.
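
One way to supply the ordering mechanism described above is to pair a release
store on the writer side with an acquire load on the reader side. The
following is a minimal sketch using VarHandle.getAcquire and
VarHandle.setRelease (available since Java 9). The class LayoutCache, the
computeLayout helper and the int[] payload are hypothetical stand-ins chosen
for this sketch; this is not the actual ObjectStreamClass code or the fix
under review.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical cache of a lazily computed layout; illustrative only.
class LayoutCache {
    // Plain field, but every access below goes through the VarHandle.
    private int[] dataLayout;

    private static final VarHandle DATA_LAYOUT;
    static {
        try {
            DATA_LAYOUT = MethodHandles.lookup()
                    .findVarHandle(LayoutCache.class, "dataLayout", int[].class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Hypothetical stand-in for the expensive layout computation.
    private static int[] computeLayout() {
        return new int[] {1, 2, 3};
    }

    int[] getLayout() {
        // Acquire load: if a non-null reference is seen, the array contents
        // written before the matching release store are visible as well.
        int[] layout = (int[]) DATA_LAYOUT.getAcquire(this);
        if (layout == null) {
            layout = computeLayout();
            // Release store: the writes that filled the array are ordered
            // before the publication of the reference.
            DATA_LAYOUT.setRelease(this, layout);
        }
        return layout;
    }

    public static void main(String[] args) {
        LayoutCache cache = new LayoutCache();
        System.out.println(cache.getLayout().length);
    }
}
```

Two threads may still race and both compute the layout. The sketch assumes
that is acceptable because the computation is idempotent, so either result
can be published.
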