RFR: 8187033: [PPC] Improve performance of ObjectStreamClass.getClassDataLayout()

Kazunori Ogata OGATAK at jp.ibm.com
Wed Sep 20 08:14:25 UTC 2017

Hi Peter,

The performance improvement was +2.9%.  It is faster than the version that 
uses an extra dereference (+2.2%).

Although it's slower than the full-fence variant, I think I understand
Hans's concern, and I agree that your fix is the right answer.
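The "version that uses an extra dereference" is not shown in this message. Assuming it refers to publishing the array through a wrapper object with a final field, a minimal sketch could look like the following. `FinalFieldLayout`, `Layout` and `compute` are hypothetical names, not ObjectStreamClass code.

```java
import java.util.function.Supplier;

public class FinalFieldLayout {
    // Hypothetical wrapper: the filled array is stored in a final field, so
    // JLS 17.5 final-field semantics make the element stores that happened
    // before the constructor finished visible to any thread that reads the
    // array through this field.
    static final class Layout {
        final Object[] slots;

        Layout(Object[] slots) {
            this.slots = slots;
        }
    }

    private final Supplier<Object[]> compute;
    private Layout layout; // plain field; racy publication is tolerated

    FinalFieldLayout(Supplier<Object[]> compute) {
        this.compute = compute;
    }

    Object[] getLayout() {
        Layout l = layout;                 // read the plain field once
        if (l == null) {
            l = new Layout(compute.get()); // array is fully filled before the constructor runs
            layout = l;                    // benign race: several threads may compute it
        }
        return l.slots;                    // the extra dereference
    }
}
```

The guarantee only covers stores made before the wrapper's constructor finishes, so in this sketch the supplier must return a freshly filled array that is never modified afterwards.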


I had thought that DATA_LAYOUT_GUESS in your example was fetched from
memory somewhere, at an arbitrary time, but I now understand that
"prefetch dataLayout" means computing the value of dataLayout without
accessing memory.  I'm not sure how it would be calculated, but I noticed
that even picking a random value has a non-zero probability of passing the
check at line 1204.5.
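To make the value-speculation scenario concrete, the sketch below replays one problematic interleaving step by step in a single thread. The names `SpeculationReplay`, `guess` and `SLOT` are hypothetical, and "1204b" / "1204.5" refer to line labels in Hans's example, which is not reproduced here. It simulates what a reordered reader could observe; it is not a real concurrent test.

```java
import java.lang.invoke.VarHandle;

public class SpeculationReplay {
    static final int SLOT = 17;

    // Plain (non-volatile) field, as in the variant under discussion
    static Object[] dataLayout;

    // Replays: reader guesses the array, loads slots[17] early, then confirms
    static Object replay() {
        Object[] slots = new Object[SLOT + 1]; // writer allocates the array

        // Reader: value prediction yields a guess that happens to equal the
        // array the writer will publish, and slots[17] is loaded
        // speculatively (the load labeled 1204b) before it has been filled.
        Object[] guess = slots;
        Object tmp = guess[SLOT];

        // Writer: fill the slot, execute a full fence, publish the array
        slots[SLOT] = "filled";
        VarHandle.fullFence();
        dataLayout = slots;

        // Reader: the confirmation (labeled 1204.5) compares the guess with
        // the value actually loaded from dataLayout and succeeds...
        Object[] actual = dataLayout;
        if (actual == guess) {
            return tmp; // ...so the stale speculative value is used
        }
        return actual[SLOT];
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints "null", not "filled"
    }
}
```

The writer's full fence does not help here, because it only orders the writer's own stores; the reader's early load was never constrained.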

I agree that the load of slots[17] can happen before the full fence
executes if the value of dataLayout does not come from memory and there is
no data dependence between the write to dataLayout and the read from
dataLayout.  I appreciate your comments.


From:   Hans Boehm <hboehm at google.com>
To:     Kazunori Ogata <OGATAK at jp.ibm.com>
Cc:     Peter Levart <peter.levart at gmail.com>, core-libs-dev <core-libs-dev at openjdk.java.net>
Date:   2017/09/19 05:47
Subject:        Re: RFR: 8187033: [PPC] Improve performance of 

On Mon, Sep 18, 2017 at 10:52 AM, Kazunori Ogata <OGATAK at jp.ibm.com> wrote:
> Hi Peter,
> Peter Levart <peter.levart at gmail.com> wrote on 2017/09/18 22:05:43:
> > On 09/18/2017 12:28 PM, Kazunori Ogata wrote:
> > > Hi Hans and Peter,
> > >
> > > Thank you for your comments.
> > >
> > > Regarding the code Hans showed, I don't yet understand what the
> > > problem is.  Since the load of slots[17] at 1204b is speculative,
> > > it should not raise any exception.  If the confirmation at 1204.5
> > > succeeds, the value of tmp must also be correct, because we put a
> > > full fence and we see a non-NULL reference that was stored after
> > > the full fence.
> >
> > I don't know much, but I can imagine that a speculative read may see
> > the value and guess it correctly based on, let's say, some CPU state
> > of a half-processed write instruction in the pipeline, which is
> > established even before the fence instruction flushes writes to array
> > slots. So I can accept that such an outcome is possible and doesn't
> > violate the JMM.
> This seems to me to mean that the processor/platform can't implement a
> full fence correctly.  I think it is the platform's (processor's and
> compiler's) responsibility to support a full fence; otherwise the
> platform can't implement all of the Java API, including
> VarHandle.fullFence().

As Peter said, my concern is not with exceptions, but with seeing stale
data for slots[17].

The semantics of "full fences" are tricky, but basically they don't
prevent reordering in other threads, only in the thread that executed the
fence. The thread with the problematic reordering here is the one that saw
a non-null dataLayout value, and hence did not execute a fence.
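A simplified sketch of the pattern described above: only the thread that initializes `dataLayout` executes a fence, while a thread taking the fast path does not. `WriterOnlyFence`, `computeLayout()` and the slot contents are hypothetical stand-ins, not the actual ObjectStreamClass code. The code behaves correctly when run single-threaded; the issue is what the JMM does not guarantee for a second thread.

```java
import java.lang.invoke.VarHandle;

public class WriterOnlyFence {
    static final int SLOT = 17;
    private Object[] dataLayout; // plain field

    private Object[] computeLayout() {
        Object[] slots = new Object[SLOT + 1];
        for (int i = 0; i < slots.length; i++) {
            slots[i] = "slot" + i;
        }
        return slots;
    }

    Object readSlot() {
        Object[] slots = dataLayout; // plain read
        if (slots == null) {
            slots = computeLayout();
            VarHandle.fullFence();   // orders this thread's stores only
            dataLayout = slots;
        }
        // A thread that found dataLayout non-null executed no fence, so
        // nothing in the JMM orders its load of slots[17] after its load
        // of dataLayout; it may observe a slot that is not yet filled.
        return slots[SLOT];
    }
}
```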

Hence fences generally have to be paired with either another fence in the
other thread, or some other ordering mechanism. That other ordering
mechanism is missing here, though many implementations will ensure correct
ordering due to hardware dependence-based ordering guarantees. But the JMM
does not promise that.
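One way to supply the missing pairing, sketched with the same kind of hypothetical names: the fast-path thread executes `VarHandle.acquireFence()` after it observes a non-null `dataLayout`, so its later load of `slots[17]` cannot be reordered before that read. This illustrates pairing fences; it is not necessarily the change adopted for 8187033.

```java
import java.lang.invoke.VarHandle;

public class PairedFences {
    static final int SLOT = 17;
    private Object[] dataLayout; // still a plain field

    private Object[] computeLayout() {
        Object[] slots = new Object[SLOT + 1];
        for (int i = 0; i < slots.length; i++) {
            slots[i] = "slot" + i;
        }
        return slots;
    }

    Object readSlot() {
        Object[] slots = dataLayout; // plain read, done once
        if (slots == null) {
            slots = computeLayout();
            VarHandle.fullFence();   // writer side: slot stores before the publishing store
            dataLayout = slots;
        } else {
            // Reader side: loads before this fence (dataLayout) are not
            // reordered with loads after it (slots[17]), pairing with the
            // writer's fence.
            VarHandle.acquireFence();
        }
        return slots[SLOT];
    }
}
```

An alternative with the same intent is to access the field through a `VarHandle` with `setRelease` on the writer and `getAcquire` on the reader, which attaches the ordering to the accesses themselves rather than to standalone fences.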

