RFR: 8079315: UseCondCardMark broken in conjunction with CMS precleaning
aph at redhat.com
Mon May 11 13:41:22 UTC 2015
On 05/11/2015 12:33 PM, Erik Österlund wrote:
> Hi Andrew,
>> On 11 May 2015, at 11:58, Andrew Haley <aph at redhat.com> wrote:
>> On 05/11/2015 11:40 AM, Erik Österlund wrote:
>>> I have heard statements like this, that such a mechanism would not work
>>> on RMO, but never got an explanation why it would work only on
>>> TSO. Could you please elaborate? I studied some kernel sources for
>>> a bunch of architectures and kernels, and it seems as far as I can
>>> see all good for RMO too.
>> Dave Dice himself told me that the algorithm is not in general safe
>> for non-TSO. Perhaps, though, it is safe in this particular case. Of
>> course, I may be misunderstanding him. I'm not sure of his reasoning
>> but perhaps we should include him in this discussion.
> I see. It would be interesting to hear his reasoning, because it is
> not clear to me.
>> From my point of view, I can't see a strong argument for doing this on
>> AArch64. StoreLoad barriers are not fantastically expensive there so
>> it may not be worth going to such extremes. The cost of a StoreLoad
>> barrier doesn't seem to be so much more than the StoreStore that we
>> have to have anyway.
> Yeah about performance I’m not sure when it’s worth removing these
> fences and on what hardware.
Your algorithm (as I understand it) trades a moderately expensive (but
purely local) operation for a very expensive global operation, albeit
with much lower frequency. It's not clear to me how much we value
continuous operation versus faster operation with occasional global
stalls. I suppose it must be application-dependent.
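To make the trade-off concrete, here is a minimal sketch of the "local" variant, assuming a toy heap modelled as an Object[] and a byte[] card table. The class name, card size, and DIRTY/CLEAN encodings are hypothetical and chosen only for illustration; this is not HotSpot's actual barrier code. The mutator issues a full fence (which includes StoreLoad) between the reference store and the conditional card check, so a concurrent precleaner that has just cleaned the card cannot miss the new reference.

```java
import java.lang.invoke.VarHandle;
import java.util.Arrays;

// Toy model only: names, sizes and encodings are assumptions, not HotSpot's.
final class CondCardMark {
    static final byte DIRTY = 0;
    static final byte CLEAN = 1;
    static final int SLOTS_PER_CARD = 64; // hypothetical card granularity

    final Object[] heap;  // stands in for reference fields in the old generation
    final byte[] cards;   // one byte per card

    CondCardMark(int slots) {
        heap = new Object[slots];
        cards = new byte[(slots + SLOTS_PER_CARD - 1) / SLOTS_PER_CARD];
        Arrays.fill(cards, CLEAN);
    }

    // Mutator side: reference store followed by a conditional card mark.
    void storeRef(int slot, Object ref) {
        heap[slot] = ref;
        // Full fence: orders the reference store before the card load
        // (StoreLoad) and before the card store (StoreStore).
        VarHandle.fullFence();
        int c = slot / SLOTS_PER_CARD;
        if (cards[c] != DIRTY) {   // skip the write if the card is already dirty
            cards[c] = DIRTY;
        }
    }

    // Precleaner side: clean the card first, then scan it.
    // Returns the number of non-null references seen on the card.
    int preclean(int c) {
        if (cards[c] != DIRTY) {
            return 0;
        }
        cards[c] = CLEAN;
        VarHandle.fullFence(); // card write must be visible before the scan reads
        int seen = 0;
        int end = Math.min(heap.length, (c + 1) * SLOTS_PER_CARD);
        for (int s = c * SLOTS_PER_CARD; s < end; s++) {
            if (heap[s] != null) {
                seen++;
            }
        }
        return seen;
    }
}
```

As I understand the proposal, it would drop the fence from storeRef and instead have the cleaning thread force every mutator to serialize (the expensive global operation) before it scans a card it has just cleaned. The sketch above shows only the fenced variant.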
> In this case though, if it makes us any happier, I think we could
> probably get rid of the storestore barrier too:
> The latent reference store is forced to serialize anyway after the
> dirty card value write is observable and about to be cleaned. So the
> potential consistency violation that the card looks dirty and then
> cleaning thread reads a stale reference value could not happen with
> my approach even without storestore hardware protection. I didn’t
> give it too much thought, but off the top of my head I can’t see any
> problems. If we want to get rid of storestore too I can give it some
> more thought.
That is very interesting.
> But you know much better than me if these fences are problematic or
> not. :)
Not really. AArch64 is an architecture not an implementation, and is
designed to be implemented using a wide range of techniques. Instead
of having very complex cores, some designers seem to have decided it
makes sense to have many of them on a die. It may well be, though,
that some implementers will adopt an x86-like highly-superscalar
architecture with a great deal of speculative execution. I can only
predict the past... My approach with this project has been to do
things in the most straightforward way rather than trying to optimize
for whatever implementations I happen to have available.