RFR: 8079315: UseCondCardMark broken in conjunction with CMS precleaning
erik.osterlund at lnu.se
Tue May 12 20:44:14 UTC 2015
Hi Andrew and Aleksey,
It is of course true that a hypothetical OS + hardware combination could /in theory/ be smart enough not to send the TLB purge message to certain CPUs, and hence not flush latent stores there. But in practice, none that I know of does that, and certainly none that we support. As I said in the original email where I proposed the solution, I have already looked at our architectures (and a few more) in the Linux kernel and in XNU/BSD, and it is safe everywhere. And as I said earlier, the closest match I found to out-smarting the barrier is Itanium, which broadcasts the TLB purge with a special instruction rather than an IPI: ptc.ga. It takes an address range and purges the corresponding TLB entries. However, according to Intel's own documentation, even such a fancy solution still flushes all latent memory accesses on remote CPUs regardless.
I don't know what Windows does because it's not open source, but we only have x86 there, and its hardware has no support for doing this any other way than with IPI messages, which is all we need. And if we feel that scared, Windows has a system call that does exactly what we want, and with the architecture I propose it is trivial to specialize the barrier for Windows to use that instead.
If a magical OS + hardware combination that is too clever for this optimization were suddenly to appear (which seems unlikely to me), there are other ways of issuing such a global fence. But I don't see the point in doing that now, when no such problem is in sight.
On 12 May 2015, at 21:19, Andrew Haley <aph at redhat.com> wrote:
On 05/12/2015 08:25 PM, Aleksey Shipilev wrote:
On 12.05.2015 22:03, Andrew Haley wrote:
On 05/12/2015 07:54 PM, Aleksey Shipilev wrote:
On 12.05.2015 20:23, Erik Österlund wrote:
Out of curiosity I patched the thing, and my fix can be found here: http://cr.openjdk.java.net/~eosterlund/8079315/webrev.v1/
Wait, how does it work? I presumed you need to poll the serialization page (and then handle the possible trap) in mutator, between storing the reference and reading the card mark. Just mprotect-ing a page does not smell like a serializing event, if you don't actually access the page.
I think it is, because the kernel has to interrupt every CPU in that
process to flush its write buffer and TLB. I can't think of any other
way of making munmap() work.
If you have a platform with a software-filled TLB, can't you accurately
track which CPUs should perform the TLB shootdowns? Or perhaps there is
some other way for a CPU to communicate its TLB contents back to the OS.
There isn't AFAIK: the TLB is always a CPU-local structure.
If that is in place, then mprotect can only affect the cores that
actually accessed the serialization page. I feel that relying on the
premise that page mapping changes are globally synchronized is
You're probably right about that. I suppose it would be possible
in theory for there to be a broadcast TLB invalidate event as part
of the cache coherency protocol.
I wonder if the kernel people would be receptive to the idea of an
"execute a memory fence on every CPU" system call. It would not be
hard to do, and it would be very useful.