RFR(M/L): 7176479: G1: JVM crashes on T5-8 system with 1.5 TB heap

Vitaly Davidovich vitalyd at gmail.com
Thu Jan 24 00:41:12 UTC 2013

Got it now - thanks.  So then does exiting the safepoint guarantee that
these writes are flushed to memory so next time GC threads run they see 0s?
Or is that not important/enforced elsewhere?

Sent from my phone
On Jan 23, 2013 7:36 PM, "John Cuthbertson" <john.cuthbertson at oracle.com>

>  Hi Vitalty,
> On 1/23/2013 4:19 PM, Vitaly Davidovich wrote:
> Hi John,
> Thanks for this explanation as well.  I see what you're saying about the
> concurrency control, but what I don't understand is when this is called:
> void reset_hot_cache() {
> 107     _hot_cache_idx = 0; _n_hot = 0;
> 108   }
> Since these are plain stores, what exactly ensures that they're (promptly)
> visible to other GC threads? Is there some dependency here, e.g. if you see
> _n_hot = 0 then _hot_cache_idx must also be zero? I strongly suspect I
> missed the details in your response that explain why this isn't a concern.
> Is there only a particular type of thread that can call reset_hot_cache
> and/or only at a certain point? It kind of sounds like it so don't know if
> there's an assert that can be added to verify that.
> At the point where this routine is called the GC workers have finished the
> parallel phases of the GC and are idle. The thread that is running is the
> VM thread. The rest of the VM is at a safepoint so we are, in effect,
> single threaded. Yes there is an assert we can add here:
> assert(SafepointSynchronize::is_at_safepoint() &&
> Thread::current()->is_VM_thread(), "...");
> JohnC
>  Thanks
> Sent from my phone
> On Jan 23, 2013 5:51 PM, "John Cuthbertson" <john.cuthbertson at oracle.com>
> wrote:
>> Hi Vitaly,
>> Thanks for looking over the code changes. I'll respond to your other
>> comments in a separate email. Detailed responses inline....
>> On 1/15/2013 4:57 PM, Vitaly Davidovich wrote:
>>> Hi John,
>>> Wow, that's a giant heap! :)
>>> I think G1ConcRSLogCacheSize needs to be validated to make sure it's <=
>>> 31; otherwise, I think you get undefined behavior on left shifting with it.
>> Good catch. Done.
>>  I don't think you need _def_use_cache -- can be replaced with
>>> G1ConcRSLogCacheSize > 0?
>> Done. I've added a function that returns the result of the comparison and
>> I use that in place of G1ConcRSLogCacheSize.
>>  I'm sure this is due to my lack of G1 knowledge, but the concurrency
>>> control inside g1HotCardCache is a bit unclear. There's a CAS to claim the
>>> region of cards, there's a HotCache lock for inserting a card.  However,
>>> reset_hot_cache() does a naked write of a few fields.  Are there any
>>> visibility and ordering constraints that need to be enforced? Do some of
>>> the stores need an OrderAccess barrier of some sort, depending on what's
>>> required? Sorry if I'm just missing it ...
>> The drain routine is only called from within a GC pause but it is called
>> by multiple GC worker threads. Each worker will claim a chunk of cards
>> using the CAS and refine them. Resetting the boundaries (the values reset
>> by reset_hot_cache()) in the drain routine would be a mistake since a
>> worker thread could see the new boundary values and return, potentially
>> leaving some cards unrefined and some missing entries in remembered sets. I
>> can only clear the fields when the last thread has finished draining the
>> cache. The best place to do this is just before the VM thread re-enables
>> the cache (we know the worker threads will have finished at this point).
>> Since the "drain" doesn't actually drain, perhaps a better name might be
>> refine_all()?
>> The HotCache lock is used when adding entries to the cache. Entries are
>> added by the refinement threads (and there will most likely be more than
>> one). Since the act of adding an entry can also evict an entry we need the
>> lock to guard against hitting the ABA problem. This could result in
>> skipping the refinement of a card, which will lead to missing remembered
>> set entries which are not fun to track down.
>> Draining during the GC is immune from the ABA problem because we're not
>> actually removing entries from the cache. We would still be immune,
>> however, if we were removing entries since we would not be adding entries
>> at the same time.
>> Thanks,
>> JohnC
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/attachments/20130123/f37e95f0/attachment.htm>

More information about the hotspot-gc-dev mailing list