RFR(M/L): 7176479: G1: JVM crashes on T5-8 system with 1.5 TB heap

Vitaly Davidovich vitalyd at gmail.com
Thu Jan 24 00:19:51 UTC 2013

Hi John,

Thanks for this explanation as well.  I see what you're saying about the
concurrency control, but what I don't understand is when this is called:

void reset_hot_cache() {
  _hot_cache_idx = 0; _n_hot = 0;
}

Since these are plain stores, what exactly ensures that they're (promptly)
visible to other GC threads? Is there some dependency here, e.g. if you see
_n_hot = 0 then _hot_cache_idx must also be zero? I strongly suspect I
missed the details in your response that explain why this isn't a concern.
Is there only a particular type of thread that can call reset_hot_cache,
and/or only at a certain point? It kind of sounds like it, so I wonder
whether an assert could be added to verify that.
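
The kind of assert asked about here could look roughly like the sketch
below. It is a standalone illustration, not HotSpot code: the
HotCacheSketch class, the _active_drainers counter, and the begin_drain(),
end_drain(), add() and can_reset() helpers are hypothetical names. Only
reset_hot_cache(), _hot_cache_idx and _n_hot come from the snippet above.
The assumption is that the reset is legal only when no worker is draining.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the hot card cache; not the real G1 class.
class HotCacheSketch {
  int _hot_cache_idx = 0;
  int _n_hot = 0;
  // Assumption: counts the workers that are currently draining the cache.
  std::atomic<int> _active_drainers{0};

 public:
  // A worker brackets its drain with these two calls.
  void begin_drain() { _active_drainers.fetch_add(1, std::memory_order_acq_rel); }
  void end_drain()   { _active_drainers.fetch_sub(1, std::memory_order_acq_rel); }

  // Simplified single-threaded insert, only here to give the fields a value.
  void add(int /*card*/) { ++_hot_cache_idx; ++_n_hot; }

  // True when no worker is in the middle of a drain.
  bool can_reset() const {
    return _active_drainers.load(std::memory_order_acquire) == 0;
  }

  void reset_hot_cache() {
    // The assert documents and checks the calling convention.
    assert(can_reset() && "reset_hot_cache() called while workers are draining");
    _hot_cache_idx = 0;
    _n_hot = 0;
  }

  int n_hot() const { return _n_hot; }
  int hot_cache_idx() const { return _hot_cache_idx; }
};
```

In the real code the condition would more likely be phrased in terms of
which thread is calling and whether the VM is in a pause, rather than a
separate counter; the counter just keeps the sketch self-contained.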


On Jan 23, 2013 5:51 PM, "John Cuthbertson" <john.cuthbertson at oracle.com>
wrote:

> Hi Vitaly,
> Thanks for looking over the code changes. I'll respond to your other
> comments in a separate email. Detailed responses inline....
> On 1/15/2013 4:57 PM, Vitaly Davidovich wrote:
>> Hi John,
>> Wow, that's a giant heap! :)
>> I think G1ConcRSLogCacheSize needs to be validated to make sure it's <=
>> 31; otherwise, I think you get undefined behavior on left shifting with it.
> Good catch. Done.
>> I don't think you need _def_use_cache -- can be replaced with
>> G1ConcRSLogCacheSize > 0?
> Done. I've added a function that returns the result of the comparison and
> I use that in place of G1ConcRSLogCacheSize.
>> I'm sure this is due to my lack of G1 knowledge, but the concurrency
>> control inside g1HotCardCache is a bit unclear. There's a CAS to claim the
>> region of cards, there's a HotCache lock for inserting a card.  However,
>> reset_hot_cache() does a naked write of a few fields.  Are there any
>> visibility and ordering constraints that need to be enforced? Do some of
>> the stores need an OrderAccess barrier of some sort, depending on what's
>> required? Sorry if I'm just missing it ...
> The drain routine is only called from within a GC pause but it is called
> by multiple GC worker threads. Each worker will claim a chunk of cards
> using the CAS and refine them. Resetting the boundaries (the values reset
> by reset_hot_cache()) in the drain routine would be a mistake since a
> worker thread could see the new boundary values and return, potentially
> leaving some cards unrefined and some missing entries in remembered sets. I
> can only clear the fields when the last thread has finished draining the
> cache. The best place to do this is just before the VM thread re-enables
> the cache (we know the worker threads will have finished at this point).
> Since the "drain" doesn't actually drain, perhaps a better name might be
> refine_all()?
> The HotCache lock is used when adding entries to the cache. Entries are
> added by the refinement threads (and there will most likely be more than
> one). Since the act of adding an entry can also evict an entry we need the
> lock to guard against hitting the ABA problem. This could result in
> skipping the refinement of a card, which will lead to missing remembered
> set entries which are not fun to track down.
> Draining during the GC is immune from the ABA problem because we're not
> actually removing entries from the cache. We would still be immune,
> however, if we were removing entries since we would not be adding entries
> at the same time.
> Thanks,
> JohnC
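
The drain protocol John describes above (workers claim chunks of cards with
a CAS, and the boundaries are reset only after the last worker has finished)
can be sketched as follows. This is a standalone model under stated
assumptions, not the G1 implementation: DrainSketch, claim_idx, kChunk,
refined and drain_refines_each_card_once() are hypothetical names, and
"refining" a card is reduced to bumping a per-card counter.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical model of the hot card cache during a pause; not HotSpot code.
struct DrainSketch {
  std::vector<std::atomic<int>> refined;   // how often each card was refined
  std::size_t n_hot;                       // upper boundary, fixed while draining
  std::atomic<std::size_t> claim_idx{0};   // next unclaimed card
  static constexpr std::size_t kChunk = 4; // assumed chunk size

  explicit DrainSketch(std::size_t n) : refined(n), n_hot(n) {
    for (auto& r : refined) r.store(0);
  }

  // Each worker repeatedly claims [start, end) with a CAS and refines it.
  void drain() {
    for (;;) {
      std::size_t start = claim_idx.load();
      if (start >= n_hot) return;          // nothing left to claim
      std::size_t end = std::min(start + kChunk, n_hot);
      if (claim_idx.compare_exchange_weak(start, end)) {
        for (std::size_t i = start; i < end; ++i) refined[i].fetch_add(1);
      }
      // On CAS failure another worker took the chunk; retry from the top.
    }
  }

  // Plain stores; only safe once no worker can still be reading the fields.
  void reset() { claim_idx.store(0); n_hot = 0; }
};

// Drains with n_workers threads, then resets after all of them have joined.
inline bool drain_refines_each_card_once(std::size_t n_cards, unsigned n_workers) {
  DrainSketch cache(n_cards);
  std::vector<std::thread> workers;
  for (unsigned w = 0; w < n_workers; ++w)
    workers.emplace_back([&cache] { cache.drain(); });
  for (auto& t : workers) t.join();  // join() orders worker accesses before the reset
  bool ok = true;
  for (auto& r : cache.refined) ok = ok && (r.load() == 1);
  cache.reset();
  return ok && cache.n_hot == 0;
}
```

If reset() were called from inside drain(), a worker that had not yet
claimed its last chunk could observe n_hot == 0 and return early, which is
the "cards left unrefined" failure described above. Doing the reset after
the join avoids that and is also what makes the plain stores safe in this
model.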

More information about the hotspot-gc-dev mailing list