RFR(M/L): 7176479: G1: JVM crashes on T5-8 system with 1.5 TB heap

John Cuthbertson john.cuthbertson at oracle.com
Wed Jan 23 22:51:17 UTC 2013

Hi Vitaly,

Thanks for looking over the code changes. I'll respond to your other 
comments in a separate email. Detailed responses inline....

On 1/15/2013 4:57 PM, Vitaly Davidovich wrote:
> Hi John,
> Wow, that's a giant heap! :)
> I think G1ConcRSLogCacheSize needs to be validated to make sure it's 
> <= 31; otherwise, I think you get undefined behavior on left shifting 
> with it.

Good catch. Done.

> I don't think you need _def_use_cache -- can be replaced with 
> G1ConcRSLogCacheSize > 0?

Done. I've added a function that returns the result of the comparison 
and I use that in place of G1ConcRSLogCacheSize.

> I'm sure this is due to my lack of G1 knowledge, but the concurrency 
> control inside g1HotCardCache is a bit unclear. There's a CAS to claim 
> the region of cards, there's a HotCache lock for inserting a card.  
> However, reset_hot_cache() does a naked write of a few fields.  Are 
> there any visibility and ordering constraints that need to be 
> enforced? Do some of the stores need an OrderAccess barrier of some 
> sort, depending on what's required? Sorry if I'm just missing it ...

The drain routine is only called from within a GC pause but it is called 
by multiple GC worker threads. Each worker will claim a chunk of cards 
using the CAS and refine them. Resetting the boundaries (the values 
reset by reset_hot_cache()) in the drain routine would be a mistake 
since a worker thread could see the new boundary values and return, 
potentially leaving some cards unrefined and some missing entries in 
remembered sets. I can only clear the fields when the last thread has 
finished draining the cache. The best place to do this is just before 
the VM thread re-enables the cache (we know the worker threads will have 
finished at this point). Since the "drain" doesn't actually drain, 
perhaps a better name might be refine_all()?

The HotCache lock is used when adding entries to the cache. Entries are 
added by the refinement threads (and there will most likely be more than 
one). Since the act of adding an entry can also evict an entry we need 
the lock to guard against hitting the ABA problem. This could result in 
skipping the refinement of a card, which will lead to missing remembered 
set entries which are not fun to track down.

Draining during the GC is immune from the ABA problem because we're not 
actually removing entries from the cache. We would still be immune, 
however, if we were removing entries since we would not be adding 
entries at the same time.



More information about the hotspot-gc-dev mailing list