RFR(M/L): 7176479: G1: JVM crashes on T5-8 system with 1.5 TB heap
john.cuthbertson at oracle.com
Wed Jan 23 22:51:17 UTC 2013
Thanks for looking over the code changes. I'll respond to your other
comments in a separate email. Detailed responses are inline.
On 1/15/2013 4:57 PM, Vitaly Davidovich wrote:
> Hi John,
> Wow, that's a giant heap! :)
> I think G1ConcRSLogCacheSize needs to be validated to make sure it's
> <= 31; otherwise, I think you get undefined behavior on left shifting
> with it.
Good catch. Done.
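For anyone following along, the check amounts to something like the sketch below. This is not the code from the webrev: the helper names and the constant are made up for illustration, and the bound of 31 is simply the one suggested in the review comment.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical upper bound, taken from the review comment (<= 31).
constexpr unsigned kMaxLogCacheSize = 31;

// Returns true if the flag value is safe to use as a shift count.
bool is_valid_log_cache_size(unsigned log_size) {
  return log_size <= kMaxLogCacheSize;
}

// Number of cache entries implied by the log2 flag value.
// The shift is done on a std::size_t so that a count of 31 stays
// within the width of the shifted type.
std::size_t hot_cache_entries(unsigned log_size) {
  assert(is_valid_log_cache_size(log_size));
  return std::size_t{1} << log_size;
}
```

In the real VM the validation would presumably happen once during argument processing rather than at every use.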
> I don't think you need _def_use_cache -- can be replaced with
> G1ConcRSLogCacheSize > 0?
Done. I've added a function that returns the result of the comparison
and I use that in place of G1ConcRSLogCacheSize.
> I'm sure this is due to my lack of G1 knowledge, but the concurrency
> control inside g1HotCardCache is a bit unclear. There's a CAS to claim
> the region of cards, there's a HotCache lock for inserting a card.
> However, reset_hot_cache() does a naked write of a few fields. Are
> there any visibility and ordering constraints that need to be
> enforced? Do some of the stores need an OrderAccess barrier of some
> sort, depending on what's required? Sorry if I'm just missing it ...
The drain routine is only called from within a GC pause but it is called
by multiple GC worker threads. Each worker will claim a chunk of cards
using the CAS and refine them. Resetting the boundaries (the values
reset by reset_hot_cache()) in the drain routine would be a mistake
since a worker thread could see the new boundary values and return,
potentially leaving some cards unrefined and some missing entries in
remembered sets. I can only clear the fields when the last thread has
finished draining the cache. The best place to do this is just before
the VM thread re-enables the cache (we know the worker threads will have
finished at this point). Since the "drain" doesn't actually drain,
perhaps a better name might be refine_all()?
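To make the ordering concrete, here is a rough standalone sketch of the scheme described above. It is not the g1HotCardCache code: it uses std::atomic and std::thread in place of HotSpot's own primitives, the struct, its fields, and run_pause() are invented for illustration, and "refining" a card is reduced to bumping a counter.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct HotCacheSketch {
  std::vector<int> refine_count;          // one slot per cached card
  std::atomic<std::size_t> claim_idx{0};  // first unclaimed index
  std::size_t chunk_size;

  HotCacheSketch(std::size_t n, std::size_t chunk)
      : refine_count(n, 0), chunk_size(chunk) {
    assert(chunk > 0);
  }

  // Called by every worker during the pause. Each worker claims a chunk
  // with a CAS and refines it. Note that it never resets claim_idx.
  void refine_all() {
    const std::size_t end = refine_count.size();
    std::size_t start = claim_idx.load();
    while (start < end) {
      const std::size_t next = std::min(start + chunk_size, end);
      if (claim_idx.compare_exchange_weak(start, next)) {
        for (std::size_t i = start; i < next; ++i) {
          ++refine_count[i];  // stand-in for refining the card
        }
        start = next;
      }
      // On failure, 'start' now holds the current claim_idx; retry.
    }
  }

  // Only called by the coordinating thread once all workers are done.
  void reset() { claim_idx.store(0); }
};

// Stand-in for the pause: run the workers, wait for them, then reset.
void run_pause(HotCacheSketch& cache, unsigned n_workers) {
  std::vector<std::thread> workers;
  for (unsigned i = 0; i < n_workers; ++i) {
    workers.emplace_back([&cache] { cache.refine_all(); });
  }
  for (std::thread& t : workers) {
    t.join();
  }
  cache.reset();  // safe: no worker can observe the new boundary
}
```

If reset() were called from inside refine_all(), a worker that had just loaded the old boundary could race with the reset, which is the situation the explanation above is trying to avoid.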
The HotCache lock is used when adding entries to the cache. Entries are
added by the refinement threads (and there will most likely be more than
one). Since adding an entry can also evict an entry, we need the lock
to guard against the ABA problem, which could result in skipping the
refinement of a card. That would lead to missing remembered set
entries, which are not fun to track down.
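As a rough illustration of insert-with-eviction under a lock, consider the sketch below. It assumes a fixed-size circular buffer with oldest-first eviction, which is an assumption made for the example and not a description of the actual cache; the class name, the use of int for a card, and std::mutex standing in for the HotCache lock are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

class HotCardRing {
 public:
  explicit HotCardRing(std::size_t capacity)
      : slots_(capacity), next_(0), count_(0) {
    assert(capacity > 0);
  }

  // Inserts 'card'. If the ring was full, the oldest entry is evicted,
  // written to '*evicted', and true is returned so that the caller can
  // refine the evicted card. Otherwise returns false.
  bool insert(int card, int* evicted) {
    std::lock_guard<std::mutex> guard(lock_);
    bool did_evict = false;
    if (count_ == slots_.size()) {
      *evicted = slots_[next_];
      did_evict = true;
    } else {
      ++count_;
    }
    slots_[next_] = card;
    next_ = (next_ + 1) % slots_.size();
    return did_evict;
  }

 private:
  std::mutex lock_;         // stand-in for the HotCache lock
  std::vector<int> slots_;  // circular buffer of cached cards
  std::size_t next_;        // slot the next insert will use
  std::size_t count_;       // number of occupied slots
};
```

Because the slot read, the slot write, and the index update all happen under one lock, two inserting threads cannot both read the same slot and lose one of the evicted cards.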
Draining during the GC is immune from the ABA problem because we're not
actually removing entries from the cache. We would still be immune,
however, if we were removing entries since we would not be adding
entries at the same time.