RFR: 8139424: SIGSEGV, Problematic frame: # V [libjvm.so+0xd0c0cc] void InstanceKlass::oop_oop_iterate_oop_maps_specialized<true, oopDesc*, MarkAndPushClosure>
thomas.schatzl at oracle.com
Thu Nov 12 08:28:00 UTC 2015
On Wed, 2015-11-11 at 16:41 +0100, Stefan Johansson wrote:
> Please review this fix for:
> The crash was caused by a faulty eager humongous reclaim. The reason for
> reclaiming a live humongous object was an overlooked remembered set
> entry when the object was treated as a candidate for humongous
> reclamation. If the remembered set was expanded during the previous GC,
> the code handling reclaim candidates would look at the old view of the
> remembered set and due to that miss some entries. This was caused by
> checking for eager reclaim candidates before calling cleanupHRRS, which
> takes care of updating the remembered sets to be ready for iteration.
> The fix was to simply move the call to rem_set()->cleanupHRRS() to
> before register_humongous_regions_with_cset(), which is were we check
> for reclaim candidates.
> I also added a test that provokes this and asserts when run with a
> fastdebug build, the test is not 100% deterministic and requires to be
> run with some special parameters set in the JTREG header. The test will
> never fail intermittently, and could possibly pass even though there are
> problems in the code. I still think it is worth adding, to avoid doing
> the same error again in the future.
> Failure was reproducable in 1-2 hours on the sparc-host where it
> occurred, and with this fix a 24 hour run was fine. The JTREG test also
> fails without the fix but passes when it is applied.
As for the added guarantee, please either remove or make it an assert:
HeapRegionRemSet::occupied() can be very slow. Doing that calculation
twice adds insult to injury :)
Others already commented about the additional "s" in the added comment.
Could you also clarify in the comment why we need an up-to-date
remembered set in register_..., i.e. that that method iterates over the
remembered sets? That seems to be missing in the reasoning chain...
There is also a typo in the changeset comment, "obejct" instead of
Now cleanupHRRS() is executed twice during a single gc for no apparent
reason to me. Could you either remove it from
G1RemSet::prepare_for_oops_into_collection_set, or (I think) better move
that call up?
Actually I think that now with
G1CollectedHeap::pre_evacuate_collection_set(), both the call to
prepare_for_oops_into_collection_set() are better placed there. Both
methods seem to be preparation for the actual evacuation.
Did you check the runtime of this test on the very slow machines?
I also think we should make sure this change also ends up in the next 8u
(whatever kind of) update. Can you take care of this?
Thanks a lot for looking into this,
More information about the hotspot-gc-dev