tom.rodriguez at oracle.com
Wed Mar 16 17:35:02 UTC 2011
I was getting ready to finish my static fields in Class changes when I hit a failure with jbb and CMS. I've tracked it down to a race in the machinery for updating oop relocations and the logic for making sure that a scavengable nmethod is only scanned once. During a scavenge, an nmethod can be reached for scanning in two different ways: either as a live activation on some thread stack or during the scan of scavengable nmethods. The scan of scavengable nmethods does two things, though. It does the oops_do for the nmethod and then calls fix_oop_relocations to update the generated code to match the new oop values. The problem is that the scan of the thread stacks and the scan of the scavengable nmethods are performed concurrently, so the stack-scanning thread might claim the nmethod first but actually scan it after the other thread has called fix_oop_relocations. That leaves the oops valid but the code stale.
I think the logical place for the fix_oop_relocations call is nmethod::oops_do_marking_epilogue. Does this seem reasonable to anyone who understands the new nmethod scavenge code better than I do? It seems to work fine.
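To make the ordering problem concrete, here is a toy model of the interleaving described above, replayed deterministically on one thread. It is only a sketch and not HotSpot code: ToyNmethod, its fields, and both functions are hypothetical stand-ins, and it assumes the scavenge-roots scan calls fix_oop_relocations even when it loses the claim, which is how I read the failure.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for an nmethod with a single embedded oop (hypothetical type).
struct ToyNmethod {
    int oop_slot;        // oop value as recorded in the nmethod's oop table
    int code_immediate;  // copy of that oop embedded in the generated code
    bool claimed;

    explicit ToyNmethod(int addr)
        : oop_slot(addr), code_immediate(addr), claimed(false) {}

    // Returns true only for the first caller, modelling "scan only once".
    bool try_claim() {
        if (claimed) return false;
        claimed = true;
        return true;
    }

    // Models oops_do during a scavenge: the referent moved to new_addr.
    void oops_do(int new_addr) { oop_slot = new_addr; }

    // Models fix_oop_relocations: patch the code to match the oop table.
    void fix_oop_relocations() { code_immediate = oop_slot; }
};

// Replays the bad interleaving: the stack scanner claims first but scans last.
bool racy_order_leaves_code_stale() {
    ToyNmethod nm(100);
    const int new_addr = 200;

    bool stack_won = nm.try_claim();           // stack thread claims
    if (nm.try_claim()) nm.oops_do(new_addr);  // roots thread loses the claim
    nm.fix_oop_relocations();                  // ...but still patches the code
    if (stack_won) nm.oops_do(new_addr);       // stack thread scans too late

    return nm.code_immediate != nm.oop_slot;   // code still holds the old oop
}

// Same interleaving, with the fixup deferred to an epilogue that runs
// after every scanning thread has finished.
bool epilogue_order_keeps_code_in_sync() {
    ToyNmethod nm(100);
    const int new_addr = 200;
    std::vector<ToyNmethod*> claimed_list;

    bool stack_won = nm.try_claim();
    if (stack_won) claimed_list.push_back(&nm);
    if (nm.try_claim()) nm.oops_do(new_addr);  // roots thread: no fixup here
    if (stack_won) nm.oops_do(new_addr);

    // Epilogue: all oops are final now, so patching the code is safe.
    for (ToyNmethod* n : claimed_list) n->fix_oop_relocations();

    return nm.code_immediate == nm.oop_slot;
}
```

In this model both functions return true: the first ends with a code immediate that no longer matches the oop slot, while the second patches the code only after the last oops_do has run.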
One thing I noticed is that the nmethod::oops_do_marking_prologue/epilogue logic is being called during full GCs, which seems somewhat pointless to me since it mostly creates redundant work. In fact, if it's really scanning the scavengable nmethods there, then it's turning them into strong roots, which is wrong. Only nmethods which are live on a stack should be scanned as strong roots.
Does anyone know why test_set_oops_do_mark builds yet another linked list instead of just having a flag on the nmethod to indicate that it's claimed? It seems overly complicated. The contents of the list should be the same as the scavenge roots list, and a simple flag would indicate whether an nmethod was marked or not.
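For what it's worth, the flag-based alternative I have in mind would look roughly like the sketch below. The names (FlaggedNmethod, scan_once, clear_claims) are made up for illustration and are not the existing HotSpot interfaces. It also leans on the assumption stated above, that every claimed nmethod is on the scavenge roots list; if that doesn't hold, the epilogue would miss some flags.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical nmethod stand-in carrying a claim flag instead of a list link.
struct FlaggedNmethod {
    std::atomic<bool> claimed{false};
    int scan_count = 0;

    // Atomically set the flag; only the first caller sees the old value false.
    bool try_claim() {
        return !claimed.exchange(true, std::memory_order_acq_rel);
    }
};

// Scan the nmethod only if this caller won the claim.
bool scan_once(FlaggedNmethod& nm) {
    if (!nm.try_claim()) return false;
    ++nm.scan_count;  // placeholder for the real oops_do work
    return true;
}

// Epilogue: walk the scavenge roots list and reset every flag.
// Assumes all claimed nmethods are reachable from this list.
void clear_claims(const std::vector<FlaggedNmethod*>& scavenge_roots) {
    for (FlaggedNmethod* nm : scavenge_roots) {
        nm->claimed.store(false, std::memory_order_release);
    }
}
```

The exchange gives the same "first claimer wins" behaviour as pushing onto a separate list, and the epilogue reuses the list that already exists to reset the marks.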
More information about the hotspot-gc-dev