tom.rodriguez at oracle.com
Wed Mar 16 18:18:52 UTC 2011
On Mar 16, 2011, at 11:01 AM, Y. S. Ramakrishna wrote:
> Hi Tom --
> A form of the strong-scanning problem for the nmethod scanning]
> code existed for a while, long preceding ScavengeRootsInCode, and
> was fixed a while ago so that the code-cache scanning did not
> inadvertently turn code-oops into strong refs.
What issue do you mean? Not the MDO one?
> I can see that
> a parameter do_strong_roots_only now distinguishes the two types of scans:
> being true for the active nmethods on thread stacks scan
> and false for the code-cache scan of all nmethods, where the
> refs are treated weakly.
That flag only controls a very minor part of the weakness. Oops in the exception cache are weak because they can be dropped even if the nmethod is alive and those are the only oops that strong_roots_only controls. For all nmethods that don't have live activations, if any oop they reference isn't live after a strong scan then they will be unloaded.
> I think that the list of scavengeable
> nmethod scanning should also be done weakly.
The list of scavengeable nmethods shouldn't be scanned at all during a full collection. The whole code cache is already scanned. By scanning it as strong root it can keep classes from being unloaded. Presumably over time nmethods will migrate off of the scavenge list but it's still incorrect.
> I believe that would also take
> care of the synchronization issue that you ran into below,
> since there is in fact a synchronization
> barrier between the strong roots scan which must semantically
> strictly precede the weak scan. It would seem as though one
> should do the oop relocations during the second, weak scan?
There's no problem with full gc's because gc_epilogue fixes the oop relocations at the end. That also why scanning the scavengeable list is a total waste. Any work it does is a duplication of work that will be done anyway.
> I don't know the answers to your other questions, but I am guessing
> John Rose would.
> -- ramki
> On 03/16/11 10:35, Tom Rodriguez wrote:
>> I was getting ready to finish my statics fields in Class changes when I hit a failure with jbb and CMS. I've tracked it down to a race in the machinery for updating oop relocations and the logic for making sure that a scavengable nmethod is only scanned once. During a scavenge an nmethod can be reached for scanning in two different ways, either as a live activation on some thread stack or during the scan of scavengeable nmethods. The scan of scavengeable nmethods does two things though. It does the oops_do for the nmethod and then it calls fix_oop_relocations to update the generated code to match the new oop values. The problem is that the scan of the thread stacks and the scan of the scavengable nmethods are performed concurrently so the stack scanning thread might claim the nmethod first but actually scan the nmethod after the call to fix_oop_relocations in the other thread, leaving the oops valid but the code stale.
>> I think the logical place to move the fix_oop_relocations call is into nmethod::oops_do_marking_epilogue. Does this seem reasonable to anyone who understands the new nmethod scavenge code better than I do? It seems to work fine.
>> Actually one thing I noticed is that the nmethod::oops_do_marking_prologue/epiloque logic is being called during full gcs which seems somewhat pointless to me since it mostly creates redundant work. Actually if it's really scanning the scavengable nmethods there then it's turning them into strong roots which is wrong. Only nmethods which are live on stack should be scanned as strong roots.
>> Does anyone know why the test_set_oops_do_mark builds yet another linked list instead of just having a flag on the nmethod to indicate that it's claimed? It seems overly complicated. The contents of the list should be the same as the scavenge roots list and a simple flag would indicate whether it was marked or not.
More information about the hotspot-gc-dev