Proposal: MaxTenuringThreshold and age field size changes
Y Srinivas Ramakrishna
Y.S.Ramakrishna at Sun.COM
Thu Jun 5 08:03:11 UTC 2008
Hi Nick --
You make a good point.
I think there are scenarios in which the use of PLABs in the survivor spaces
can cause survivor space overflow, even when the total space in the survivor
space would otherwise have been enough. That then sets in motion a series
of "nepotistical" cycles, which arise when young objects Z that are
prematurely promoted die in the old generation while holding references to
now-dead objects Z' in the young generation. Call these objects "zombies"
because they are dead but not recognized as such.
Z keeps Z' alive because a scavenge does not know that Z is dead:
it considers all references from the old gen as roots.
Worse, if Z' has references to Z and Z' stays in the young generation
forever (which it can under the circumstances you describe),
then Z will not be recognized as dead by a CMS collection (which
currently treats all objects in the young generation as roots).
This is a well-understood problem when spaces are collected independently
in this manner. (The workaround is to have the CMS collector not treat
the young generation as a source of roots but rather to mark through the
young gen objects starting from roots.)
Of course when MTT=15, eventually every such Z' will be forced to promote to
the old gen, so the whole garbage cycle will (hopefully) move into the old gen and
thence be reclaimed by the old gen CMS collection.
This would explain the behaviour difference you saw.
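
To make the zombie cycle above concrete, here is a minimal toy model, not HotSpot code. The class NepotismSketch, the Obj type and the survivors method are all hypothetical names invented for this sketch. It assumes only what is described above: each collection treats every object in the other generation as a root, whether or not that object is really alive.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class NepotismSketch {
    // Hypothetical toy heap object: a name, a generation, and outgoing references.
    static final class Obj {
        final String name;
        boolean inOldGen;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name, boolean inOldGen) { this.name = name; this.inOldGen = inOldGen; }
    }

    // Names of the objects that survive a collection of one generation.
    // Assumption of this sketch: every object in the OTHER generation is a root.
    static Set<String> survivors(List<Obj> heap, List<Obj> realRoots, boolean collectOld) {
        Set<Obj> visited = new HashSet<>();
        Deque<Obj> work = new ArrayDeque<>();
        for (Obj r : realRoots) if (visited.add(r)) work.push(r);
        for (Obj o : heap) if (o.inOldGen != collectOld && visited.add(o)) work.push(o);
        while (!work.isEmpty()) {
            for (Obj target : work.pop().refs) if (visited.add(target)) work.push(target);
        }
        Set<String> kept = new TreeSet<>();
        for (Obj o : visited) if (o.inOldGen == collectOld) kept.add(o.name);
        return kept;
    }

    public static void main(String[] args) {
        Obj z = new Obj("Z", true);        // prematurely promoted, now dead
        Obj zPrime = new Obj("Z'", false); // dead, but still in the young gen
        z.refs.add(zPrime);
        zPrime.refs.add(z);
        List<Obj> heap = Arrays.asList(z, zPrime);
        List<Obj> noRoots = Collections.emptyList(); // nothing live refers to either object

        System.out.println("scavenge keeps:       " + survivors(heap, noRoots, false)); // [Z']
        System.out.println("old collection keeps: " + survivors(heap, noRoots, true));  // [Z]

        zPrime.inOldGen = true; // Z' finally reaches the tenuring threshold and is promoted
        System.out.println("after promotion:      " + survivors(heap, noRoots, true));  // []
    }
}
```

Neither collection alone can reclaim the pair while the cycle straddles the two generations; once Z' is promoted, the whole cycle sits in the old gen and an old gen collection reclaims it.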
Now to come to the first point I made above:
the use of multiple scavenger threads and their use of PLABs can sometimes
cause this kind of overflow to happen (especially if there is the occasional
large object). You might want to switch off the use of survivor space
PLABs (or fix them at a very modest size), or just use a single-threaded
scavenger, and see whether this kind of behaviour is reduced because overflow
becomes much less likely.
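
As a rough illustration of how PLABs can cause this overflow, here is a deliberately simplified model, not the actual HotSpot allocation policy. The class PlabOverflowSketch, its methods and all sizes are hypothetical. It assumes that each scavenger thread carves fixed-size PLABs out of a shared survivor space, that an object larger than a PLAB is allocated directly in the shared space, and that anything that no longer fits is promoted.

```java
import java.util.Arrays;

final class PlabOverflowSketch {
    // Words promoted to the old gen because the survivor space looked full.
    // objectsPerThread[t] lists the sizes (in words) of the objects copied by thread t.
    static int promotedWords(int survivorWords, int plabWords, int[][] objectsPerThread) {
        int top = 0;        // bump pointer of the shared survivor space
        int promoted = 0;
        int[] plabFree = new int[objectsPerThread.length]; // free words in each thread's PLAB
        int longest = 0;
        for (int[] objs : objectsPerThread) longest = Math.max(longest, objs.length);
        // Interleave the threads round-robin to mimic parallel progress.
        for (int i = 0; i < longest; i++) {
            for (int t = 0; t < objectsPerThread.length; t++) {
                if (i >= objectsPerThread[t].length) continue;
                int size = objectsPerThread[t][i];
                if (size <= plabFree[t]) {
                    plabFree[t] -= size;            // fits in the current PLAB
                } else if (size <= plabWords && top + plabWords <= survivorWords) {
                    top += plabWords;               // old PLAB retired, remainder wasted
                    plabFree[t] = plabWords - size;
                } else if (top + size <= survivorWords) {
                    top += size;                    // direct allocation in the shared space
                } else {
                    promoted += size;               // overflow: premature promotion
                }
            }
        }
        return promoted;
    }

    // Helper: 'count' small objects of 10 words each.
    static int[] smallObjects(int count) {
        int[] sizes = new int[count];
        Arrays.fill(sizes, 10);
        return sizes;
    }

    public static void main(String[] args) {
        // 500 live words in total: twenty 10-word objects plus one 300-word object.
        int[] withLarge = Arrays.copyOf(smallObjects(5), 6);
        withLarge[5] = 300;
        int[][] fourThreads = { withLarge, smallObjects(5), smallObjects(5), smallObjects(5) };
        int[] everything = Arrays.copyOf(smallObjects(20), 21);
        everything[20] = 300;
        int[][] oneThread = { everything };

        System.out.println(promotedWords(1000, 200, fourThreads)); // 300: large PLABs strand space
        System.out.println(promotedWords(1000, 20, fourThreads));  // 0: small PLABs
        System.out.println(promotedWords(1000, 200, oneThread));   // 0: single-threaded scavenger
    }
}
```

In this model the live data is only half the survivor capacity, yet with four threads and large PLABs the mostly empty buffers reserve 800 of the 1000 words, so the one large object is promoted. Shrinking the PLABs or using one thread avoids the overflow.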
I think there is an open bug to tune the adaptive PLAB sizing code to
eliminate this kind of pathological behaviour, but we have not had the opportunity to
get to that bug.
If you have PrintGCDetails logs, they would probably show the
premature promotion happening. PrintTenuringDistribution would not
show any objects of age greater than 2 initially, yet some objects
would be seen to be promoted. Then, by virtue of the cross-generational
references from a promoted zombie, we would artificially extend the
lifetime of objects in the young generation (creating zombies Z'),
and so on. I believe it was partially this kind of behaviour on the
part of generational scavenger implementations that caused some
people in the past to start advocating the clearing of all references
in objects that they knew they would be dropping references to (which of course
we all know is difficult to do correctly and fraught with all kinds of
problems and errors).