(Preliminary) RFC 7038914: VM could throw uncaught OOME in ReferenceHandler thread
thomas.schatzl at oracle.com
Mon May 6 08:02:27 UTC 2013
On Sat, 2013-05-04 at 21:23 +0200, Peter Levart wrote:
> On 05/04/2013 07:38 PM, Vitaly Davidovich wrote:
> > Oops, that was my mistake - I thought the lock here was a j.u.c.Lock
> > which of course doesn't even make sense given we're talking about
> > ObjectMonitor. So disregard that bit.
> > Ignoring OOM and continuing just seems very fragile as it means you
> > somehow know that all state is still consistent. Most java code
> > isn't async exception safe, so it's hard to reason about state after
> > OOM. Maybe Reference Handler is OK in that regard though.
> I think it is safe to ignore OOME here, since this is the only place
As my test program shows, this may not be the only place where heap
allocation can happen within that code.
Alan also mentioned something about instrumentation that can add memory
allocations basically anywhere.
As the reference handler code is plain java code, it will be affected as
other java code.
> where heap allocation happens and it is known what provokes it.
> Otherwise ReferenceHandler just shuffles existing pointers in existing
In the current compilers, yes. But what about other compilers/VMs that
may add more elaborate profiling information here and there? E.g. the
graal compiler? Not sure if it does, probably it does not, but it seems
shaky to rely on compiler code generation here.
So is it possible to handle all known issues (especially if the fix is
similar/known/understandable), not just the original one, here?
I mean, if we fix this issue as you suggested (I am not against that, it
looks reasonable), I would not know what to do with the test program
except file another bug against the very same component with the same
problem, with the same fix suggestion.
I.e. the code still seems to buggy even after applying that fix, and
with instrumentation I think I could create another reproducer. As
reference processing is something critical to the VM, I think it is
prudent to fix all known problems here and now if possible.
Maybe there is some argument against simply wrapping the entire loop
with a try-catch? Please elaborate if so (maybe I overlooked that?), or
maybe you could suggest another solution?
Additionally, just fixing only the exact issue here seems somewhat
wasteful in terms of time (imho) - because next time another developer
will likely spend lots of time trying to reproduce this again. (As
mentioned, the former assignee also seemed to be unable to reproduce
this for a long time).
More information about the core-libs-dev