Parallelizing symbol table/string table scan
thomas.schatzl at oracle.com
Mon Nov 18 13:17:18 PST 2013
On Mon, 2013-11-11 at 17:31 +0100, Thomas Schatzl wrote:
> Hi Karen,
> On Mon, 2013-11-11 at 10:46 -0500, Karen Kinnear wrote:
> > On Nov 11, 2013, at 8:56 AM, Thomas Schatzl wrote:
> > > Hi all,
> > >
> > > Recently we (the gc team) noticed severe performance issues with
> > > the symbol table and string table scans during remark.
> > >
> > > Basically, in G1 remark pauses are the largest pauses on a reasonably
> > > tuned system. Symbol table scan alone takes 50% of total remark
> > > time, and string table scan takes another 13%.
> > >
> > > Symbol table scan in particular is a big issue.
> > >
> > > The obvious approach is of course to parallelize these tasks; however,
> > > I would like to ask you for comments or suggestions :)
> > > (I am simply throwing some ideas at the wall, in the hope something
> > > sticks...)
> > I don't see any reason not to parallelize the scanning.
> That's on the agenda anyway: as soon as there is class unloading during
> remark, we need to scan the symbol table faster than we do right now.
I have a prototype that parallelizes both the string table and the symbol
table scans. The performance improvement is very good: by not scrubbing
the string table (because its entries are roots) and parallelizing symbol
table scrubbing, the average remark pause time goes down to 20% of its
former value. (In that benchmark, symbol table scan took the vast majority
of remark time after initialization.)
This is still somewhat longer than the average young gc pause. Also
consider that with class unloading after marking, the string table needs
to be scrubbed again.
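For illustration, here is a minimal sketch of one common way to parallelize such a scan: workers claim chunks of hash table buckets from a shared atomic counter, so every bucket is scrubbed by exactly one thread without per-bucket locks. It is not the prototype mentioned above. `Entry`, `Bucket`, `parallel_scrub` and the chunk size are hypothetical, and an entry is assumed to be dead once its reference count reaches zero.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a symbol table entry; an entry is treated as
// dead once its reference count has dropped to zero (assumption).
struct Entry {
  std::string name;
  int refcount;
};

using Bucket = std::vector<Entry>;

// Removes dead entries from every bucket using `num_workers` threads.
// Workers claim `chunk` consecutive buckets at a time from a shared atomic
// counter, so each bucket is scrubbed by exactly one thread and no
// per-bucket locking is needed. Returns the number of removed entries.
std::size_t parallel_scrub(std::vector<Bucket>& table, unsigned num_workers,
                           std::size_t chunk = 16) {
  assert(chunk > 0 && "chunk size must be positive");
  std::atomic<std::size_t> next_bucket{0};
  std::atomic<std::size_t> removed{0};

  auto worker = [&] {
    std::size_t local_removed = 0;
    for (;;) {
      // Claim the next range [start, end) of buckets.
      std::size_t start = next_bucket.fetch_add(chunk);
      if (start >= table.size()) break;
      std::size_t end = std::min(start + chunk, table.size());
      for (std::size_t i = start; i < end; ++i) {
        Bucket& b = table[i];
        auto first_dead = std::remove_if(
            b.begin(), b.end(), [](const Entry& e) { return e.refcount == 0; });
        local_removed += static_cast<std::size_t>(b.end() - first_dead);
        b.erase(first_dead, b.end());
      }
    }
    // Publish this worker's count once instead of once per entry.
    removed.fetch_add(local_removed);
  };

  std::vector<std::thread> threads;
  for (unsigned t = 0; t < std::max(1u, num_workers); ++t) {
    threads.emplace_back(worker);
  }
  for (std::thread& th : threads) th.join();
  return removed.load();
}
```

Claiming small chunks rather than splitting the table into one fixed slice per thread helps balance the load when some buckets are much longer than others.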
> > > One idea that came up to optimize this further is to not do string
> > > table or symbol table scrubbing after gc at all if no class unloading
> > > has been done, assuming that the number of dead entries is zero anyway.
> > Just to clarify: there are temp Symbols in the symbol table, so the number
> > of dead entries with no class unloading will be close to zero, i.e. small enough
> > that your suggestion of not scrubbing unless there has been class unloading
> > makes sense; just don't assume zero.
> Thanks for the information, that's exactly what we need.
> Do you have any statistics on the number of these temp symbols
> compared to ones generated by class loading? Is there a way to
> distinguish them easily? (So that I can gather statistics myself.)
I have a few statistics now, and the temp symbols seem to be
non-negligible: over the course of a three-hour run, about 5% of symbols
become stale (~43k of ~860k) without class unloading.
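The figure above is simply 43k / 860k = 5%. A minimal sketch of how such numbers could be gathered, assuming a symbol counts as stale once its reference count is zero: walk all buckets and count. `SymbolStat`, `StaleStats` and `count_stale` are hypothetical names, not HotSpot API, and the sketch does not tell temp symbols apart from class loading symbols.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-entry view used only for statistics: all we need is the
// reference count of each symbol.
struct SymbolStat {
  int refcount;
};

struct StaleStats {
  std::size_t total = 0;
  std::size_t stale = 0;  // entries whose refcount has dropped to zero

  double stale_percent() const {
    return total == 0 ? 0.0
                      : 100.0 * static_cast<double>(stale) /
                            static_cast<double>(total);
  }
};

// Walks all buckets and counts how many entries are stale, i.e. would be
// removed by a scrub right now (under the refcount == 0 assumption).
StaleStats count_stale(const std::vector<std::vector<SymbolStat>>& buckets) {
  StaleStats s;
  for (const auto& bucket : buckets) {
    for (const SymbolStat& e : bucket) {
      assert(e.refcount >= 0 && "reference counts must never go negative");
      ++s.total;
      if (e.refcount == 0) ++s.stale;
    }
  }
  return s;
}
```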
> I.e., would it make sense to split the symbol table into two tables, one
> for temp symbols and one for symbols generated by class loading? So that
> we could always look at the temp symbols, but at the others only
> after class unloading?
Given the above death rate for symbols, this idea seems the most promising
to me. Does anyone know whether it is easily possible to differentiate
between symbols generated by class loading and temp symbols?
Incremental scrubbing or other measures seem to suffer from the need to
be tuned to a particular application, or need some other heuristics. The
number of temp symbols seems small enough to be manageable on every
gc or remark.
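A minimal sketch of the two-table split under discussion, assuming the caller already knows where a symbol comes from (which is exactly the open question above). `SplitSymbolTable`, `Origin` and the member functions are hypothetical; a `std::unordered_map` stands in for the real hash table. Note that if the same name were interned under both origins, this sketch would keep two entries, whereas a real table would need a single owner per symbol.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical split symbol table: symbols created by class loading and
// temporary symbols live in separate tables so they can be scrubbed on
// different schedules.
class SplitSymbolTable {
 public:
  enum class Origin { kClassLoading, kTemporary };

  // Looks up or creates the symbol in the table for `origin`, takes a reference.
  void add_ref(const std::string& name, Origin origin) {
    ++table_for(origin)[name];
  }

  // Drops one reference; the entry stays until its table is next scrubbed.
  void release(const std::string& name, Origin origin) {
    Table& t = table_for(origin);
    auto it = t.find(name);
    assert(it != t.end() && it->second > 0 && "release without matching add_ref");
    if (it != t.end() && it->second > 0) --it->second;
  }

  // The temp table is scrubbed on every gc/remark; the class loading table
  // only when classes were actually unloaded. Returns the number removed.
  std::size_t scrub(bool classes_unloaded) {
    std::size_t removed = scrub_table(temp_symbols_);
    if (classes_unloaded) removed += scrub_table(class_symbols_);
    return removed;
  }

  std::size_t size() const { return temp_symbols_.size() + class_symbols_.size(); }

 private:
  using Table = std::unordered_map<std::string, int>;  // name -> refcount

  Table& table_for(Origin origin) {
    return origin == Origin::kClassLoading ? class_symbols_ : temp_symbols_;
  }

  // Erases every entry whose reference count is zero.
  static std::size_t scrub_table(Table& t) {
    std::size_t removed = 0;
    for (auto it = t.begin(); it != t.end();) {
      if (it->second == 0) {
        it = t.erase(it);
        ++removed;
      } else {
        ++it;
      }
    }
    return removed;
  }

  Table temp_symbols_;
  Table class_symbols_;
};
```

With roughly 5% of symbols dying without class unloading, a per-gc scrub would then only walk the much smaller temp table.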
Another alternative, storing the symbols (or their location in the
hash table, or the hash table bucket, or similar), would slow down symbol
table reference counting.
More information about the hotspot-runtime-dev mailing list