review request (M) - 6725714 add a table to speed up bitmap searches
thomas.schatzl at oracle.com
Mon May 27 09:45:59 UTC 2013
On Thu, 2013-05-23 at 08:03 -0700, John Coomes wrote:
> Please review the following change which adds a new table (the
> 'BlockTable') to the parallel compacting collector:
> I've run a number of test suites (vm.regression, vm.gc, vm.runtime,
> nsk.stress, nsk.monitoring, vm.compiler) in 32- and 64-bit on solaris
> and/or linux.
Some notes/questions; please understand that I do not know the parallel
old algorithm that well, so there might be some misunderstandings on my
side. The comments are a little out-of-order too:
- what is the motivation to increase
ParallelScavengeHeap::intra_heap_alignment() in this change? This
particular change does not seem to be related to this optimization.
- why is ParallelCompactData::RegionData::_blocks_filled a size_t? It
is only ever used as a bool.
- ParallelCompactData::ParallelCompactData::_region_end is not
initialized in the constructor. I guess it's benign, but for the same
reason the code could skip initializing _region_start and others (which
are set in the initialize() method.
I'd appreciate the use of an initializer list here. :)
- in ParallelCompactData::calc_new_pointer() there is this comment to
"Run some performance tests to determine if this special case pays off".
Does that mean that you *ran* these tests, or that you still have to run
them, or is this comment has been intended to be some todo for you (that
should be removed)?
(There is a superfluous white space between the two sentences btw)
- PSParallelCompact::fill_blocks() is only every called with the first
paramter set to NULL (and only once). Could the first parameter be
- the first comment in PSParallelCompact::fill_blocks() seems to be
// Fill in the block table elements for a region. Each element holds
// the number of live words in the region that are to the left of the
I think it is better to say "area"/"heap", i.e. something different than
"region" here (if I understand the code correctly). "Region" is already
defined in this context as the work unit for parallel gc. If the comment
really meant region here, then why does the algorithm need multiple
blocks per Region? (if they only ever contain the number of live words
of regions to the left of this region?)
// first object that starts in the block. Thus only blocks in which an
// object starts need to be filled.
I also think that the conclusion in the second sentence is not supported
by the first sentence. I think the reason for only requiring to fill
blocks where an object starts is that the code never asks for the number
of live bytes of other blocks. I.e. the code in calc_new_pointer() only
ever retrieves this value for blocks that have live objects in them.
(the two sentences are also separated by two spaces too :)
- maybe it is a good idea to add a comment in calc_new_pointer() that
it is possible that multiple threads update the blocks of a region at
the same time. Except that it is possible that multiple threads may
update the block table at the same time it does not seem to harm (except
for wasted cpu cycles; I guess in your tests you found that this is not
I was a bit thrown off by the use of the
Atomic::inc_ptr(...->_blocks_filled_cnt) at first...
- ParallelCompactData::region_calc_new_pointer(HeapWord* addr) is
within a complete ASSERT block, so the #ifdef ASSERT (or DEBUG_ONLY) may
not be required.
- I think the ParallelCompactData::calc_new_pointer(HeapWord* addr)
method could be moved into the private section of the class.
- Jon already mentioned this: region_calc_new_pointer() is unused. If it
can be used for some verification, maybe rename it to
calc_new_pointer_slow() or something like that to indicate that this is
for verification purposes?
- would it be possible to rename Log2BitsPerBlock which indicates that
it is a value that is not only relative to the block size, but also
relative to the object granularity/alignment? The NOT_LP64(-1) in
PSParallelCompact::fill_blocks(), line 3258 looks quite strange to me.
Maybe something like Log2ObjStartsPerBlock?
(Actually, did you test the code with different heap object alignments?
I expect it to fail with e.g. -XX:MinObjAlignment=16, i.e. something
different than the default ones)
This would probably make a good jtreg test, fill the heap, and issue a
few full gcs with different settings that may affect the code.
- this is just style: but I'd make fill_blocks() to set the blocks
filled, i.e. call region_ptr->set_blocks_filled() within the method.
I.e. the fill_blocks method does not set the block-is-filled flag
itself. Such things tend to be forgotten, especially if it is not
written down anywhere too.
(Not sure anymore though, maybe some comment in the declaration of
fill_blocks() in the hpp file?)
- the opening brace in the implementation of fill_blocks() is on the
next line, not in the same as the declaration; but this file contains a
mixed style in that respect, so not sure if interesting.
- ParallelCompactData::clear() seems unused.
- in PSParallelCompact::invoke_no_policy, in the new code within the
TraceParallelOldGCCompactionPhase block tty is used directly. Is it
possible to change this to gclog_or_tty to properly redirect its output
to the gclog if possible? Gclog_or_tty seems to be already initialized
because it's already used a few lines above.
- this block of code also only seems to make sense if the
_blocks_filled_cnt variables of the regions are filled in. That code is
DEBUG_ONLY, but this code is under ASSERT. Could these #if-s be matched?
(I do know that ASSERT is always defined in debug mode, but still)
More information about the hotspot-gc-dev