RFR (M): 8077144: Concurrent mark initialization takes too long

Kim Barrett kim.barrett at oracle.com
Sat Mar 26 01:38:16 UTC 2016


> On Mar 15, 2016, at 6:12 PM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
> 
> Hi Mikael,
> 
>  updated webrev at
> http://cr.openjdk.java.net/~tschatzl/8077144/webrev.3/ (full)
> http://cr.openjdk.java.net/~tschatzl/8077144/webrev.2_to_3/ (diff)
> 
> which implements the suggested changes.

I've run out of steam for looking at g1ConcurrentMark.cpp changes for
now.  I may have more comments there later.  I like the new approach
though. 

------------------------------------------------------------------------------ 
src/share/vm/gc/g1/g1ConcurrentMark.cpp
  51 #include "logging/logTag.hpp"

Why add this #include?  log.hpp appears to be the "include all of
logging" header, and it's already included.  And I didn't find
anything specific to log tags in the changes.

------------------------------------------------------------------------------ 
src/share/vm/gc/g1/g1ConcurrentMark.cpp  
1142   inline void set_card_bitmap_range(BitMap* bm,

Is this *really* necessary?  I suspect set_range with a small_range
hint would get much of the benefit (especially with the
_last_marked_bit_idx optimization in the caller), with a *lot* less
code.  E.g.

  bm.set_range(beg, end, BitMap::small_range);

And that's ignoring the question of where this should actually cross
over between looping and calling set_range; do you have measurements
to show that 8 is actually a good crossing point?

Just using set_range would also make this assert superfluous:
1148     assert(end_idx <= bm->size(), "sanity");

------------------------------------------------------------------------------ 
src/share/vm/gc/g1/g1ConcurrentMark.cpp  
1151     // use par_at_put_range (if parallel). The range is made up of the

I see no parallelism here.  Maybe this comment has some left over cut
and paste?

------------------------------------------------------------------------------ 
src/share/vm/gc/g1/g1ConcurrentMark.cpp 
1199     BitMap::idx_t card_num = (BitMap::idx_t)(uintptr_t(addr) >> CardTableModRefBS::card_shift);

I would like the cast to BitMap::idx_t removed; I don't think it
serves any useful purpose because idx_t is size_t is big enough.  And
if it did perform a narrowing, I'd rather not have that hidden if/when
we someday turn on -Wconversion or the like.

Personally, I'd prefer an explict reinterpret_cast for the uintptr_t
conversion, but I realize using C++ cast operators is still kind of an
unusual thing in this code base.

1286       (BitMap::idx_t)(uintptr_t(G1CollectedHeap::heap()->reserved_region().start()) >> CardTableModRefBS::card_shift);

Same thing here.

------------------------------------------------------------------------------ 
src/share/vm/gc/g1/g1ConcurrentMark.cpp
1260       int obj_sz = obj->size();
...
1268       marked_bytes += (size_t)obj_sz * HeapWordSize;

[Pre-existing, feel free to ignore.]

obj_sz could be declared size_t, since obj->size() is not going to
return a negative value, and then the cast could be removed.

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp
1206   void set_bit_for_region(HeapRegion* hr) {
1207     BitMap::idx_t index = (BitMap::idx_t) hr->hrm_index();
1208     _region_bm->par_at_put(index, true);
1209   }

This is the only use of _region_bm in the G1LiveDataHelper class. Of
the three places that make one of these, two provide a region bitmap
and one passes in NULL, meaning for that one it can't actually call
this function, which suggests this function shouldn't really be part
of the helper class. (It could be a static member taking a bitmap
argument, or a similar file scoped helper function.)

[Pre-existing]
The explicit cast of hrm_index to BitMap::idx_t serves no useful
purpose, and if it were a narrowing conversion the cast would just
mask that if we ever turn on -Wconversion or the like.

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp  
1281     //assert(region_bm != NULL, "");
1282     assert(card_bm != NULL, "");

Left-over debugging code?  Later: Oh, the commented out assert is
related to the above discussion about set_bit_for_region.

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp 
 688 void G1ConcurrentMark::cleanup_for_next_mark() {
...
 699   clear_bitmap(_nextMarkBitMap, _parallel_workers, true);
 700 
 701   // Clear the live count data. If the marking has been aborted, the abort()
 702   // call already did that.
 703   if (!has_aborted()) {
 704     clear_all_live_data(_parallel_workers);
 705     DEBUG_ONLY(verify_all_live_data());
 706   }

[Pre-existing]
abort() also clears the _nextMarkBitMap, but appears to do so in STW.
Seems like one of those is superfluous.  And wouldn't it be better to
not do any of these in (STW) abort() but rather do all of them here?

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp  
2496       _bitmap->clear_range(start, end);

If it weren't for the risk of tripping the (IMO bogus) assert on
minimum size in BitMap::clear_large_range, I would suggest calling
clear_range with a large_range hint.

------------------------------------------------------------------------------ 
src/share/vm/gc/g1/g1ConcurrentMark.cpp 
2475 class G1ClearAllLiveDataTask : public AbstractGangTask {
...
2494       BitMap::idx_t start = M * BitsPerByte * to_process;
2495       BitMap::idx_t end = MIN2(start + M * BitsPerByte, _bitmap->size());
...
2501 void G1ConcurrentMark::clear_all_live_data(WorkGang* workers) {
...
2506   size_t const num_chunks = align_size_up(_card_live_bm.size_in_words() * HeapWordSize, M) / M;

These two snippets implicitly share knowledge that the chunking size
for tasks is 1Mbyte.  One uses "chunk" and the other uses "task"
terminology, which further obscures this relationship.  Consistent
terminology and a static constant in G1ClearAllLiveDataTask that is
used in both places would help.

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp
1130 class G1LiveDataHelper VALUE_OBJ_CLASS_SPEC {
...
1142   inline void set_card_bitmap_range(BitMap* bm,

This uses non-parallel set_bit and set_range. Parallel callers are
operating on distinct heap regions. If that isn't guaranteed, then two
parallel tasks could try to modify bits in the same BitMap word
simultaneously. We're OK so long as G1HeapRegionSize is a multiple of

  BitsPerWord * (1 << CardTableModRefBs::card_shift)

(which is 2^15 with 512 byte cards on a 64bit machine), but
G1HeapRegionSize is a product option with a relatively unconstrained
value (only bounded between 1M and 32M, no alignment constraint).

Oh, but, it looks like HeapRegion::setup_heap_region_size ensures the
region size is a power of 2 within the min/max bounds, rounding up an
arbitrary command line value if necessary.

So we're OK, though it required some searching to convince myself of
that.  An assertion somewhere in G1LiveDataHelper might be
appropriate, since it doesn't ever use par_xxx bitmap functions,
unlike the old code, which selected between them based on an argument
that no longer exists.

------------------------------------------------------------------------------ 
src/share/vm/gc/g1/g1ConcurrentMark.hpp
 307   // A set bit indicates that the given card contains a live object. 

"... contains a live object." is perhaps imprecise. The card might
contain multiple objects, and might contain parts of one or two
objects that cross either card boundary. Something like "... contains
at least part of at least one live object." would be more precise.
But maybe I'm being overly pedantic.

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.hpp
 266 class G1ConcurrentMark: public CHeapObj<mtGC> {
...
 280 protected:

[Pre-existing, feel free to ignore.]
Why is all this stuff protected rather than private?

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.hpp
 855   // Precondition: obj is below region's NTAMS.

There is no longer an argument called "region"; it's now the region
containing obj. (Actually, it always ways, but there used to also be a
precondition that obj was in region.)

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1OopClosures.inline.hpp
 134     _cm->grayRoot(obj, hr);

With the removal of _worker_i as an argument here, the _worker_i
member of G1RootRegionScanClosure appears to be unused.

------------------------------------------------------------------------------
src/share/vm/utilities/bitMap.cpp
  72   os::pretouch_memory((char*)word_addr(0), (char*)word_addr(size()));

Casts to char* are no longer needed, since fix for JDK-8151414.

Also, copyright needs updating. 

------------------------------------------------------------------------------
src/share/vm/utilities/bitMap.hpp
 147   static idx_t size_in_words(size_t size_in_bits) {

I find the name of this function confusing in conjunction with the
no-arg ordinary member function; I'm not keen on overloads with with
very different semantics and usage. Clearer (to me) here might be
something like calc_size_in_words.

Also, copyright needs updating. 

------------------------------------------------------------------------------



More information about the hotspot-gc-dev mailing list