RFR (M): 8077144: Concurrent mark initialization takes too long
thomas.schatzl at oracle.com
Fri Mar 4 09:52:02 UTC 2016
Can I have reviews for this change that brings G1 start-up time close
to Parallel GC's? That is, there is a ~10-15% difference on large
machines, depending on the size of the heap and the number of marking
threads used; on small machines it will be on par.
The cause of the slow-down is the initialization of the per-marking
thread data structures: depending on the size of the heap and the
number of marking threads, this can be a substantial amount of memory
that the OS needs to back with physical memory.
We have seen 15 minutes of additional startup time on really big machines.
This change modifies the allocation to always use virtual memory for
these large bitmaps, avoiding the overhead of the OS backing the
memory at startup.
Memory for this data structure will be backed on demand during the
first few marking cycles. This is often preferable to the long wait, as
the adaptive marking can cope with that. If backing the memory up front
is required after all, there is a new experimental option
G1PreTouchMarkBitmaps (default: false) that tries to squeeze out a bit
more performance by parallelizing the pre-touching.
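To illustrate the difference between on-demand backing and pre-touching,
here is a minimal sketch in Python. It is not the HotSpot code; the names
reserve_bitmap, pre_touch and pre_touch_parallel are hypothetical and only
model the idea. It assumes an OS that backs anonymous mappings lazily on
first touch, and Python threads only illustrate how the range is split
among workers, not the actual speedup.

```python
import mmap
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = mmap.PAGESIZE


def reserve_bitmap(size_bytes):
    """Reserve an anonymous, zero-filled mapping.

    Assumption: the OS backs the pages lazily, so this returns quickly
    regardless of size and physical memory is only used once a page is
    first touched.
    """
    return mmap.mmap(-1, size_bytes)


def pre_touch(bitmap, start=0, end=None):
    """Touch one byte per page in [start, end) so the OS backs it now.

    Writing 0 leaves the bitmap cleared. Returns the number of pages touched.
    """
    end = len(bitmap) if end is None else end
    touched = 0
    for offset in range(start, end, PAGE_SIZE):
        bitmap[offset] = 0
        touched += 1
    return touched


def pre_touch_parallel(bitmap, num_workers):
    """Split the bitmap into page-aligned chunks and pre-touch them with
    several workers. Returns the total number of pages touched."""
    total_pages = (len(bitmap) + PAGE_SIZE - 1) // PAGE_SIZE
    pages_per_worker = (total_pages + num_workers - 1) // num_workers
    chunk = pages_per_worker * PAGE_SIZE
    ranges = [(s, min(s + chunk, len(bitmap)))
              for s in range(0, len(bitmap), chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return sum(pool.map(lambda r: pre_touch(bitmap, *r), ranges))
```

Calling only reserve_bitmap corresponds to the new default behavior
(pages get backed when marking first writes to them); additionally calling
pre_touch_parallel corresponds to what the experimental option is meant
to do.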
The reasons for not enabling pre-touching by default (and also not
enabling it with AlwaysPreTouch) are the excessive time cost (starting
with e.g. 60 marking threads it doubles pre-touching time), the low
impact on general behavior, and the added flexibility. Also, in many
cases the marking will never touch large parts of the heap (the young
gen), and not all threads are expected to touch all remaining memory,
effectively saving even more memory.
Due to time constraints, decreasing the memory usage has been out of
scope. There is JDK-8151215 which contains some initial ideas.
Also, there are some follow-up changes to clean up and improve related
code (JDK-8151069 and JDK-8151125).
Testing: jprt, vm.gc testlist