[9] RFR(M) 8041984: CompilerThread seems to occupy all CPU in a very rare situation

David Chase david.r.chase at oracle.com
Tue Oct 14 17:30:39 UTC 2014

I’m curious about the granularity of the completion-prediction —
you’re checking every 4 iterations and extrapolating from there.
Do we know that the times are that uniform, or would we be better off with larger sample sizes?
(How long does a single iteration take, anyway?)

And in the sample code, I see the choice of sample size (4) encoded as
(next & 3) == 0
// Each 4 iterations calculate how much time it will take
double time_per_iter = (stop_time - start_time) * 0.25;

why not write this as (say)

static const int LOG_SAMPLE_SIZE = 2;
static const int SAMPLE_SIZE = (1 << LOG_SAMPLE_SIZE);
next % SAMPLE_SIZE == 0   // next could be a uint, couldn’t it?
double time_per_iter = (stop_time - start_time) * (1.0/SAMPLE_SIZE);

perhaps rewriting 1.0/SAMPLE_SIZE as another static const as necessary
to get it properly precalculated; I’m not sure what C++ compilers are doing nowadays.
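To make the suggestion concrete, here is a small sketch of what I mean; the names (at_sample_boundary, time_per_iter, INV_SAMPLE_SIZE) are mine for illustration, not from the webrev, and the reciprocal is a named constant so it is certainly precomputed:

```cpp
#include <cassert>

// Sample size named once; everything else derives from it.
static const int LOG_SAMPLE_SIZE = 2;
static const int SAMPLE_SIZE = 1 << LOG_SAMPLE_SIZE;
static const double INV_SAMPLE_SIZE = 1.0 / SAMPLE_SIZE;

// True when 'next' falls on a sample boundary; for a power-of-two
// SAMPLE_SIZE the compiler reduces the modulus to a mask.
inline bool at_sample_boundary(unsigned int next) {
  return next % SAMPLE_SIZE == 0;
}

// Average time of one iteration over the last sample window,
// multiplying by the precomputed reciprocal instead of dividing.
inline double time_per_iter(double start_time, double stop_time) {
  return (stop_time - start_time) * INV_SAMPLE_SIZE;
}
```

Changing the sample size then means editing one constant rather than hunting down a mask and a magic 0.25 that must stay in sync.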

An alternate approach would be to compute the overall rate thus far, checking at
(say) iterations 4, 8, 16, 32, etc., and computing the average from there, starting
at whatever power of two is a decent fraction of the iterations needed to time out.

e.g., don’t reset start_time, just compute elapsed at each sample:

if ((next & (ptwo - 1)) == 0) {
time.stop();
double elapsed = time.seconds() - start_time;
double projected = elapsed * (double) java_objects_length / next;
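Spelled out as a self-contained sketch (function names are illustrative, not from the patch): at each power-of-two checkpoint, the total time is projected by scaling the elapsed time by total_work / work_done:

```cpp
#include <cassert>

// True when 'next' falls on a checkpoint; ptwo must be a power of two,
// so (ptwo - 1) is an all-ones mask of the low bits.
inline bool at_checkpoint(unsigned int next, unsigned int ptwo) {
  return (next & (ptwo - 1)) == 0;
}

// Project total build time from the overall rate so far and compare
// against the limit: processing 'next' objects took 'elapsed' seconds,
// so all 'java_objects_length' objects would take proportionally longer.
inline bool projected_timeout(double elapsed, unsigned int next,
                              unsigned int java_objects_length,
                              double time_limit) {
  double projected = elapsed * (double) java_objects_length / next;
  return projected >= time_limit;
}
```

Using the overall rate smooths out per-window noise, at the cost of reacting more slowly if the per-iteration cost grows as the graph grows.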

Or you can compute a java_objects_length that would surely lead to failure, and
always do that (fast, integer) comparison:
...
unsigned int length_bound = (unsigned int)(next * (CG_BUILD_TIME_LIMIT / elapsed));
…
if (java_objects_length > length_bound) { … timeout
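As a sketch, with hypothetical names (CG_BUILD_TIME_LIMIT here stands in for the 30-second product limit; the real constant may be named differently): the bound is recomputed only at sample points, and the hot path pays only an integer compare:

```cpp
#include <cassert>

// Stand-in for the product time limit on connection-graph building.
static const double CG_BUILD_TIME_LIMIT = 30.0;  // seconds

// Object count that would exhaust the time budget at the rate observed
// so far: processing 'next' objects took 'elapsed' seconds, so scale up.
inline unsigned int length_bound(unsigned int next, double elapsed) {
  return (unsigned int)(next * (CG_BUILD_TIME_LIMIT / elapsed));
}
```

Then `java_objects_length > length_bound(next, elapsed)` is the timeout test, with all the floating-point work hoisted out of the per-object loop.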

David

> https://bugs.openjdk.java.net/browse/JDK-8041984
> http://cr.openjdk.java.net/~kvn/8041984/webrev/
>
> The following method brings C2's EA to its knees:
>
> com.graphhopper.coll.GHLongIntBTree\$BTreeEntry::put()
>
> https://github.com/karussell/graphhopper/blob/master/core/src/main/java/com/graphhopper/coll/GHLongIntBTree.java
>
> It shows that the current EA connection graph build code is very inefficient when a graph has a lot of connections (edges).
>
> With inlining, the BTreeEntry::put() method has about 100 allocations, of which about 80 are array allocations. Most of them are object arrays generated by Arrays.copyOf(). After the number of edges in the EA Connection Graph reaches about 50000, it takes more than 1 sec to find all nodes which reference an allocation.
>
> Current EA code has a bailout by timeout in the graph building code. But the timeout (30 sec for a product VM) is checked only after all object nodes are processed, so it may take > 2 min before EA is aborted:
>
> <phase name='connectionGraph' nodes='18787' live='10038' stamp='4.417'>
> <connectionGraph_bailout reason='reached time limit'/>
> <phase_done name='connectionGraph' nodes='18787' live='10038' stamp='150.967'/>
>
> Also, the ConnectionGraph::add_java_object_edges() method does not check for the presence of a node when it adds a new one to the worklist (in the add_to_worklist() method), which leads to hundreds of MB of memory used by EA when BTreeEntry::put() is processed.
>
> A new _pidx index field is added to the graph's nodes, and a VectorSet is used to reduce the size of the worklist in the add_java_object_edges() method.
>
> The escaping node CreateEx is mapped to phantom_obj to reduce the number of processed objects in the connection graph. We do the same for other escaping objects.
>
> Tested with graphhopper server application:
>
> <phase name='connectionGraph' nodes='18787' live='10038' stamp='6.829'>
> <connectionGraph_bailout reason='reached time limit'/>
> <phase_done name='connectionGraph' nodes='18787' live='10038' stamp='22.355'/>
>
> Thanks,