RFR(S) 8246019 PerfClassTraceTime slows down VM start-up

David Holmes david.holmes at oracle.com
Thu Jun 18 02:38:25 UTC 2020

Hi Ioi,

On 17/06/2020 1:19 pm, Ioi Lam wrote:
> On 6/16/20 6:20 PM, David Holmes wrote:
>> Hi Ioi,
>> On 17/06/2020 6:14 am, Ioi Lam wrote:
>>> https://bugs.openjdk.java.net/browse/JDK-8246019
>>> http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v01/ 
>>> PerfClassTraceTime is (a rarely used feature) for measuring the time 
>>> spent during class linking and initialization.
>> "A special command jcmd <process id/main class> PerfCounter.print 
>> prints all performance counters in the process."
>> How do you know this is a "rarely used feature"?
> Hi David,
> Sure, the counter will be dumped, but by "rarely used" -- I mean no one 
> will find this particular counter useful, and no one will be actively 
> looking at it.
> I changed two parts of the code -- class init and class linking.
> For class initialization, the counter may be useful for people who want 
> to know how much time is spent in their <clinit> functions, and my patch 
> doesn't change that. It only avoids using the counter when a class has 
> no <clinit>, i.e., we know that the counter counts nothing (except for a 
> logging statement).

I can see where you are coming from here. Today we keep track of the 
time taken to mark a class as initialized when there is no clinit to 
execute, and we actually record pure timer overhead as it dwarfs the 
simple update to the class state. With your change we now won't track 
the time taken to mark the class as initialized. In both cases the time 
recorded is inaccurate - in opposite senses. In that regard your slight 
underestimation of the class initalization cost seems better then the 
present over-estimate.

> =====
> For class linking, no user code is executed, so it only measures VM 
> code. If it's useful for anyone, that would be VM engineers like me who 
> are trying to optimize the speed of class loading. However, due to the 
> overhead of the counter vs what it's trying to measure, the results are 
> pretty meaningless.
> Note that I've not disabled the counter altogether. Instead, I disable 
> it only when linking a CDS shared class, and we know that very little is 
> happening for this class (e.g., no verification).

Yes I have little concern for this part. Linking is a multi-phase 
process so "time for linking" is already ill-defined. And the fact you 
only do it for CDS makes it even less interesting.

> I think the class linking timer might have been useful 15 years ago when 
> it was introduced, or it might be useful today when CDS is disabled. But 
> with CDS enabled, we are paying a constant price that seems to benefit 
> no one.
> I think we should short-circuit it when it seems appropriate. If this 
> indeed causes problems for our users, it's easy to re-enable it. That's 
> better than just keeping this forever just because we're afraid to touch 
> anything.

I'm uncomfortable with both the "keep forever as we're too afraid to 
change" and "change it now and restore it if anyone complains" ends of 
this spectrum. Obviously we need to make progress, but the "change it 
now and change back later if needed" is a bit naive, as once any change 
is made we can't change back without affecting another set of users, and 
we don't know how long it will be before the change reaches users and 
the problems return to us. From a CSR perspective I want to see that due 
diligence has been applied with regard to these behavioural changes, and 
JDK engineers are often not in a position to understand how end users 
use this kind of information. I don't have a solution for that general 

In this particular case I think under-estimating the class 
initialization overhead is better than the present over-estimate. Though 
anyone tracking the trends here will be surprised when the cost suddenly 

>> I find it hard to evaluate whether this short-circuiting of the time 
>> tracing is reasonable or not. Obviously any monitoring mechanism 
>> should impose minimal overhead compared to what is being measured, and 
>> these timers fall short in that regard. But if these stats become 
>> meaningless then they may as well be removed.
>> I think the serviceability folk (cc'd) need to evaluate this in the 
>> context of the M&M tools.
>>> However, it's quite expensive and it needs to start and stop a bunch 
>>> of timers. With CDS, it's quite often for the overhead of the timer 
>>> itself to be much more than the time it's trying to measure, giving 
>>> unreliable measurement.
>>> In this patch, when it's clear that the init and linking will be very 
>>> quick, I disable the timer and count only the number of invocations. 
>>> This shows a small improvement in start-up
>> I'm curious if you tried to forcing EagerInitialization to be true to 
>> see how that improves the baseline. I've always noticed eager_init in 
>> the code, but hadn't realized it is disabled by default.
> I think it cannot be done by default, as it will violate the JLS. A 
> class can be initialized only when it's touched by bytecodes.

I'm also not sure it violates JLS as you can't directly query if a class 
is initialized, but anyway I wasn't suggesting turning this on by 
default, I meant only in regard to get a performance baseline to compare 
against the changes you have implemented here.

Aside: I have to wonder whether anyone uses EagerInitialization or 
whether we should get rid of it?

> It can also backfire as we may load many classes without initializing 
> them. E.g., during bytecode verification, we load many classes and just 
> check that one is a supertype of another.

Not sure what is backfiring here ??


> Thanks
> - Ioi
>> Cheers,
>> David
>> -----
>>> Results of " perf stat -r 100 bin/java -Xshare:on 
>>> -XX:SharedArchiveFile=jdk2.jsa -Xint -version "
>>> 59623970 59341935 (-282035)   -----  41.774  41.591 ( -0.183) -
>>> 59623495 59331646 (-291849)   -----  41.696  41.165 ( -0.531) --
>>> 59627148 59329526 (-297622)   -----  41.249  41.094 ( -0.155) -
>>> 59612439 59340760 (-271679)   ----   41.773  40.657 ( -1.116) -----
>>> 59626438 59335681 (-290757)   -----  41.683  40.901 ( -0.782) ----
>>> 59618436 59338953 (-279483)   -----  41.861  41.249 ( -0.612) ---
>>> 59608782 59340173 (-268609)   ----   41.198  41.508 (  0.310) +
>>> 59614612 59325177 (-289435)   -----  41.397  41.738 (  0.341) ++
>>> 59615905 59344006 (-271899)   ----   41.921  40.969 ( -0.952) ----
>>> 59635867 59333147 (-302720)   -----  41.491  40.836 ( -0.655) ---
>>> ================================================
>>> 59620708 59336100 (-284608)   -----  41.604  41.169 ( -0.434) --
>>> instruction delta =      -284608    -0.4774%
>>> time        delta =       -0.434 ms -1.0435%
>>> The number of PerfClassTraceTime's used is reduced from 564 to 116 
>>> (so we have an overhead of about 715 instructions per use, yikes!).

More information about the hotspot-runtime-dev mailing list