Request for reviews (M): 6614597: Performance variability in jvm2008 xml.validation

Vladimir Kozlov Vladimir.Kozlov at Sun.COM
Fri Jan 22 20:15:39 PST 2010


I agree with you that we need to cleanup deopt reasons table but it is for an other bug
unless you want me to fix it in these changes.

After spending today on investigating the bimorphic problem I found next.
The reason we always recompile the call site as bimorphic is not atomic increment
of MDO counters and incorrect morphism set in such case.


uncommon traps in bci 49:

49 invokevirtual 50 <getEnclosingType>; <()Lcom/sun/tools/javac/code/Type;>
   240 bci: 49   VirtualCallData     trap(class_check recompiled) flags(192) count(20871188) entries(2)

As you see, total counter (20871188) < (rec0_count (18296976) + rec1_count (3378474)).
In such case iMethod::call_profile_at_bci(int bci) will incorrectly
set morphism to 2 and replaces total counter with the sum of rec0_count+rec1_count
so that counters numbers in LogCompilation looks fine:

<call method='697' count='21617491' prof_factor='1' virtual='1' inline='1' receiver='667' receiver_count='18249149' receiver2='678' receiver2_count='3368342'/>

Here is call_profile_at_bci() code:

         // Determine call site's morphism.
         // The call site count could be == (receivers_count_total + 1)
         // not only in the case of a polymorphic call but also in the case
         // when a method data snapshot is taken after the site count was updated
         // but before receivers counters were updated.
         if (morphism == result._limit) {
            // There were no array klasses and morphism <= MorphismLimit.
            if (morphism <  ciCallProfile::MorphismLimit ||
                morphism == ciCallProfile::MorphismLimit &&
                (receivers_count_total+1) >= count) {  <<<<<<<<<<<!!!!! this causing the problem
              result._morphism = morphism;
         // Make the count consistent if this is a call profile. If count is
         // zero or less, presume that this is a typecheck profile and
         // do nothing.  Otherwise, increase count to be the sum of all
         // receiver's counts.
         if (count > 0) {
           if (count < receivers_count_total) {
             count = receivers_count_total;
       result._count = count;

I did it because I never imaging that total counter will be less then the sum.

So the fix, I think, should be next to allow receivers_count_total be only
in the range [count-1, count+1] for bimorphic case:

                morphism == ciCallProfile::MorphismLimit &&
-              (receivers_count_total+1) >= count) {
+              (count <= receivers_count_total+1 && receivers_count_total <= count+1)) {
              result._morphism = morphism;

And we don't need Reason_bimorphic with this fix.


Tom Rodriguez wrote:
> But if I understand reason_recorded_per_bytecode_if_any correctly, its purpose is to use a counter which is recorded per bci for one that isn't.  The same bytecode couldn't have both a div0 and null check so it's use the null check count for the div0 count.  If you convert Reason_bimorphic to Reason_class_check then it's like you don't a Reason_bimorphic counter anymore since you'll be reading the Reason_class_check counter and we're back where we started.  The compiler needs to be able to distinguish failure of the mono case from failure of the bi case.  If you want to smear it to a different counter then why not use a counter which isn't used at call sites at all like Reason_range_check?
> Actually I think Reason_RECORDED_LIMIT is computed wrong.  Reason_unloaded is currently given a slot in _trap_hist._array because of this requirement:
> assert(DS_REASON_MASK >= Reason_RECORDED_LIMIT, "enough bits");
>     Reason_RECORDED_LIMIT = Reason_unloaded   // some are not recorded per bc                                                                               
> DS_REASON_MASK is 7 and Reason_unloaded is currently 7 but why would we want to record Reason_unloaded per_bci?  Why isn't Reason_RECORDED_LIMIT = Reason_intrinsic?  For that matter why can't we does Reason_none get a counter?  If we corrected the limits of the per bci counter then we could really give Reason_bimorphic it's own per bci counter.
> tom
> On Jan 22, 2010, at 3:19 PM, Vladimir Kozlov wrote:
>> Tom,
>> John does not suggest to remove Reason_bimorphic. He suggests next:
>> src/share/vm/runtime/deoptimization.hpp
>>  static DeoptReason reason_recorded_per_bytecode_if_any(DeoptReason reason) {
>>    if (reason_is_recorded_per_bytecode(reason))
>>      return reason;
>>    else if (reason == Reason_div0_check) // null check due to divide-by-zero?
>>      return Reason_null_check;           // recorded per BCI as a null check
>> +   else if (reason == Reason_bimorphic)  // Piggy back Reason_bimorphic on the
>> +     return Reason_class_check;          // Reason_class_check per bytecode count.
>>    else
>>      return Reason_none;
>>  }
>> Which will allow only to remove the next new code in deoptimization.cpp:
>> +        if (reason == Reason_bimorphic) {
>> +          // This isn't recorded per bci because of MDO limitations
>> +          // but lets piggy back the invalidation on the
>> +          // Reason_class_check count.
>> +          uint prior_trap_count = trap_mdo->trap_count(Reason_bimorphic);
>> +          if (prior_trap_count >= (uint)PerBytecodeTrapLimit) {
>> +            make_not_entrant = true;
>> +          }
>> +        } else {
>> But since reason_recorded_per_bytecode_if_any() is used in other places I am looking if it is OK there also.
>> Vladimir
>> John Rose wrote:
>>> On Jan 22, 2010, at 2:59 PM, Tom Rodriguez wrote:
>>>> I don't see how this will work right.  The whole point of introducing a new reason was to get a separate counter to distinguish between the monomorphic and bimorphic predicted call cases.  Otherwise you can never make the mono to bi transition.  Maybe I don't understand what changing reason_recorded_per_bytecode_if_any in this way would do.  I'd prefer to have a per bytecode counter for bimorphic but we didn't have any left.
>>> It's a little better than the previous code, because at least the state changes are distinct now, because of the reasons.
>>> There is also a small distinction between counter states.  If the per-nmethod counter for Reason_bimorphic is zero, then the per-bytecode count is disregarded, even though it is non-zero due to aliased Reason_class_check failures.  Note that maybe_prior_trap returned by query_update_method_data will be false even though the per-bytecode count (which is a single bit) is non-zero.
>>> This allows one free bimorphism deopt per nmethod.  If this isn't enough, maybe we can downgrade one of the other one-bit counts to a per-nmethod count.
>>> -- John

More information about the hotspot-compiler-dev mailing list