Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled

Christian Thalinger christian.thalinger at
Mon Aug 8 06:56:00 PDT 2011

On Aug 8, 2011, at 12:35 AM, Vladimir Kozlov wrote:

> Christian,
> You need to add big comment to the new code in templateTable_<arch>.cpp explaining what it does and why.

Done.  I made the wording a little more general because Tom's effectively final work might use the same machinery.

> Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32  bit)?

Good question.  I took the code from TemplateTable::resolve_cache_and_index without thinking about it and that one uses ld_ptr.

_indices in CosntantPoolCacheEntry is defined as intx:

  volatile intx     _indices;  // constant pool index & rewrite bytecodes

and bytecode 1 and 2 are in the upper 16-bit of the lower 32-bit word:

// bit number |31                0|
// bit length |-8--|-8--|---16----|
// --------------------------------
// _indices   [ b2 | b1 |  index  ]

Loading 32-bit on LE gives you the right bits but on BE it does not.  I think that's the reason for the "optimization" on x64.

> Add assert(byte_no == -1, ) to default: case to make sure you got all cases above it.


> I am concern about using next short branch in new code in templateTable_sparc.cpp:
> cmp_and_br_short(..., L_patch_done);  // don't patch
> There is __ stop() call which generates a lot of code so that label L_patch_done could be far.

Yeah, I thought I give it a try if it works.  cmp_and_br_short should assert if the branch displacement is too far, right?

> Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? Nothing else is changed in these files.

Both files use dependencies and I got linkage errors on Linux while working on the fix (because of inline methods).  It seems that the include is not required in ciEnv.cpp because ciEnv.hpp already includes it.  I missed that.  But nmethod.cpp needs it because nmethod.hpp only declares class Dependencies.

> I don't like assignments in condition and implicit NULL checks. Can you change check_dependency() to next?:
>      klassOop check_dependency() {
>        klassOop result = check_klass_dependency(NULL);
>        if (result != NULL) return result;
>        return check_call_site_dependency(NULL);
>      }


> In interpreterRuntime.cpp initialize marked:  int marked = 0;


> Why you did not leave "volatile" call site inlining with guard? You did not explain why virtual call is fine for it.

The spec of MutableCallSite says:

"For target values which will be frequently updated, consider using a volatile call site instead."

And VolatileCallSite says:

"A VolatileCallSite is a CallSite whose target acts like a volatile variable. An invokedynamic instruction linked to a VolatileCallSite sees updates to its call site target immediately, even if the update occurs in another thread. There may be a performance penalty for such tight coupling between threads.

Unlike MutableCallSite, there is no syncAll operation on volatile call sites, since every write to a volatile variable is implicitly synchronized with reader threads.

In other respects, a VolatileCallSite is interchangeable with MutableCallSite."

Since VolatileCallSite really should only be used when you know the target changes very often we don't do optimizations for this case.  Obviously this is just a guess how people will use VolatileCallSite but I think for now this is a safe bet.

Additionally I had to do two small changes because the build was broken on some configurations:

-  klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : NULL;
+  klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : (klassOop) NULL;


-      MutexLockerEx ccl(CodeCache_lock, thread);
+      MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag);

I updated the webrev.

-- Christian

> Vladimir
> On 8/5/11 6:32 AM, Christian Thalinger wrote:
>> 7071653: JSR 292: call site change notification should be pushed not pulled
>> Reviewed-by:
>> Currently every speculatively inlined method handle call site has a
>> guard that compares the current target of the CallSite object to the
>> inlined one.  This per-invocation overhead can be removed if the
>> notification is changed from pulled to pushed (i.e. deoptimization).
>> I had to change the logic in TemplateTable::patch_bytecode to skip
>> bytecode quickening for putfield instructions when the put_code
>> written to the constant pool cache is zero.  This is required so that
>> every execution of a putfield to calls out to
>> InterpreterRuntime::resolve_get_put to do the deoptimization of
>> depending compiled methods.
>> I also had to change the dependency machinery to understand other
>> dependencies than class hierarchy ones.  DepChange got the super-type
>> of two new dependencies, KlassDepChange and CallSiteDepChange.
>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK
>> tests and vm.mlvm tests.
>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147,
>> second with 7071653).  Since the CallSite targets don't change during
>> the runtime of this benchmark we can see the performance benefit of
>> eliminating the guard:
>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>   0.883000   0.000000   0.883000 (  0.854000)
>>   0.715000   0.000000   0.715000 (  0.715000)
>>   0.712000   0.000000   0.712000 (  0.712000)
>>   0.713000   0.000000   0.713000 (  0.713000)
>>   0.713000   0.000000   0.713000 (  0.712000)
>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>   0.772000   0.000000   0.772000 (  0.742000)
>>   0.624000   0.000000   0.624000 (  0.624000)
>>   0.621000   0.000000   0.621000 (  0.621000)
>>   0.622000   0.000000   0.622000 (  0.622000)
>>   0.622000   0.000000   0.622000 (  0.621000)

More information about the hotspot-compiler-dev mailing list