RFR(S): 8059606: Enable per-method usage of CompileThresholdScaling (per-method compilation thresholds)
zoltan.majo at oracle.com
Tue Jan 20 13:29:32 UTC 2015
thanks for the review!
On 01/19/2015 07:36 PM, Vladimir Kozlov wrote:
> What was changed in globalDefinitions.hpp? Webrev shows nothing
> (sometimes it does not show spacing changes).
Yes, it was some whitespace I've removed around line 1150:
- uintptr_t p = 1;
+ uintptr_t p = 1;
The newest webrev does not show that either. In addition to that, I've
updated two comments in the newest webrev.
> In arguments.cpp, please, add check that CompileThresholdScaling is
> not negative.
I added the check. If CompileThresholdScaling is negative, we set its
value to 1.0 (the default value).
Here is the new webrev: http://cr.openjdk.java.net/~zmajo/8059606/webrev.04/
The webrev link is the same as the one in my latest reply to John.
All JPRT tests pass.
> Otherwise this looks good to me.
> On 1/19/15 8:56 AM, Zoltán Majó wrote:
>> Hi John,
>> On 01/16/2015 08:59 PM, John Rose wrote:
>>> On Jan 16, 2015, at 5:05 AM, Zoltán Majó <zoltan.majo at oracle.com>
>>>> Here is the new webrev:
>>> Reviewed, with some comments.
>> thank you for your review and for the feedback! Please see detailed
>> comments below.
>>> Overall, I like the cleanups along the way.
>>> The basic idea of replacing a hard-coded 'mask' with an addressable
>>> variable is sound and nicely executed.
>>> I suppose that idea by itself is "S" small, but this really is a "M"
>>> or even "L" change, as Vladimir says, especially
>>> since the enhanced logic is spread all around many files.
>> I agree that this is not a small change any more, but I would like to
>> keep the subject of this discussion unchanged so
>> that we don't move to a different thread (unless you or Vladimir
>> think it is better to change it).
>>> How have you regression tested this?
>> I used our performance infrastructure. I collected performance data
>> on 6 different architectures for 41 benchmark
>> programs/suites. The data show that per-method compilation thresholds
>> do not result in a statistically significant
>> change of performance. One benchmark, Footprint3-Client, degrades
>> ~0.5% on the X86 Client VM, but I think that is
>>> Have you verified that the compilation sequence doesn't change for a
>>> non-trivial use case? A slip in the assembly
>>> code (for example) might cause a comparison against a garbage mask
>>> or limit that could cause compilation decisions to
>>> go crazy quietly. I didn't spot any such bug, but they are hard to
>>> spot and sometimes quiet.
>> I did extensive manual testing on all architectures targeted by the
>> I checked that a target method (a method for which per-method
>> compilation thresholds have been specified) is indeed
>> compiled sooner/later than methods for which global compilation
>> thresholds are in effect. I ran tests for all
>> combinations of +/-TieredCompilation, +/-ProfileInterpreter, and
>> on each architecture that we support. While working on this issue
>> I've discovered and reported two interpreter-related
>> bugs, 8068652 and 8068505.
>> I cannot, unfortunately, guarantee that my changes are error-free,
>> but I did my best to catch any possible error that I
>> can think of and that I can check.
>>> In the sparc assembly code (more than one file), the live range of
>>> Rcounters has increased, since it is used to supply
>>> limits as well as to update the counter (which happens early in the
>>> To make it easier to maintain the code, I suggest renaming Rcounters
>>> to G3_method_counters.
>>> (As you can see, when a register has a logical name but has a
>>> complicated live range, we add the hardware name is to
>>> the logical name, to make it easier to spot interfering uses, when
>>> manually editing code.)
>> Thanks for the suggestion, I've changed the register's name.
>>> If scale==0.0 is a valid input checked specially in compileBroker,
>>> perhaps the effect of a zero should be documented?
>>> Suggest adding to globals.hpp:
>>> "; values greater than one delay counter overflow; zero forces
>>> counter overflow immediately"
>>> "; can be set as a per-method option."
>> If CompileThresholdScaling is set to 0.0, all methods are interpreted.
>> The reason is that CompileThreshold==0 has been historically
>> equivalent to setting -Xint to true. Setting
>> CompileThresholdScaling to 0.0 scales down CompileThreshold to 0 and
>> we wanted to keep VM behavior consistent when we've
>> added support for the global CompileThresholdScaling flag in
>> If you think it would be good if we changed the meaning of
>> CompileThreshold==0, please let me know. I'll file an RFE for
>> it and change it. As CompileThreshold is a product flag, we will need
>> CCC approval for that change.
>> As you've suggested, I added a comment to globals.hpp that precisely
>> describes the behavior of CompileThresholdScaling.
>>> Question: What if both a global scale and a method option for scale
>>> are both set? Is the global one ignored? Do
>>> they multiply? It's worth specifying it explicitly (and checking
>>> that the logic DTRT).
>> Global and per-method values multiply. That behavior is now described
>> in the comment in globals.hpp.
>>> Question: How are the log values (like Tier0InvokeNotifyFreqLog or
>>> the result of get_scaled_freq_log) constrained to
>>> fit in a reasonable range? (I would suppose that range is 0..31 or
>>> less.) Should we have range clipping code in the
>>> scaler functions?
>> Currently InvocationCounter::number_of_count_bits=29 bits are
>> reserved for counting. As a result, the value of the log2
>> of the notification frequency can be at most 30. I updated the source
>> code accordingly.
>>> It would give notably simpler code, in MethodData::init, to use a
>>> branch-free setup of tier_0_invoke_notify_freq_log
>>> etc. Set scale = 1.0 and then update it conditionally. Special-case
>>> scale=1.0 in get_scaled_freq_log to taste.
>> I did that.
>>> Same comment about less-branchy code for methodCounters.hpp. (It's
>>> better to have a one-way branch that sets up
>>> 'scale' than a two-way branch with duplicate setups of one or more
>> I changed the code according to your suggestions.
>>> In MethodCounters, I think the conditional scaling of
>>> _interpreter_backward_branch_limit is going to confuse someone,
>>> at some point. It should be scaled, unconditionally, like its
>>> sibling variables. (That would remove another somewhat
>>> verbose initialization branch!)
>> The value of the *global* interpreter backward branch limit
>> (InvocationCounter::InterpreterBackwardBranchLimit) is
>> computed based on the value of CompileThreshold (in
>> InvocationCounter::reinitialize()). The backward branch limit is
>> assigned a different value depending on whether interpreter profiling
>> is enabled or not.
>> The logic of the *per-method* interpreter backward branch limit
>> (MethodCounters::_interpreter_backward_branch_limit) is
>> intended to be identical to that of the global branch limit.
>> Therefore, I'm afraid that we have to keep the if-then-else
>> construct initializing _interpreter_backward_branch_limit in
>> methodCounters.hpp. I hope that I understood right what
>> you've previously suggested.
>>> Small style nit: The noise-word "get_" is discouraged in the style
>>>> • Getter accessor names are noun phrases, with no "get_" noise
>>>> word. Boolean getters can also begin with "is_" or
>>> Arguments.cpp follows this rule partially (in older code maybe?).
>>> It would be better to decrease counter-examples to
>>> the rule instead of increase them.
>>> Bigger style nit: Since the functions are not getting a preset
>>> value (from the arguments) but rather normalizing a
>>> provided argument value, I suggest naming them
>>> "scale_compile_threshold" (i.e., a verb phrase instead of a noun
>>> phrase). Again from the style doc:
>>>> • Other method names are verb phrases, as if commands to the
>> I did not know that, thanks for pointing it out to me. I changed
>> get_scaled_compile_threshold -> scaled_compiled_threshold
>> get_scaled_freq_log -> scaled_freq_log.
>>> Since you are providing overloads of the scaling functions, the
>>> header file should either contain inline code for the
>>> convenience methods, or else document how the optional argument
>>> ('scale') defaults. I'd prefer inline code, since it
>>> is simple. It's as much text to document with a comment as just to
>>> supply the inline overload.
>> I inlined the convenience methods into the header file.
>> Here is the new webrev:
>>> As I said before, nice work!
>> Thank you!
>> Best regards,
>>> — John
More information about the hotspot-compiler-dev