<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Hi Vladimir,<br>
<br>
thanks again for the detailed review. <br>
<br>
I think it is important to have a small NmethodSweepFraction for
small code cache sizes,<br>
which is provided in the current version. Since we plan to have a
separate sweeper thread, <br>
I would propose to further investigate the impact of
NmethodSweepFraction when we implement<br>
the sweeper thread.<br>
<br>
As for:<br>
<br>
<pre>
JavaCalls::call_helper()

    if (CompilationPolicy::must_be_compiled(method)) {
      CompileBroker::compile_method(method, InvocationEntryBci,
                                    CompilationPolicy::policy()->initial_compile_level(),
                                    methodHandle(), 0, "must_be_compiled", CHECK);
+   } else if (UseCodeCacheFlushing) {
+     nmethod* saved = CodeCache::reanimate_saved_code(method());
+     if (saved != NULL) {
+       method->set_code(method, saved);
+     }
    }
</pre>
<br>
<br>
The current places where code is re-animated
(Method::verified_code_entry() and CompileBroker::compile_method())
seem effective. <br>
<br>
One important note: The version of webrev.01 computes the
following parameter incorrectly:<br>
<pre>const double flushing_fraction = 1 - (pow((ReservedCodeCacheSize / M), -1/3) + CodeCacheFlushingMinimumPercentage / 100);
</pre>
<br>
As a result, the flushing fraction was always 0, which means that no
memory was speculatively disconnected.<br>
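<br>
(For reference, a minimal standalone sketch of why that expression collapses to 0: both -1/3 and the percentage division are integer divisions. The flag values below are placeholders, not the real defaults.)<br>
<pre>
#include &lt;cmath&gt;

// Stand-ins for the VM flags; what matters is that they are integral types.
const long M = 1024 * 1024;
long ReservedCodeCacheSize = 256 * M;
long CodeCacheFlushingMinimumPercentage = 10;   // placeholder value

double flushing_fraction_as_in_webrev01() {
  // -1/3 is integer division and evaluates to 0, so pow(x, 0) == 1.0, and
  // CodeCacheFlushingMinimumPercentage / 100 is integer division (== 0),
  // hence the whole expression is 1 - (1 + 0) == 0 for every cache size.
  return 1 - (pow((double)(ReservedCodeCacheSize / M), -1/3)
              + CodeCacheFlushingMinimumPercentage / 100);
}

double flushing_fraction_presumed_intent() {
  // Presumed intent: floating-point division in both places yields a
  // size-dependent fraction (about 0.74 for 256m with a 10% minimum).
  return 1 - (pow((double)(ReservedCodeCacheSize / M), -1.0 / 3.0)
              + CodeCacheFlushingMinimumPercentage / 100.0);
}
</pre>
<br>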
This is a strong indicator that the hotness counter renders the
speculative disconnect + reanimate logic obsolete.<br>
I'll work on a new version where this code is removed. <br>
<br>
Best,<br>
Albert<br>
<br>
<br>
On 04.09.2013 22:33, Vladimir Kozlov wrote:<br>
</div>
<blockquote cite="mid:52279909.7000708@oracle.com" type="cite">On
9/4/13 4:07 AM, Albert Noll wrote:
<br>
<blockquote type="cite">Hi,
<br>
<br>
here is an updated version of the patch. In summary, changes to
the
<br>
previous version are:
<br>
<br>
1) Adapt NmethodSweepFraction based on the code cache size:
<br>
Before the patch, NmethodSweepFraction was defined as a
constant value.
<br>
I.e., NmethodSweepFraction was 16 for a code cache size of
32m as
<br>
well as a
<br>
code cache size of 256m. This seems unreasonable, since
<br>
NmethodSweepFraction
<br>
was originally introduced to keep the time spent in the
sweeper
<br>
(note that sweeping
<br>
was originally done at safepoints) at a reasonable level.
<br>
</blockquote>
<br>
Albert,
<br>
<br>
General question first. Do we still need to sweep in fractions
since we don't do sweeping at safepoint and it stops for
safepoints? The only drawback is that one Compiler thread is used for
sweeping and can't compile when it sweeps. But we will address
that later by using a special sweeping thread.
<br>
<br>
<blockquote type="cite"> The patch adapts NmethodSweepFraction
to the ReservedCodeCacheSize.
<br>
I.e.,
<br>
each sweep operation roughly covers 16m of the code cache.
<br>
</blockquote>
<br>
Why do you stop at 16?:
<br>
<br>
+ } else if (ReservedCodeCacheSize > (256 * M)) {
<br>
+ FLAG_SET_DEFAULT(NmethodSweepFraction, 16);
<br>
<br>
ReservedCodeCacheSize could be 2GB, so one sweep would have to process
128MB.
<br>
The following should cover all cases, I think:
<br>
<br>
FLAG_SET_DEFAULT(NmethodSweepFraction, 1 + ReservedCodeCacheSize /
(16 * M))
<br>
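For illustration, this is roughly what that default works out to (a standalone sketch, not the actual arguments.cpp change):<br>
<pre>
// 1 + ReservedCodeCacheSize / (16 * M):
//   32 MB  ->   3 fractions (~11 MB per sweep)
//   256 MB ->  17 fractions (~15 MB per sweep)
//   2 GB   -> 129 fractions (~16 MB per sweep)
size_t sweep_fraction_default(size_t reserved_code_cache_size) {
  const size_t M = 1024 * 1024;
  return 1 + reserved_code_cache_size / (16 * M);
}
</pre>
<br>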
<br>
<blockquote type="cite">
<br>
2) The parameter NmethodSweepFraction is replaced by a function
that
<br>
computes the
<br>
NmethodSweepFraction based on the code cache size. See
<br>
'sweeper.cpp' for a detailed
<br>
description
<br>
<br>
3) The initial hotness counter depends on the code cache size.
More
<br>
specifically, the initial
<br>
hotness counter is 2 * (ReservedCodeCacheSize / M). As a result, a
method
<br>
stays longer in the
<br>
code cache if ReservedCodeCacheSize is larger.
<br>
</blockquote>
<br>
I don't like spreading '(ReservedCodeCacheSize / M) * 2' through
the code. Use NMethodSweeper::get_hotness_counter_reset_val() in
nmethod constructors.
<br>
<br>
Also you need to check for a min value if ReservedCodeCacheSize is
below 1MB.
<br>
<br>
Also drop 'get_' from the name. We usually don't use 'get_' in
accessor methods. The same for nmethod::get_hotness_counter().
<br>
<br>
I already asked you to make hotness_counter_decay a member of
nmethod so that you don't need to pass it to
dec_hotness_counter(). Or remove it altogether since it is always 1:
<br>
<br>
void dec_hotness_counter() { _hotness_counter--; }
<br>
<br>
I would prefer all code which modifies _hotness_counter be in
nmethod class.
<br>
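Something along these lines (a sketch only; the minimum value and exact placement are illustrative, not the actual patch):<br>
<pre>
// All _hotness_counter updates live in nmethod:
class nmethod /* : public CodeBlob, ... */ {
  int _hotness_counter;
 public:
  int  hotness_counter() const    { return _hotness_counter; }
  void set_hotness_counter(int c) { _hotness_counter = c; }
  void dec_hotness_counter()      { _hotness_counter--; }
};

// In NMethodSweeper (no 'get_' prefix); ReservedCodeCacheSize and M are the
// usual HotSpot flag/constant:
static int hotness_counter_reset_val() {
  int val = (int)(ReservedCodeCacheSize / M) * 2;
  // Floor for tiny caches (covers ReservedCodeCacheSize below 1MB);
  // the value 10 is illustrative.
  return (val > 10) ? val : 10;
}
</pre>
<br>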
<br>
<blockquote type="cite">
<br>
4) Newly compiled methods are guaranteed to not be evicted by
the
<br>
sweeper for 10
<br>
sweep cycles. This ensures that newly compiled methods are not
<br>
immediately made not-entrant after compilation.
<br>
<br>
5) The hotness counter is reset EVERY TIME active methods are
scanned.
<br>
In the previous version
<br>
the hotness counter was only reset after a full sweep cycle
of the
<br>
code cache. Resetting the
<br>
hotness counter more frequently provides a better hotness
coverage
<br>
of methods.
<br>
<br>
6) Methods are flushed in blocks of 1m. The algorithm computes
the
<br>
average hotness of an
<br>
nmethod block and evicts the entire block. This should
reduce
<br>
fragmentation.
<br>
<br>
Please let me know what you think about these changes.
Performance
<br>
results can be found at:
<br>
<a class="moz-txt-link-freetext" href="https://bugs.openjdk.java.net/browse/JDK-8020151">https://bugs.openjdk.java.net/browse/JDK-8020151</a>
<br>
I will continue to provide more results.
<br>
<br>
Here is the new webrev:
<br>
<a class="moz-txt-link-freetext" href="http://cr.openjdk.java.net/~anoll/8020151/webrev.01/">http://cr.openjdk.java.net/~anoll/8020151/webrev.01/</a>
<br>
<a class="moz-txt-link-rfc2396E" href="http://cr.openjdk.java.net/%7Eanoll/8020151/webrev.01/"><http://cr.openjdk.java.net/%7Eanoll/8020151/webrev.01/></a>
<br>
</blockquote>
<br>
Very nice comment in sweeper.hpp. Could you please extend it?
<br>
<br>
2) sweep nmethods
<br>
<br>
Sweeping is currently done by a Compiler thread between compilations,
or at least every 5 sec (NmethodSweepCheckInterval) when the CodeCache
is full.
<br>
<br>
The comment does not make clear that nmethods can be marked as
not-entrant by different code (deoptimization, dependency
invalidation, replacing an old nmethod) and not just by the sweeper
(which does that only for flushed nmethods).
<br>
<br>
Each nmethod state change happens during a separate sweep. It may
take at least 3 sweeps before an nmethod's space is freed.
<br>
<br>
3) code cache flushing
<br>
... as a VM operation at safepoint.
<br>
<br>
The flushing VM operation requires a safepoint.
<br>
<br>
As you pointed out, the current code re-animates nmethods only if we get a
compilation request. It means that it depends on profiling
counters. It could be the case that a method has to be executed
in the interpreter for some time again to hit the compilation threshold.
<br>
<br>
I think we need to re-animate the nmethod immediately during a Java
call:
<br>
<br>
<pre>
JavaCalls::call_helper()

    if (CompilationPolicy::must_be_compiled(method)) {
      CompileBroker::compile_method(method, InvocationEntryBci,
                                    CompilationPolicy::policy()->initial_compile_level(),
                                    methodHandle(), 0, "must_be_compiled", CHECK);
+   } else if (UseCodeCacheFlushing) {
+     nmethod* saved = CodeCache::reanimate_saved_code(method());
+     if (saved != NULL) {
+       method->set_code(method, saved);
+     }
    }
</pre>
<br>
<br>
Can you verify how this works?
<br>
<br>
I will send additional comments when I look through the rest of the
code.
<br>
<br>
Thanks,
<br>
Vladimir
<br>
<br>
<blockquote type="cite">
<br>
Best,
<br>
Albert
<br>
<br>
<br>
On 22.08.2013 15:16, Albert Noll wrote:
<br>
<blockquote type="cite">Hi Igor,
<br>
<br>
thanks again for your comments. You are right. I will run some
<br>
benchmarks to see if removing
<br>
the disconnect logic is feasible or not.
<br>
<br>
Best,
<br>
Albert
<br>
<br>
On 22.08.2013 12:39, Igor Veresov wrote:
<br>
<blockquote type="cite">It's worth a try. But the "hotness"
logic is probabilistic and
<br>
imprecise - it will notice only methods that are on stack
during
<br>
safepoints. Those are going to be pretty narrow snapshots of
<br>
activity. I suspect that for large flat profiles (like
enterprise
<br>
apps and friends) you could be missing methods that are
rather warm
<br>
in reality, which will cause recompilation oscillations.
Although if
<br>
the statistics are allowed to accumulate enough, maybe it's
going to
<br>
work out, an experiment will tell. The patch is a good
start, the
<br>
stack sampling, IMO, is totally the right approach for
filtering out
<br>
the hot methods.
<br>
<br>
The "disconnect" logic on the other hand is sort of precise.
<br>
Although, looking now at the code it's not quite clear to me
how it
<br>
works; there doesn't seem to be any patching going on to divert
the
<br>
control for the cases when the nmethod is called directly or
through
<br>
an IC. So I guess it's not really a full disconnect?
Anyways, in
<br>
theory, with some work, we can make the disconnect logic
precisely
<br>
measure the time the method is inactive, which should
provide precise
<br>
information about warm/cold methods.
<br>
<br>
Btw, also just noticed a bunch of flaws in the interaction
of the
<br>
disconnect logic and tiered. The nmethod's "reconnection"
happens in
<br>
CompileBroker::compile_method(), which firstly will be
called by
<br>
tiered only after a rather substantial number of invocations
in the
<br>
interpreter (up to 128), and secondly will be subject to all
the
<br>
prioritization rules (it probably should not be), and also we
don't
<br>
check if the comp level of the reanimated nmethod matches
the
<br>
request. If the disconnect logic is to stay, the interpreter
should
<br>
be able to know if the method has saved code and be able to
call into
<br>
the runtime immediately to reanimate it.
<br>
<br>
igor
<br>
<br>
On Aug 21, 2013, at 10:27 PM, Albert Noll
<<a class="moz-txt-link-abbreviated" href="mailto:albert.noll@oracle.com">albert.noll@oracle.com</a>
<br>
<a class="moz-txt-link-rfc2396E" href="mailto:albert.noll@oracle.com"><mailto:albert.noll@oracle.com></a>> wrote:
<br>
<br>
<blockquote type="cite">Hi Igor,
<br>
<br>
thanks for looking at the patch. Actually, I think - just
as
<br>
Vladimir pointed out - that we can get rid of
<br>
the "disconnect" logic. We now have the hotness of a
method, and if
<br>
the code cache fills up and
<br>
we decide to schedule the method for removal, we set it to
not_entrant.
<br>
It seems that adding the method to the list of
disconnected methods
<br>
just buys a little more time until we decide to make the
method
<br>
not-entrant. However, we can have the same effect by
setting the
<br>
threshold differently.
<br>
<br>
What do you think?
<br>
<br>
Best,
<br>
Albert
<br>
<br>
On 22.08.2013 10:02, Igor Veresov wrote:
<br>
<blockquote type="cite">Maybe instead of "(_traversals
> _last_flush_traversal_id + 2)" we
<br>
should timestamp a method when it's disconnected, and
then use a
<br>
rule like if a method has been disconnected for k *
<br>
reverse_free_ratio() seconds then it's ok to kill it. We
can also
<br>
sort the nmethods that pass that filter by the amount of
time they
<br>
were disconnected and select most likely candidates for
flushing.
<br>
This should allow us to basically do disconnect/flush in
every
<br>
traversal, which should make things faster. Timestamps
would be
<br>
obtained only once per traversal or something like that.
What do
<br>
you think?
<br>
<br>
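Roughly like this (a sketch; the timestamp field, k, and helper names are made up to illustrate the rule):<br>
<pre>
// Flush an nmethod once it has been disconnected for long enough,
// scaled by how full the code cache is.
bool ok_to_flush(nmethod* nm, jlong now_sec, double k) {
  jlong disconnected_for = now_sec - nm->disconnected_timestamp_sec(); // hypothetical field
  return disconnected_for > k * CodeCache::reverse_free_ratio();
}
</pre>
<br>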
Pretty cool idea to reverse-prioritize disconnects on
hotness.
<br>
<br>
igor
<br>
<br>
On Aug 21, 2013, at 4:42 AM, Albert Noll
<<a class="moz-txt-link-abbreviated" href="mailto:albert.noll@oracle.com">albert.noll@oracle.com</a>
<br>
<a class="moz-txt-link-rfc2396E" href="mailto:albert.noll@oracle.com"><mailto:albert.noll@oracle.com></a>> wrote:
<br>
<br>
<blockquote type="cite">Hi all,
<br>
<br>
could I have reviews for this patch? Please note
<br>
that I do not yet feel very confident with the
sweeper,
<br>
so please take a close look.
<br>
<br>
jbs: <a class="moz-txt-link-freetext" href="https://jbs.oracle.com/bugs/browse/JDK-8020151">https://jbs.oracle.com/bugs/browse/JDK-8020151</a>
<br>
webrev:
<a class="moz-txt-link-freetext" href="http://cr.openjdk.java.net/~anoll/8020151/webrev.00/">http://cr.openjdk.java.net/~anoll/8020151/webrev.00/</a>
<br>
<a class="moz-txt-link-rfc2396E" href="http://cr.openjdk.java.net/%7Eanoll/8020151/webrev.00/"><http://cr.openjdk.java.net/%7Eanoll/8020151/webrev.00/></a>
<br>
<br>
<br>
Many thanks in advance,
<br>
Albert
<br>
<br>
<br>
Problem: There can be large performance regressions
when the code
<br>
cache fills up. There are
<br>
several reasons for the performance regression: First
(1), when
<br>
the code cache is full and methods
<br>
are speculatively disconnected, the oldest methods
(based on
<br>
compilation ID) are scheduled for
<br>
flushing. This can result in flushing hot methods.
Second (2),
<br>
when compilation is disabled due to a full
<br>
code cache, the number of sweeps can go down. A lower
number of
<br>
sweep operations results
<br>
in slower method flushing.
<br>
<br>
Solution:
<br>
Introduce a hotness counter that is set to a
particular value
<br>
(e.g., 100) when there is an activation
<br>
of the method during stack scanning. The counter is
decremented by
<br>
1 every time the sweeper
<br>
is invoked.
<br>
<br>
ad (1):
<br>
A VM operation that speculatively disconnects
nmethods selects
<br>
the methods that should be
flushed based on their hotness. For example, if 50% of
flushed based on the hotness. For example, if 50% of
the code
<br>
cache shall be flushed, we flush
<br>
those methods that have not been active while stack
scanning for
<br>
the longest time. Note that
<br>
while this strategy is more likely to flush cold
methods, it is
<br>
not clear to what extent the new
<br>
strategy fragments the code cache.
<br>
<br>
Changes in
NMethodSweeper::speculative_disconnect_nmethods(bool
<br>
is_full)
<br>
<br>
ad (2)
<br>
Currently, methods are removed from the code cache
if:
<br>
a) code cache is full
<br>
b) class is unloaded
<br>
c) method is replaced by another version (i.e.,
compiled with
<br>
a different tier)
<br>
d) deopt
<br>
<br>
The current patch adds a 5th possibility to remove
a method
<br>
from the code cache.
<br>
In particular, if a method has not been active
during stack
<br>
scanning for a long-enough
<br>
amount of time, the method is removed from the code
cache. The
<br>
amount of time
<br>
required to flush the method depends on the
available space in
<br>
the code cache.
<br>
<br>
Here is one example: If a method was seen on a
stack, the
<br>
hotness counter
<br>
is set to 100. A sweep operation takes place roughly
every
<br>
100ms. I.e., it takes
<br>
100ms * 100 = 10s until the hotness counter reaches
0. The
<br>
threshold that determines
<br>
if a method should be removed from the code cache
is calculated
<br>
as follows:
<br>
<br>
threshold = -100 + (CodeCache::reverse_free_ratio()
*
<br>
NMethodSweepActivity)
<br>
<br>
For example, if 25% of the code cache is free,
<br>
reverse_free_ratio returns 4.
<br>
The default value of NMethodSweepActivity is 10.
As a result,
<br>
threshold = -60.
<br>
Consequently, all methods that have a hotness
value smaller
<br>
than -60 (which
<br>
means they have not been seen on the stack for
16s) are
<br>
scheduled to be flushed
<br>
from the code cache. See an illustration of the
threshold as a
<br>
function of the available
<br>
code cache in threshold.pdf
<br>
<br>
Note that NMethodSweepActivity is a parameter that
can be
<br>
specified via a -XX
<br>
flag.
<br>
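In pseudo-C++ the flush decision described above looks roughly like this (accessor names are simplified):<br>
<pre>
// Flush nmethods whose hotness fell below the threshold.
// Example from above: 25% free => reverse_free_ratio() == 4,
// NMethodSweepActivity == 10, so threshold == -100 + 4 * 10 == -60;
// getting from the reset value of 100 down to -60 takes 160 sweeps,
// i.e. about 16 s at ~100 ms per sweep.
bool should_flush(nmethod* nm) {
  double threshold = -100 + (CodeCache::reverse_free_ratio() * NMethodSweepActivity);
  return threshold > nm->hotness_counter();   // hotness below threshold
}
</pre>
<br>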
<br>
Changes in NMethodSweeper::sweep_code_cache()
<br>
<br>
<br>
A very preliminary performance evaluation looks
promising. I used
<br>
the DaCapo
<br>
benchmarks where a series of benchmarks is executed in
the same VM
<br>
instance.
<br>
See performance.pdf . The x-axis shows the benchmarks.
Assume we
<br>
have 2 benchmarks
<br>
(BM). The execution sequence is as follows:
<br>
<br>
BM1 (Run 1-1)
<br>
BM1 (Run 2-1)
<br>
BM2 (Run 1-1)
<br>
BM2 (Run 2-1)
<br>
<br>
BM1 (Run 1-2)
<br>
BM1 (Run 2-2)
<br>
BM2 (Run 1-2)
<br>
BM2 (Run 2-2)
<br>
<br>
<br>
A value larger than 0 on the y-axis indicates that the
version
<br>
including the proposed patch is faster.
<br>
I.e., the values are calculated as follows:
(T_original /
<br>
T_with_patch) - 1. T is the execution time
<br>
(wall clock time) of the benchmark.
ReservedCodeCacheSize is set
<br>
to 50m. I used three runs and
<br>
the arithmetic average to compare the numbers. I know,
we need
<br>
much more data, however,
<br>
I think we can see a trend.
<br>
<br>
The current patch does not trigger a warning that the
code cache
<br>
is full and compilation has been
<br>
disabled.
<br>
<br>
Please let me know what you think.
<br>
<threshold.pdf><performance.pdf>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
</blockquote>
<br>
</body>
</html>