<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">I think the problem could be that make_not_entrant() increases decompile count unless update_recompile_counts() == false. So when the sweeper flushes that cause the increments of the counter and we eventually give up recompiling it. Perhaps make_not_entrant() needs a flag to not increment the counts.<div class=""><br class=""></div><div class="">igor<div class=""><div class=""> <div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Jul 7, 2016, at 9:57 AM, James Lent <<a href="mailto:jlent@tivo.com" class="">jlent@tivo.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class="">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" class="">
<div text="#000000" bgcolor="#FFFFFF" class="">
I have not been able to collect much more information on this issue. We have been running in production using the following JVM options with good results for over 2 weeks on one node and over 1 week on the other node:<br class="">
<br class="">
-XX:-TieredCompilation <br class="">
-XX:ReservedCodeCacheSize=240m<br class="">
<br class="">
Here is some production data. I realize it can be problematic to compare results between nodes and across days of the week, but, all the results have been very consistent and we have gone back and forth between the settings several times:<br class="">
<br class="">
<table height="79" width="1080" border="1" cellpadding="2" cellspacing="2" class="">
<tbody class="">
<tr class="">
<td valign="top" align="center" class="">Node<br class="">
</td>
<td valign="top" align="center" class="">Tiered<br class="">
</td>
<td valign="top" class="">Date<br class="">
</td>
<td valign="top" align="center" class="">Average Response Time (24 Hr)<br class="">
</td>
<td valign="top" align="center" class="">Trans/Sec (24 Hr)<br class="">
</td>
<td valign="top" align="center" class="">Processor Load (24 Hr)<br class="">
</td>
<td valign="top" align="center" class="">Expensive Test Query<br class="">
</td>
</tr>
<tr class="">
<td valign="top" class="">Flex 0<br class="">
</td>
<td valign="top" class="">Disabled<br class="">
</td>
<td valign="top" class="">6/24/16<br class="">
</td>
<td valign="top" align="right" class="">387 ms<br class="">
</td>
<td valign="top" align="right" class="">6.33<br class="">
</td>
<td valign="top" align="right" class="">2.31<br class="">
</td>
<td valign="top" align="right" class="">1899 ms<br class="">
</td>
</tr>
<tr class="">
<td valign="top" class="">Flex 1<br class="">
</td>
<td valign="top" class="">Enabled<br class="">
</td>
<td valign="top" class="">6/24/16<br class="">
</td>
<td valign="top" align="right" class="">521 ms<br class="">
</td>
<td valign="top" align="right" class="">5.52<br class="">
</td>
<td valign="top" align="right" class="">2.66<br class="">
</td>
<td valign="top" align="right" class="">3741 ms</td>
</tr>
<tr class="">
<td valign="top" class="">Flex 0<br class="">
</td>
<td valign="top" class="">Disabled<br class="">
</td>
<td valign="top" class="">7/7/16<br class="">
</td>
<td valign="top" align="right" class="">401 ms<br class="">
</td>
<td valign="top" align="right" class="">5.52<br class="">
</td>
<td valign="top" align="right" class="">2.26<br class="">
</td>
<td valign="top" align="right" class="">1907 ms<br class="">
</td>
</tr>
<tr class="">
<td valign="top" class="">Flex 1<br class="">
</td>
<td valign="top" class="">Disabled<br class="">
</td>
<td valign="top" class="">7/7/16<br class="">
</td>
<td valign="top" align="right" class="">406 ms<br class="">
</td>
<td valign="top" align="right" class="">5.49<br class="">
</td>
<td valign="top" align="right" class="">2.25<br class="">
</td>
<td valign="top" align="right" class="">1906 ms<br class="">
</td>
</tr>
</tbody>
</table>
<br class="">
<br class="">
The "expensive query" is one that mirrors a query we no longer run in production. The cost of that query was what triggered my initial investigation. Specifically that this query would get significantly more expensive about 4 hours after a restart of the
service. That prompted my initial post: "Unexplained Jumps In Response Time 'Fixed' By Disabling Tiered Compilation". One of the first steps we took was to optimize that query. I still occasionally manually run the "unoptimized" version of the query to
monitor system performance.<br class="">
<br class="">
My conclusions so far are:<br class="">
<ul class="">
<li class="">In this environment, the system runs significantly better with tiered compilation disabled.
</li><li class="">With the increased Code Cache size, the compilation work tails off quickly and I don't think we are in any danger of hitting the decompile limit.<br class="">
</li></ul>
Just fyi. I realize this does not add anything to understanding the underlying issues.<br class="">
<br class="">
<div class="moz-cite-prefix">On 06/06/2016 08:52 AM, James Lent wrote:<br class="">
</div>
<blockquote cite="mid:575571F6.2040101@tivo.com" type="cite" class="">Thanks for your interest. Answer in-line.<br class="">
<br class="">
<br class="">
<div class="moz-cite-prefix">On 06/06/2016 04:00 AM, Tobias Hartmann wrote:<br class="">
</div>
<blockquote cite="mid:57552DA1.6040205@oracle.com" type="cite" class="">
<pre wrap="" class="">Hi James,
thanks for the report, please find some questions below.
On 02.06.2016 17:56, James Lent wrote:
</pre>
<blockquote type="cite" class="">
<pre wrap="" class="">We are running a Solr/Lucene based server on Java 8:
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
Recently we tried to run with Tiered Compilation disabled:
-XX:-TieredCompilation
In production the system performed very well for about 70 hours and then
performance start to degrade very sharply. HotSpot logs indicated that
100s of methods had reached their recompile limit. A sample of the logs
is below:
made not compilable on levels 0 1 2 3 4 java.lang.String::equals (81
bytes) decompile_count > PerMethodRecompilationCutoff
made not compilable on levels 0 1 2 3 4
org.apache.lucene.util.OpenBitSetIterator::nextDoc (136 bytes)
decompile_count > PerMethodRecompilationCutoff
made not compilable on levels 0 1 2 3 4
org.apache.solr.search.SolrConstantScoreQuery$ConstantScorer::score (5
bytes) decompile_count > PerMethodRecompilationCutoff
made not compilable on levels 0 1 2 3 4
org.roaringbitmap.Util::fillArrayAND (92 bytes) decompile_count >
PerMethodRecompilationCutoff
The methods that hit this limit include Java, Lucene, Solr, and
RoaringBitMap methods.
I was able to reproduce this in a dev environment by just running very
light real world traffic. What I observed was that many methods were
being recompiled 3 or more times per hour.
I was able to "fix" this issue by one of two methods:
1) Increase max code cache size:
-XX:ReservedCodeCacheSize=240m
2) Disable code cache flushing:
-XX:-UseCodeCacheFlushing
After running for over 12 hours the maximum number of decompiles for a
method is 14 and that method hit 11 decompiles very quickly and then
leveled off (with the first fix it is still at 11). Only 7 methods have
8 or more decompiles and this has held steady for hours now).
I also tried the following based on a suggestion from Chris Newland,
but, it did not help:
-XX:-UseBimorphicInlining
My theory is that the Code Cache flushing algorithm coupled with the
Code Cache demands of my application are causing the HotSpot compiler to
"thrash". Keep cycling the methods through the Code Cache:
1) Compile method A
2) Compile method B
3) Kick out method A and compile method C
4) Kick out method B and compile method A
Observations:
1) When Tiered Compilation is disabled the Code Cache max drops from
240mb to 48mb.
</pre>
</blockquote>
<pre wrap="" class="">Did the problem also show up with Tiered Compilation?</pre>
</blockquote>
<br class="">
This problem did not. Of course with tiered compilation enabled the code cache max seems to default to 240mb. I did not try running tiered with a smaller max code cache size. I did not that tiered seems to use a lot more code cache. The same application
exceeds 70mb when it is enabled.<br class="">
<br class="">
However with tiered compilations enabled we were seeing a hard to understand drop in performance after about 4.5 hours in both production (2 second response times suddenly jump to 3 to 5 seconds). I was able to reproduce this issue in a dev environment with
surprisingly little trouble. I investigated this issue from many angles to try to understand what was going on (heap, linux file cache, background network traffic, processor load, EC2 variability). The one thing that did seem to work in dev was turning off
Tiered compilation (a desperation idea). I have not been able to identify any interesting compiler logs that correlate with that issue. When we tried it in production it worked very well for 70 hours and then this issue occurred which was actually much worse.
I was not a hero.<br class="">
<br class="">
I sent an email on this first issue to the hotspot-dev mailing list (before the "turn off tiered compilation fix" failed in production), but, was not expecting much help because I could not provide much data to work with. Just a description of the symptoms.<br class="">
<blockquote cite="mid:57552DA1.6040205@oracle.com" type="cite" class="">
<blockquote type="cite" class="">
<pre wrap="" class="">2) When not constrained (i.e. when one of my two "fixes" is applied) my
application under this traffic load levels out at just below 22mb in the
Code Cache.
3) When neither of my fixes is applied then the Code Cache usage peaks
about 18mb, but, is highly variable minute to minute.
</pre>
</blockquote>
<pre wrap="" class="">You could try to use a lower value of -XX:NmethodSweepActivity to avoid aggressive sweeping when the code cache is getting full. But I doubt that this helps because there is still enough space in the code cache.</pre>
</blockquote>
I will try to give it a try as time permits, but, I have torn down my dev environment for now. I was able to reproduce this issue right on my dev box, but doing so consumes most of its resources. I may try to set it up before I leave tonight.<br class="">
<blockquote cite="mid:57552DA1.6040205@oracle.com" type="cite" class="">
<pre wrap="" class=""></pre>
<blockquote type="cite" class="">
<pre wrap="" class="">We are planning to implement fix one above in production soon. Any
comments?
</pre>
</blockquote>
<pre wrap="" class="">I think increasing the code cache size does not hurt and prevents aggressive sweeping in this case because a larger code cache size leads to a larger per method hotness threshold for flushing.
Did you try to reproduce the problem with a JDK 9 early access build? The symptoms you are describing sound quite similar to an issue we encountered with JDK 9. This was fixed by code aging [1] but the fix was never backported to JDK 8. You can enable/disable code aging with -XX:UseCodeAging in 9. If the problem turns out to be fixed, we could provide a prototype backport of code aging for testing in your environment.</pre>
</blockquote>
Thanks for the info. I have not tried with Java 9. I will put that on the list to try.
<br class="">
<br class="">
<blockquote cite="mid:57552DA1.6040205@oracle.com" type="cite" class="">
<pre wrap="" class="">Best regards,
Tobias
[1] <a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/975d903e1de3">http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/975d903e1de3</a>
</pre>
<blockquote type="cite" class="">
<pre wrap="" class="">I can provide HotSpot logs. Chris has provided me the required options
to get the more detailed information.
________________________________
This email and any attachments may contain confidential and privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments) by others is prohibited. If you are not the intended recipient, please contact the sender immediately and permanently delete this email and any attachments. No employee or agent of TiVo Inc. is authorized to conclude any binding agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo Inc. may only be made by a signed written agreement.
</pre>
</blockquote>
</blockquote>
<br class="">
<div class="moz-signature"><small class=""><font class="">James Lent</font><br class="">
<font color="#1f39f6" class="">Senior Software Engineer</font><br class="">
<br class="">
<font color="#1f39f6" class=""><b class="">Digitalsmiths</b></font><br class="">
<font color="#1f39f6" class="">A TiVo Company</font><br class="">
<br class="">
<a moz-do-not-send="true" href="http://www.digitalsmiths.com/" target="_blank" style="color: purple;" class="">www.digitalsmiths.com</a><br class="">
<a moz-do-not-send="true" href="mailto:jlent@digitalsmiths.com" target="_blank" style="color: purple;" class="">jlent@digitalsmiths.com</a>
<font class=""> | office 919.460.4747</font> </small></div>
</blockquote>
<br class="">
<div class="moz-signature"><small class=""><font class="">James Lent</font><br class="">
<font color="#1f39f6" class="">Senior Software Engineer</font><br class="">
<br class="">
<font color="#1f39f6" class=""><b class="">Digitalsmiths</b></font><br class="">
<font color="#1f39f6" class="">A TiVo Company</font><br class="">
<br class="">
<a href="http://www.digitalsmiths.com/" target="_blank" style="color: purple;" class="">www.digitalsmiths.com</a><br class="">
<a href="mailto:jlent@digitalsmiths.com" target="_blank" style="color: purple;" class="">jlent@digitalsmiths.com</a>
<font class=""> | office 919.460.4747</font> </small></div>
<br class="">
<hr class="">
<font face="Arial" color="Gray" size="1" class=""><br class="">
This email and any attachments may contain confidential and privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments) by others is prohibited. If you are not the intended recipient,
please contact the sender immediately and permanently delete this email and any attachments. No employee or agent of TiVo Inc. is authorized to conclude any binding agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo Inc. may only be made
by a signed written agreement.<br class="">
</font>
</div>
</div></blockquote></div><br class=""></div></div></div></div></body></html>