kirk at kodewerk.com
Thu May 9 08:34:09 UTC 2013
I am very very very much interested in testing any improvements you make to adaptive sizing. It would make parallel collections a viable option in many cases where they currently aren't.
So, interesting comments on overheads due to JIT interactions. I think one of the more disruptive activities is safe-pointing. It creates a real drag on the amount of CPU you can reach, even at rates that you might not consider harmful. In one bench where we were focusing on tx retirement rates, reducing GC frequency from once every 200ms to once every second allowed me to pin a CPU that was otherwise running at 70% with 5-8% system overhead. So we got three sources of CPU that allowed me to boost retirement rates: one from the extra CPU due to less time in GC, one from being able to soak up the 20% idle CPU, and one from system time dropping to about 1%, which freed up another 7%. All I did was tune for GC frequency. I should add that overhead in the first run was already less than the 5% threshold that I'd normally aim for (I'd need to dig out the results to get exact values), but getting it under 1% resulted in a disproportionate 50% boost in retirement rates. Without digging deeper, it seems as if safe-pointing was the culprit.
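One way to see safe-point drag independently of reported GC times is a jHiccup-style pause meter: a thread that sleeps for a fixed interval and records how much longer than requested each sleep actually took. Large spikes usually line up with stop-the-world events. This is an illustrative sketch only (class and method names are mine, not from any real tool):

```java
// Minimal "hiccup meter" sketch: measures the worst observed excess over
// a requested sleep interval. Spikes typically correspond to safepoints/GC.
public class HiccupMeter {
    static long maxHiccupMillis(long intervalMillis, int samples) {
        long max = 0;
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                break;
            }
            // excess over the requested interval is the "hiccup"
            long excess = (System.nanoTime() - start) / 1_000_000 - intervalMillis;
            max = Math.max(max, excess);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println("max hiccup (ms): " + maxHiccupMillis(1, 200));
    }
}
```

Run alongside the workload, the reported maximum gives a rough upper bound on stop-the-world pauses that doesn't depend on the GC log at all.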
On 2013-05-09, at 9:22 AM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
> Hi Kirk,
> Thanks for sharing your experiences.
> I sympathize with your experiences with adaptive sizing when trying to
> characterize fundamental
> performance differences. In order to better understand the fundamentals and to
> maintain some modicum of control over the experiment, I disabled
> adaptive sizing of the spaces,
> essentially locking the heap size and generation sizes for Parallel
> Old, CMS and G1 to the
> exact same value(s). I have not yet run so-called "speed-of-light"
> experiments with GC disabled to
> see if the G1 overhead that I am seeing vis-a-vis Parallel or CMS is
> indeed JIT related, but I have
> a strong suspicion that it is. (Disclaimer: I have used a single
> application with a variety of workloads
> on that application in my experiments; so the JIT optimization
> inhibiting that I am referring to might
> well be specific to the application I was using during my experiments.
> I should probably repeat with
> a different application, as well as perhaps use a profiler to look at
> which JIT'd (or interpreted) method(s)
> might be behaving differently in the two cases.)
> As regards ParallelGC's adaptive sizing, I have recently found that
> the filter that maintains the estimate
> of the promotion volume (which serves, among other things, as input to
> the decision whether a
> scavenge might fail) produces estimates that are far too pessimistic
> and, in a heap-constrained situation,
> can end up locking the JVM into what appears to be a self-sustaining
> suboptimal state from which
> it will not easily be dislodged. A smallish tweak to the filter will
> produce much better behaviour.
> I will submit a patch when I have had a chance to test its performance.
> -- ramki
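The kind of filter described above is, in spirit, a decaying average padded by a multiple of the observed deviation (HotSpot's gcUtil code calls this an AdaptivePaddedAverage). The sketch below is a simplified illustration with invented constants, not HotSpot's actual code; it shows how a pessimistic padding factor can keep the promotion estimate high long after a single spike:

```java
// Simplified sketch of a padded decaying average. The scavenge-will-fail
// check compares a padded estimate of promoted bytes against old-gen free
// space; the padding term is what makes the estimate pessimistic.
public class PaddedAverage {
    private final double weight;   // e.g. 0.25: newer samples count more
    private final double padding;  // e.g. 3.0: pad by 3 deviations
    private double average;
    private double deviation;

    PaddedAverage(double weight, double padding) {
        this.weight = weight;
        this.padding = padding;
    }

    void sample(double v) {
        average += weight * (v - average);
        deviation += weight * (Math.abs(v - average) - deviation);
    }

    // The pessimistic estimate: if this exceeds old-gen free space, the
    // collector assumes the next scavenge may fail.
    double paddedAverage() {
        return average + padding * deviation;
    }
}
```

With a large padding factor, one promotion spike inflates both the average and the deviation, and the padded estimate decays only slowly, which matches the "self-sustaining suboptimal state" described above.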
> On Wed, May 8, 2013 at 10:46 PM, Kirk Pepperdine <kirk at kodewerk.com> wrote:
>> Hi Ramki,
>> I've noticed quite a few problems with G1 performance, but I've not been able to quantify them in a way that lets me report what is actually happening. In one instance I calculated the GC frequency needed to meet the required pause time without placing too much pressure on context switching, which implied a certain size of Eden. The application had a pretty close to constant rate of memory churn, but it would occasionally (and necessarily) experience spikes, at which point we were looking for Eden to be able to expand to cope. So, the strategy was to start with a 100MB Eden and then let it adapt down to 20MB. It always stopped at 40MB, and consequently we were never able to hit the pause time goals. From this app I wrote a bench that I've been periodically running to see if I can understand why it's not adapting. So far I've simply not been able to get enough runs in to see what is causing the overhead.
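The Eden arithmetic behind that kind of calculation is simple: a minor GC fires roughly when Eden fills, so the Eden needed for a target collection interval is allocation rate times interval. A sketch with invented numbers:

```java
// Back-of-the-envelope Eden sizing: eden ~= allocationRate * targetInterval.
// The 200 MB/s churn rate below is an assumed figure for illustration.
public class EdenSizing {
    static long edenBytesFor(long allocBytesPerSec, double targetIntervalSec) {
        return (long) (allocBytesPerSec * targetIntervalSec);
    }

    public static void main(String[] args) {
        long rate = 200L * 1024 * 1024; // 200 MB/s steady churn (assumed)
        // One GC every 200 ms vs. one per second needs 5x the Eden:
        System.out.println(edenBytesFor(rate, 0.2) / (1024 * 1024) + " MB");
        System.out.println(edenBytesFor(rate, 1.0) / (1024 * 1024) + " MB");
    }
}
```

The same relation runs in reverse: if the policy refuses to shrink Eden below some floor, the achievable GC frequency (and hence pause-time behaviour) is capped by the allocation rate.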
>> As for the application, we went with CMS because I was able to manipulate the configuration to have it meet the pause time goal, but the fear is a CMF and/or OOME due to the occasional spikes in load. We tried the parallel collector, but the problem there is that the adaptive sizing policy does not take premature promotion rates into account and thus always leaves the survivor spaces undersized, leading to too-frequent full GCs. Last year I mentioned that I would be interested in looking at an adaptive size policy rewrite that corrected this problem, and it was indicated that it had already been done. Unfortunately it hasn't shown up, AFAICT. So, I'll reiterate the offer to fix adaptive sizing, given that it is needed but quite often has to be turned off due to the premature promotion/too-frequent full GC problem.
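Premature promotion is, at bottom, overflow: whatever survives a scavenge but does not fit in the survivor space spills straight into the old generation, where it ages the old gen toward a full GC. A trivial sketch with invented sizes:

```java
// Sketch of the survivor-overflow arithmetic behind premature promotion.
// Sizes are invented for illustration.
public class SurvivorOverflow {
    // Bytes that spill into the old generation on one scavenge.
    static long prematurePromotion(long survivingBytes, long survivorBytes) {
        return Math.max(0, survivingBytes - survivorBytes);
    }

    public static void main(String[] args) {
        long survivor = 8L << 20; // 8 MB survivor space (assumed)
        // 12 MB survives the scavenge -> 4 MB promoted prematurely:
        System.out.println((prematurePromotion(12L << 20, survivor) >> 20) + " MB");
    }
}
```

A sizing policy that ignores this overflow term will happily shrink survivors to meet pause goals while steadily pumping short-lived data into the old generation, which is the too-frequent-full-GC pattern described above.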
>> So, back to the app: the conclusion I had to come to is that there isn't a suitable collector for this particular application in OpenJDK today. CMS offered the fewest problems, but each option is very worrisome given the environment in which the application has been deployed. I'm happy to report findings from my bench when I finally get them sorted.
>> On 2013-05-08, at 10:31 PM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
>>> I have been playing with G1 a bit, and have noticed a nearly 10%
>>> overhead with G1, which is independent of the measured GC overheads.
>>> It's possible that this is because of the G1 write barriers inhibiting
>>> certain JIT optimizations. I have also seen, although this needs to be
>>> established more thoroughly, that G1's performance degrades the longer
>>> it runs, in the sense that minor GC pause times become progressively longer.
>>> Has this kind of performance behaviour been observed by others on this
>>> list? Or internally at Oracle in performance testing of G1? Or by
>>> other power users of G1 out there?
>>> Basically, all of the experiments I have done seem to indicate that
>>> CMS performs better than G1, but unfortunately the potential
>>> fragmentation problem with CMS (followed by the promotion failure
>>> handling and the single-threaded compaction to follow) make it
>>> unsuitable in certain situations.
>>> -- ramki
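For context on the write-barrier suspicion above: G1's post-write barrier does filtering work on every reference store, where Parallel and CMS emit a single unconditional card mark. The sketch below is purely illustrative (region and card sizes, card values, and the omitted refinement queue are simplified, and the real barriers operate on raw heap addresses inside the JIT):

```java
// Conceptual comparison of card-marking write barriers. Layout and
// constants are illustrative, not HotSpot's actual code (HotSpot, for
// instance, uses 0 for a dirty card).
public class Barriers {
    static final int CARD_SHIFT = 9;            // 512-byte cards
    static final byte CLEAN = 0, DIRTY = 1;
    static byte[] cardTable = new byte[1 << 16];

    // CMS/Parallel: unconditionally dirty the card for the updated field.
    static void simpleBarrier(long fieldAddr) {
        cardTable[(int) (fieldAddr >>> CARD_SHIFT)] = DIRTY;
    }

    // G1 (simplified): filter same-region stores and null stores, check
    // the card, and only then dirty it and enqueue it for refinement --
    // several branches per reference store instead of one byte write.
    static void g1Barrier(long fieldAddr, long newValAddr, int regionShift) {
        if ((fieldAddr >>> regionShift) == (newValAddr >>> regionShift)) return;
        if (newValAddr == 0) return;            // storing null: nothing to track
        int card = (int) (fieldAddr >>> CARD_SHIFT);
        if (cardTable[card] != CLEAN) return;   // already dirtied/queued
        cardTable[card] = DIRTY;                // real barrier also enqueues
    }
}
```

The extra branches are small individually, but they sit on every reference store the JIT compiles, which is one plausible source of a GC-log-invisible throughput tax of the kind measured above.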