JEP 248: Make G1 the Default Garbage Collector
kirk at kodewerk.com
Fri Jul 31 08:22:53 UTC 2015
On Jul 30, 2015, at 8:56 PM, mark.reinhold at oracle.com wrote:
> 2015/7/30 11:04 -0700, charlie.hunt at oracle.com:
>> On javac, my observation is that it tends to have a very high object
>> allocation rate, and the object lifetimes are somewhat short. And, I
>> suppose if we think about what javac does, I suppose we can come to
>> terms as to why that’s the case. I imagine if one put forth a
>> concerted effort, javac could be improved (probably true for many apps
> If you mean to imply that high allocation rates and short lifetimes are,
> necessarily, signs of a poorly-written application then I'll have to
+1, with a caveat: memory efficiency does count, so a high allocation rate is a signal that you need to look at memory efficiency.
> Sure, these things can be done to extremes, but I don't think
> javac does that and I'd hate to see it re-engineered to "improve" these
> particular metrics. It might be better just to change javac's launcher
> to specify the Parallel GC, since this is most definitely an application
> for which pause times are irrelevant.
I have spent some time tuning the Scala compiler with Martin and Bill. Though I’ve not seriously looked at javac, I suspect that some of the observations made during the Scala work may transfer. That said, here are some of my take-aways.
By the nature of the workload, all parsers tend to be single threaded and have high allocation rates. This was certainly the case with Scala prior to my working with Martin. His test case was a compile that took about 10 minutes to complete. Since that meeting, the Scala compiler has undergone some parallelization, but I’m not sure that addressed the elephant in the room: symbol table (HashMap) resizing. In my estimation, calming that activity would cut at least 2-3 minutes from compile times. Going parallel would only add to the memory pressure, and thus I wouldn’t expect the gains to be so big. And of course, this picture is muddied because if you quiesce symbol table resizing, with this type of workload another object (most likely String) will fill the void.
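One way to calm HashMap resizing is to give the table an initial capacity large enough for the expected number of entries. The following is only a minimal sketch of that idea, not what the Scala compiler does: the class name, the `build` helper, and the symbol-count estimate are all hypothetical, and it assumes HashMap's default load factor of 0.75.

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedSymbolTable {
    // Initial capacity that holds `expected` entries without a resize,
    // assuming HashMap's default load factor of 0.75.
    static int capacityFor(int expected) {
        return (int) Math.ceil(expected / 0.75);
    }

    // Fills a table with n placeholder symbols; initialCapacity is passed
    // straight to the HashMap constructor.
    static Map<String, Integer> build(int n, int initialCapacity) {
        Map<String, Integer> table = new HashMap<>(initialCapacity);
        for (int i = 0; i < n; i++) {
            table.put("sym" + i, i);
        }
        return table;
    }

    public static void main(String[] args) {
        int expectedSymbols = 1_000_000; // hypothetical estimate, not a measured value

        // Default capacity (16): the backing array is reallocated and rehashed
        // repeatedly as the table grows.
        Map<String, Integer> growing = build(expectedSymbols, 16);

        // Presized: the backing array is allocated once, when the first entry is added.
        Map<String, Integer> presized = build(expectedSymbols, capacityFor(expectedSymbols));

        System.out.println(growing.size() + " " + presized.size());
    }
}
```

Presizing only helps when a reasonable estimate of the entry count is available; an estimate that is far too large trades resize churn for wasted heap.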
The bigger gain was in how we managed GC. My thinking was, we have 8 cores of which the compiler is using 1, and that leaves 7 to manage memory. The general thinking was, if my workload can’t use all the cores, make sure the other stuff that needs to be done runs on the idle cores so it stays out of the way as much as possible. To do this I switched to the CMS collector so that as much of the work as possible is done alongside the parser. By configuring CMS to run fairly frequently, I managed to knock close to 4 minutes off the compile time. This is not a one-off, in that I’ve managed to get similar wins using CMS in single threaded or low thread count applications running on machines with high core counts. I’ve had less luck with G1, but it’s getting better. Heinz took my observation and turned it into an exercise in his concurrency course.
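When comparing collectors on a workload like this, it helps to confirm which collectors the JVM is actually running and how much collection time it reports. The sketch below uses the standard java.lang.management API for that; the class name GcReport is hypothetical. The exact CMS settings used for the Scala experiment are not given here, so the flags named in the comment are only an assumption about how such a run might be launched on a JDK that still ships CMS.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcReport {
    // Sum of the cumulative collection time (ms) reported by every collector bean.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed launch, for illustration only:
        //   java -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=<n> \
        //        -XX:+UseCMSInitiatingOccupancyOnly GcReport
        // A lower occupancy fraction starts concurrent cycles earlier and so more often.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("total reported GC time: " + totalGcMillis() + " ms");
    }
}
```

Calling the same report at the end of a real workload, once per collector configuration, gives a rough basis for comparison alongside wall-clock compile time.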
With the deprecation of iCMS, I’m now seeing GC logs coming from low latency applications that have very odd CMS configurations. What appears to be happening is that people are trying to configure CMS to mimic iCMS. It mostly works! My take on this is that people are configuring the collector to work all the time so that when it finally does need to pause, it doesn’t need to pause for very long. iCMS did this.