Reducing Garbage Generated by URLClassLoader
mkanat at google.com
Mon Dec 5 06:31:25 UTC 2016
Yeah, I have implemented a fast-path byte-only ZipCoder in a customized JDK
and it makes a big difference for allocations in long classpaths. The basic
code to do just that isn't very complex. I could possibly dig that up and
upstream it if there's interest. My recollection is that my solution isn't
the cleanest, but it doesn't regress the "needs encoding" path.
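A minimal sketch of what such a fast path could look like follows. This is not the code from the customized JDK. The class name EntryNameEncoder and both method names are hypothetical, and throwing IllegalArgumentException on bad input is an assumption made for illustration. ASCII-only names are copied byte for byte, and anything else falls back to a strict UTF-8 encoder that reports errors instead of substituting "?".

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK.
final class EntryNameEncoder {

    // Encodes an entry name as UTF-8. ASCII-only names take the fast path.
    static byte[] encode(String name) {
        int len = name.length();
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) {
            char c = name.charAt(i);
            if (c >= 0x80) {
                // Non-ASCII char found: hand the whole name to the strict encoder.
                return encodeSlow(name);
            }
            out[i] = (byte) c; // an ASCII char is the same single byte in UTF-8
        }
        return out;
    }

    // "Needs encoding" path: strict UTF-8 that throws rather than replacing.
    private static byte[] encodeSlow(String name) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer bb = enc.encode(CharBuffer.wrap(name));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            // Assumed behavior for this sketch: surface the failure to the caller.
            throw new IllegalArgumentException("cannot encode entry name", e);
        }
    }
}
```

The fast path allocates only the result array. The slow path behaves like a plain strict encoder, so names that need real encoding are handled as before.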
It is also possible to optimize URLClassLoader itself to do a better job of
caching zip entries, which significantly reduces the String allocation load
if you're doing a lot of lookups on the classpath. I have also implemented
something like this, but it's hard to get right and my changes aren't in a
state where they could be easily upstreamed.
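To illustrate the general idea only (this is not the change described above), here is a sketch of an index built once from the entry names of every jar, so that a lookup consults one map instead of probing each jar in turn. The class ClasspathIndex and its methods are hypothetical, and jars are identified by plain path strings to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical index from entry name to the jars that contain it.
final class ClasspathIndex {
    private final Map<String, List<String>> jarsByEntry = new HashMap<>();

    // Records every entry name of one jar; call in classpath order.
    void addJar(String jarPath, Iterable<String> entryNames) {
        for (String entry : entryNames) {
            jarsByEntry.computeIfAbsent(entry, k -> new ArrayList<>(1)).add(jarPath);
        }
    }

    // Returns the jars containing the entry, in classpath order; empty if none.
    List<String> jarsContaining(String entryName) {
        List<String> jars = jarsByEntry.get(entryName);
        return jars == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(jars);
    }
}
```

A miss costs a single hash lookup with no per-jar name encoding. The price is that every entry name is held as a String for the life of the loader, which is the memory trade-off discussed in the quoted reply below.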
On Mon, Dec 5, 2016 at 12:58 AM, Xueming Shen <xueming.shen at oracle.com> wrote:
> On 12/4/16, 1:21 PM, Scott Palmer wrote:
>> Excuse me if this is the wrong list for this discussion. Please direct
>> me to the right place if this isn’t it.
>> When doing an analysis of garbage generation in our application we
>> discovered a significant number of redundant strings generated by the class
>> loader. In my case there are hundreds of jars on the classpath -
>> everything in the application is a plugin. I figured on average 10kB of
>> useless garbage chars were generated per findResource call for plugin
>> This is caused mostly by the ZipFile implementation. What is the purpose
>> of java.util.zip.ZipCoder’s byte[] getBytes(String s) method? It seems to
>> simply be a custom implementation of String.getBytes(Charset cs) and as
>> such needs to first make a copy of the char[] to work on.
> The "entry name" stored in the zip/jar file is not encoded as a UTF-16 char
> sequence but as bytes in some "native" encoding; UTF-8 is one of the
> encodings ZipFile supports, and it is the default for a jar file. So when
> you want to look up a resource in the jar file by a name given as a String
> object, we have to convert/encode this name from a String into the
> corresponding byte[] in UTF-8 and do a hash table lookup to find the
> resource. Here are some implementation details:
> (1) Why do we need a "custom" version in ZipFile? Because
> String.getBytes(cs) silently replaces unmappable/malformed chars with "?",
> whereas the ZipFile API needs to throw a corresponding exception in this
> scenario, so we have to have a "custom" version to do it.
> (2) For performance reasons we don't want to convert all jar entry names in
> all open jar files into either String or char[] in advance. They are kept
> as byte[] in their original form, and we don't even have a separate byte[]
> copy for each entry name: all names are kept in their original "cen" table
> form in a byte[], and we only keep an offset to each entry. We are
> talking about hundreds of jars, each with hundreds if not thousands of
> entries. Arguably we could do it the other way around: always convert the
> entry names in each open jar file to String, and then we would not have to
> do the String->byte[] conversion during lookup. It's a design decision. If
> there is enough evidence suggesting otherwise, it can be changed, given
> that we now have the whole implementation at the Java level in jdk9.
> That said, given the optimization we have done for String in jdk9, it
> might be worth considering a fast path for ascii-only entry names (I would
> assume 99.9%+ of entry names are ascii-only in the real world); it should
> then take just a simple byte[] copy to convert/encode those entry names
> from String to byte[].
>> This combined with the need to operate on byte[] path names internally
>> in the ZipFile implementation means that URLClassLoader generates a lot of
>> unnecessary garbage in a findResource call - proportional to the number of
>> jars on the classpath.
>> Since JarFile forces the ZipFile to be opened with UTF-8 always, if there
>> was some API exposed that took a byte[] for the resource name, all of that
>> extra string copying and encoding could be hoisted out of the loop in
>> sun.misc.URLClassPath. Would it be worth creating an internal class,
>> something like a ‘ClasspathJarFile’, and tweaking ZipFile so the
>> byte[]-based method is protected instead of private?
>> I also noticed that sun.net.www.ParseUtil.encodePath(String, boolean)
>> usually had nothing useful to do but still made three copies of the string
>> passed in anyway (two char arrays to work on, and the String returned).