shared "value" in java.lang.StringBuilder
john.r.rose at oracle.com
Tue Nov 12 04:14:01 UTC 2013
On Nov 11, 2013, at 2:14 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
> I believe a long time ago it used to work this way; there was a private "handoff" constructor in String that StringBuffer would use. Probably got changed when we added StringBuilder?
As you might expect, we've been exploring this territory for a long time.
Until 1.4.2 (circa 2003), StringBuffer had a boolean field called (wait for it...) "shared", used together with a package-private constructor in String. That was the aboriginal form of the String class.
IIRC, we got rid of it because (a) there was no foolproof way to right-size the shared buffer array, and (b) it is better on modern systems to co-locate the string header and body.
Some sort of "chop" operator for right-sizing (which would be unsafe and/or hard to implement) could also address (a), but in the long run memory locality wins over almost everything.
I think the next interesting version of String will mix header and body together in a hybrid layout, with private char (or byte) elements and a few instance variables at the beginning of the layout. For good measure, it should deprecate or disallow object identity operations, so that equivalent strings can be shared by the JVM, even after creation. Those changes will require difficult JVM cuts, which is why we haven't done it yet, except in some research projects.
> On Nov 6, 2013, at 10:54 PM, Robert Stupp wrote:
>> I was wondering why the most frequently allocated class in nearly all applications is char[]. A deeper inspection showed that many of these char[] allocations are "caused" by java.lang.StringBuilder.toString(), which creates a copy of the builder's internal char array. Most StringBuilder instances are no longer used after the call to StringBuilder.toString(). Many of these instances are created by javac-generated code for "plain" string concatenation in source code.
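For readers who haven't looked at the bytecode: javac of this era (JDK 8 and earlier) compiles a concatenation expression into a StringBuilder chain roughly like the hand-written version below. The class and method names are made up for illustration; only the StringBuilder calls are real JDK API.

```java
// Hypothetical example class: shows by hand roughly what javac (JDK 8 and
// earlier) emits for the expression  greeting + ", " + name + "!"
class ConcatDesugar {
    static String concat(String greeting, String name) {
        return greeting + ", " + name + "!";
    }

    static String desugared(String greeting, String name) {
        // Default constructor: initial capacity is 16 chars.
        return new StringBuilder()
                .append(greeting)
                .append(", ")
                .append(name)
                .append("!")
                .toString(); // copies the builder's internal array into the new String
    }

    public static void main(String[] args) {
        System.out.println(desugared("Hello", "world")); // prints "Hello, world!"
    }
}
```

The builder is garbage as soon as toString() returns, so every such expression allocates at least one throwaway array plus the copy held by the String.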
>> Wouldn't it be worth trying to pass the (Abstract)StringBuilder's value and count to String, to see whether that results in fewer temporary objects and therefore less pressure on the young generation (and less GC effort)? My idea is to add a field 'shared' to AbstractStringBuilder, which is set when StringBuilder.toString() is called. If the StringBuilder is actually modified after toString() has been called, it creates a new copy of the value array and resets the 'shared' field. Since the value array might be longer than the current count, the String class would need to "re-invent" its count field.
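A minimal sketch of the proposed copy-on-write handoff, assuming we were free to change String. Since java.lang.String can't be modified from user code, the hypothetical BufferString class stands in for a String that regains a count field, and the hypothetical SharedValueBuilder stands in for AbstractStringBuilder; neither exists in the JDK, and the growth rule (old capacity * 2 + 2) is borrowed from JDK 7/8 as an assumption.

```java
import java.util.Arrays;

// Hypothetical stand-in for a String that has regained a 'count' field.
final class BufferString {
    private final char[] value; // may be longer than count
    private final int count;    // number of valid chars in value

    BufferString(char[] value, int count) { // package-private "handoff" constructor: no copy
        this.value = value;
        this.count = count;
    }

    int length() { return count; }

    // Slack capacity carried along from the builder.
    int wastedSlots() { return value.length - count; }

    @Override public String toString() { return new String(value, 0, count); }
}

// Hypothetical stand-in for AbstractStringBuilder with a 'shared' flag.
final class SharedValueBuilder {
    private char[] value = new char[16];
    private int count;
    private boolean shared; // true once value has been handed to a BufferString

    SharedValueBuilder append(String s) {
        unshareAndEnsureCapacity(count + s.length());
        s.getChars(0, s.length(), value, count);
        count += s.length();
        return this;
    }

    SharedValueBuilder setCharAt(int index, char c) {
        if (index < 0 || index >= count) throw new IndexOutOfBoundsException(String.valueOf(index));
        unshareAndEnsureCapacity(count);
        value[index] = c;
        return this;
    }

    // No copy: the result aliases this builder's array.
    BufferString toBufferString() {
        shared = true;
        return new BufferString(value, count);
    }

    // Copy-on-write: any mutation after a handoff gets a fresh array first.
    private void unshareAndEnsureCapacity(int minCapacity) {
        boolean tooSmall = minCapacity > value.length;
        if (shared || tooSmall) {
            // Grow (assumed JDK 7/8 rule) only when needed; otherwise copy at the same size.
            int newCapacity = tooSmall ? Math.max(value.length * 2 + 2, minCapacity) : value.length;
            value = Arrays.copyOf(value, newCapacity); // the shared array is never written again
            shared = false;
        }
    }
}
```

For a builder holding "hello", the handed-off array still has 11 unused slots out of the default 16, which is the right-sizing problem (a) from John's reply in miniature. A later setCharAt(0, 'j') copies first, so the earlier result still reads "hello".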
>> Another thing I noticed is that the StringBuilder instances created transiently by javac seem to use the default constructor (initial capacity 16). But a huge number of string concatenations in Java code result in (much) longer Strings, which means that appends repeatedly outgrow the value array in AbstractStringBuilder, and each time a new, larger copy is made. Is it possible to add some "guessing" for the initial capacity? This would eliminate a lot of temporary objects and reduce GC effort. Is it worth checking this out? Are the two places in com.sun.tools.javac.jvm.Gen#visitAssignop/visitBinary the only places where these StringBuilder instances are created?
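To put a number on the capacity question, here is a small simulation that counts how often a builder's array would be reallocated while appending some pieces, with the default capacity versus an exact guess. It assumes the JDK 7/8 growth rule (new capacity = old * 2 + 2, or the required size if that is larger); GrowthSimulator and its methods are hypothetical names, not javac or JDK code.

```java
// Hypothetical helper: simulates builder growth under the assumed JDK 7/8 rule.
final class GrowthSimulator {
    static int reallocations(int initialCapacity, String... pieces) {
        int capacity = initialCapacity;
        int length = 0;
        int copies = 0;
        for (String piece : pieces) {
            int needed = length + piece.length();
            if (needed > capacity) {
                capacity = Math.max(capacity * 2 + 2, needed);
                copies++; // each growth allocates a new array and copies the old contents
            }
            length = needed;
        }
        return copies;
    }

    // A "guessed" initial capacity: the exact total length of the pieces.
    static int exactCapacity(String... pieces) {
        int total = 0;
        for (String piece : pieces) total += piece.length();
        return total;
    }

    public static void main(String[] args) {
        String[] pieces = {"SELECT * FROM customers ", "WHERE last_name = 'Smith' ", "ORDER BY first_name"};
        System.out.println(reallocations(16, pieces));                   // prints 2
        System.out.println(reallocations(exactCapacity(pieces), pieces)); // prints 0
    }
}
```

The final toString() copy comes on top of these reallocations; a good guess removes the intermediate arrays but not that last copy. javac can only guess exactly when the operand lengths are known at compile time, so any real heuristic would be an estimate.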
More information about the core-libs-dev mailing list