String concatenation tweaks
forax at univ-mlv.fr
Thu Mar 12 07:03:23 UTC 2015
On 03/11/2015 10:01 PM, Louis Wasserman wrote:
> OpenJDK's implementation of String concatenation compiles
> "foo" + bar + "quux" + baz
> into essentially the same bytecode as
> new StringBuilder()
> We've been successfully experimenting at Google with presizing the
> StringBuilder to avoid the need for rebuffering, with extensive
> consultation with martinrb@ and cushon@. I have not yet ported the
> patch to head, but wanted to bounce the idea off this list before
> doing so. Some key points of interest:
> * It suffices to provide an upper bound on the size, if that's not
> too much bigger than the real length. For example, for
> primitives, we use the bound of the maximum length of the toString
> of that primitive type: for example, a boolean is treated as
> having length bounded at 5.
> * Nonconstant Objects, including CharSequences, have their toString
> stored in a local. For example, "foo" + myStringBuilder would be
> compiled to approximately
> String myStringBuilderToString = myStringBuilder.toString();
> return new StringBuilder(3 + myStringBuilderToString.length())
> This is necessary to deal with the possibility of mutation
Here you have two optimizations: one is to call toString() on each
object to append and store the result in a local variable; the second
is to try to pre-calculate the size of the resulting String.
Have you measured the former without combining it with the latter?
I ask because I think that the code of OptimizeStringConcat only
works if HotSpot is able to determine that all the objects to append are
> * (Nonconstant primitives are also stored in a local to preserve
> evaluation order and avoid mutation, but not converted to
> Strings. There might be some room for optimization here for
> primitive values coming from final fields or locals.)
> * Some mostly-redundant null checking is necessary to deal with the
> evil edge case where toString() returns null.
valueOf(valueOf(x)) is quite ugly, but I don't see how to do better :(
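To make the double valueOf concrete, here is a minimal sketch of the edge case. The names NullSafeToString, Evil and safe() are hypothetical, introduced only for illustration; they are not part of the patch.

```java
public class NullSafeToString {
    // Hypothetical class whose toString() misbehaves and returns null.
    public static final class Evil {
        @Override
        public String toString() {
            return null;
        }
    }

    // The inner valueOf covers obj == null; the outer valueOf covers the
    // case where obj is non-null but obj.toString() itself returns null.
    public static String safe(Object obj) {
        return String.valueOf(String.valueOf(obj));
    }

    public static void main(String[] args) {
        System.out.println(safe(null).length());
        System.out.println(safe(new Evil()).length());
        System.out.println(safe("quux").length());
    }
}
```

A single valueOf would hand back a null reference for Evil, and the later length() call used for presizing would then throw a NullPointerException.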
> * Taking all the above into account, our benchmarks showed 15% CPU
> improvements and 25% fewer bytes allocated relative to the status
> quo, independent of -XX:+OptimizeStringConcat.
> * While we were at it, in the case of two arguments that are
> statically known to be Strings, our benchmarks show String.concat
> to be firmly more efficient than the StringBuilder, even in the
> presence of flags like -XX:+OptimizeStringConcat. This is
> arguably a separate optimization, but nonetheless effective; our
> benchmarks at the time suggested 40% CPU improvements and 60%
> fewer bytes allocated relative to the status quo.
> So for example, "foo" + myInt + myString + "bar" + myObj would be
> compiled to the equivalent of
> int myIntTmp = myInt;
> String myStringTmp = String.valueOf(myString); // defend against null
> // defend against evil toString implementations returning null
> String myObjTmp = String.valueOf(String.valueOf(myObj));
> return new StringBuilder(
>     // length of "foo" (3) + max length of myInt (11) + length of "bar" (3)
>     17
>     + myStringTmp.length()
>     + myObjTmp.length())
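For reference, a self-contained sketch of what the expansion above might look like once the appends are written out. The quoted snippet stops after the capacity computation, so the append chain, the class name PresizedConcat and the method name concat are assumptions made for illustration, not the patch itself.

```java
public class PresizedConcat {
    // Hypothetical hand-expansion of: "foo" + myInt + myString + "bar" + myObj
    public static String concat(int myInt, String myString, Object myObj) {
        int myIntTmp = myInt;
        // Defend against a null String operand.
        String myStringTmp = String.valueOf(myString);
        // Defend against null and against toString() returning null.
        String myObjTmp = String.valueOf(String.valueOf(myObj));
        // 17 = "foo" (3) + upper bound for an int (11, e.g. "-2147483648")
        //      + "bar" (3). This is an upper bound, not the exact length.
        StringBuilder sb = new StringBuilder(
                17 + myStringTmp.length() + myObjTmp.length());
        // Assumed continuation: append operands in source order.
        return sb.append("foo")
                .append(myIntTmp)
                .append(myStringTmp)
                .append("bar")
                .append(myObjTmp)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(concat(42, "x", null));
    }
}
```

Because the capacity is only an upper bound, the builder never needs to grow, at the cost of a few unused chars when the int is short.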
> As far as language constraints go, the JLS is (apparently
> deliberately) vague about how string concatenation is implemented.
> "An implementation may choose to perform conversion and concatenation
> in one step to avoid creating and then discarding an intermediate
> String object. To increase the performance of repeated string
> concatenation, a Java compiler may use the StringBuffer class or a
> similar technique to reduce the number of intermediate String objects
> that are created by evaluation of an expression." We see no reason
> this approach would not qualify as a "similar technique."
> If these suggestions (and performance numbers) are of interest, I can
> port our patch for upstream use.
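A rough sketch of the two-String case mentioned earlier in the quoted message. It assumes the operands still need the usual null-to-"null" conversion before String.concat can be used, since concat throws NullPointerException on a null argument. TwoStringConcat and viaConcat are hypothetical names; how the actual patch handles nulls here is not stated in the message.

```java
public class TwoStringConcat {
    // Hypothetical lowering of `a + b` when both operands are statically
    // known to be Strings. valueOf keeps the "null" semantics of the
    // + operator for null references.
    public static String viaConcat(String a, String b) {
        return String.valueOf(a).concat(String.valueOf(b));
    }

    public static void main(String[] args) {
        System.out.println(viaConcat("foo", "bar"));
        System.out.println(viaConcat(null, "bar"));
    }
}
```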