RFR: 8013395 StringBuffer.toString performance regression impacting embedded benchmarks
david.holmes at oracle.com
Mon May 13 08:38:57 UTC 2013
On 13/05/2013 6:07 PM, Remi Forax wrote:
> Hi David,
> On 05/13/2013 09:12 AM, David Holmes wrote:
>> Thanks for all the feedback and discussion. I've rewritten the patch
>> to use Peter's suggestion of caching the char instead of the actual
>> I too am concerned that any form of caching that avoids the array-copy
>> (which is the crux of the issue) is going to incur a 2x space penalty.
>> I can imagine old but well-behaved code that uses StringBuffer rather
>> than StringBuilder where very long buffers are created and toString()
>> only called once - and they may get immediate or subsequent OOME due
>> to the extra char; or excessive GC activity might be induced if a
>> lot of StringBuffers are used.
>> So I tried to implement a simple copy-on-write-if-shared (COWIS) scheme:
>> but I must have done something silly as the resulting JDK won't run -
>> for some reason it causes classes and resources to not be found. :(
> I don't know if it's the only error but take a look to the constructor
> String(char, boolean), you will understand what's wrong.
No, that's the intended constructor - we want to share the char
bewtween the StringBuffer and String until the StringBuffer is modified.
> The other possible source of errors is that the VM has some string
> optimizations that
> recognize patterns like buffer.append(s).append(s2).toString().
Hmmmm, if there are intriniscs for StringBuffer then there may be a
problem with getting the sharing right. But I still can't see anything
to cause the bizarre behaviour I'm seeing.
I do note that my implementation is a little naive/simplistic as in some
cases we will potentially copy the array twice eg once for COWIS then
again to expand it for an append. :(
>> On 11/05/2013 3:36 AM, Mike Duigou wrote:
>>> The implementation looks OK to me. I like Peter's suggestion of
>>> caching the char rather than string.
>>> I do wish that the cache could be in a soft reference but understand
>>> that there are tradeoffs to doing so. I am worried about leaks with
>>> this implementation.
>>> Another possibility, to go totally nuts, is to consider implementing
>>> a form of copy-on-write-following-toString. This would be the exact
>>> opposite of "minimal set of changes to address this specific problem"
>>> and probably not wise to attempt for Java 8.
>>> Just as an FYI, this issue has in-flight conflicts with Martin's
>>> ongoing CharSequence.getChars patch.
>>> On May 9 2013, at 23:03 , David Holmes wrote:
>>>> Short version:
>>>> Cache the value returned by toString and use it to copy-construct a
>>>> new String on subsequent calls to toString(). Clear the cache on any
>>>> mutating operation.
>>>> webrev: http://cr.openjdk.java.net/~dholmes/8013395/webrev.v2/
>>>> Testing: microbenchmark for toString performance; new regression
>>>> test for correctness; JPRT testset core as a sanity check
>>>> Still TBD - full SE benchmark (?)
>>>> Long version:
>>>> One of the goals for JDK8 is to provide a path from Java ME CDC to
>>>> Java SE (or SE Embedded). In the embedded space some pretty old
>>>> benchmarks still get used for doing comparisons between JRE's. One
>>>> of which makes heavy use of StringBuffer.toString, without modifying
>>>> the StringBuffer in between.
>>>> Up to Java 1.4.2 a StringBuffer and a String could share the
>>>> underlying char. This meant that toString simply needed to create
>>>> a new String that referenced the StringBuffer's char with no
>>>> copying of the array needed. In Java 5 the String/StringBuffer
>>>> implementations were completely revised: StringBuilder was
>>>> introduced for non-synchronized use, and a new AbstractStringBuilder
>>>> base class added for it and StringBuffer. In that implementation
>>>> toString now has to copy the StringBuffer's char. This resulted in
>>>> a significant performance regression for toString() and a bug -
>>>> 6219959 - was opened. There is quite an elaborate evaluation in that
>>>> bug report but bottom line was that "real code doesn't depend on
>>>> this - won't fix".
>>>> At some stage ME also updated to the new Java 5 code and they also
>>>> noticed the problem. As a result CDC6 included a variation of the
>>>> caching strategy that is proposed here.
>>>> Going forward because we want people to be able to compare ME and SE
>>>> with their familiar benchmarks, we would like to address this corner
>>>> case and fix it using the caching strategy outlined. As a data point
>>>> an 8K StringBuffer that takes ~1ms to be converted to a String
>>>> initially, can process subsequent toString() calls in a few
>>>> microseconds. So that performance issue is addressed.
>>>> However we've added a write to a field in all the mutating methods
>>>> which obviously adds some additional computational effort - though I
>>>> have no doubt it is lost in the noise for all but the smallest of
>>>> mutating methods. Even so this should be run against regular SE
>>>> benchmarks to ensure there are no performance regressions there - so
>>>> if anyone has a suggestion as to the best benchmark to run to
>>>> exercise StringBuffer (if it exists), please let me know.
>>>> Thanks for reading this far :)
More information about the core-libs-dev