Request for Review : CR#6924259: Remove String.count/String.offset
mike.duigou at oracle.com
Fri May 25 22:09:28 UTC 2012
On May 25 2012, at 06:57 , Rémi Forax wrote:
> Hi Mike, Hi Alan, Hi all,
> in my opinion, EMPTY_STRING_VALUE is a premature optimization,
> the idea is, I think, to avoid to create an empty char but if you want to do that,
> it's better to do that in StringBuilder.toString() to avoid to create the String at all.
I have removed most of the use of EMPTY_STRING_VALUE. It remains for the empty constructor only.
> I'm worried about one pathological case where a lot of non empty String
> will be created and then one empty will be created after the code will be JITed,
> in that case Hotspot will deoptimize all codes that have inlined the constructor call.
Reasonable and confirmed with Valdimir Kozlov.
> String(StringBuilder) and String(StringBuffer) can be simplified
> by using length() and getValue() (AbstractStringBuilder.getValue())
> to extract the info.
> this.value = Arrays.copyOf(builder.getValue(), builder.length());
> and avoid to create a temporary String.
Good optimization. I had to add a sync around the buffer version.
> String(char value, boolean share) should be
> String(char value, boolean share) and is there a perf reason to
> not use a static method like createSharedString() instead.
none. I will leave for a later change.
> Removing String(int offset, int count, char value) will create trouble,
> I've seen several libraries that use it by reflection to avoid to create
> a copy of the array. I think this method should not disappear
> but use Arrays.copyOfRange if the offset is not 0.
> This method should be marked deprecated, and removed in Java 9.
Restored but deprecated.
> Recently, _getChars(char, int) was replaced by
> getChars(char,int). It was a stupid change because now
> one can think that getChar(char, int) and getChar(int,int,char,int)
> do the same things. but getChat(char,int) don't do any bounds check.
> So concat() and toCharArray() doesn't do any bound check !
I don't see where either of them need to do any checking.
> toCharArray() should use Arrays.copyOf()
> Overloads of encode in StringCoding are in my opinion not necessary
> because each method is called once.
Reasonable. I opted not to add the 3rd variant ( count) because it wasn't used. I've reverted to using the 3 param version.
> So this doesn't really share code
> but just add one level to the depth of the call stack.
> On 05/25/2012 02:08 PM, Alan Bateman wrote:
>> On 24/05/2012 21:45, Mike Duigou wrote:
>>> Hello all;
>>> For a long time preparations and planing have been underway to remove the offset and count fields from java.lang.String. These two fields enable multiple String instances to share the same backing character buffer. Shared character buffers were an important optimization for old benchmarks but with current real world code and benchmarks it's actually better to not share backing buffers. Shared char array backing buffers only "win" with very heavy use of String.substring. The negatively impacted situations can include parsers and compilers however current testing shows that overall this change is beneficial.
>>> The current plan is to commit this change to jdk8 before build 40 followed by jdk7u6. Removing offset/count is a complicated process because HotSpot has special case code for String that depends upon the two fields. Vladimir Kozlov has committed the required HotSpot changes already (http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/8f972594effc).
>>> A patch of the changes to java.lang.String for JDK 8 are at<http://cr.openjdk.java.net/~mduigou/6924259/0/webrev/>. The changes for JDK 7 have only superficial differences (line offsets in the patch).
>>> Comments are welcome.
>> I went through the latest webrev and don't see anything obviously wrong. EMPTY_STRING_VALUE is new, at least to me, as I don't think it was there when removing these fields was prototyped a few years ago.
>> A minor point in the String(char) constructor is that you could move the "int len = .." to the top of the method to be consistent with other places where a local holds the value.length.
>> The "use name version for caching" comment in StringCoding.java might be confusing to readers, maybe "use charset name as this is faster due to caching".
>> A non-material comment is that there are a couple of style changes that I found annoying. Everyone is an expert on such matters, I'm not, but the one that bugged me a bit was adding the space after the cast, eg:
>> (toffset > (long) value.length - len)
>> I found myself needing to re-read it to see if the cast applied to value.length or value.length-len. It's a minor point but you know what I mean.
More information about the core-libs-dev