String.subSequence and CR#6924259: Remove offset and count fields from java.lang.String
peter.levart at gmail.com
Sat Jun 23 07:17:05 UTC 2012
Yes, all that you say below is true. CharSequence is an interface that does
not define a contract for equality, whereas the implementations/subtypes of
CharSequence do, each in its own way. Much like java.util.Collection and
List vs. Set. It's always dangerous for methods that return such interfaces
when the implementation class of the returned instance changes so that it
defines a different "hidden" contract.
It is unfortunately also true that now there is no "official" way of
comparing/equating/hashing a substring without copying the characters. It
feels like the standard Java SE API is lacking a piece of functionality.
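To illustrate what such functionality could look like, here is a minimal
sketch of comparing and hashing a region of a CharSequence without copying it.
The class CharRegions and its two methods are hypothetical names made up for
this example; nothing like them exists in the JDK. The hash uses the same
31-based formula that String.hashCode() documents, so the result should agree
with the hash of the equivalent String.

```java
// Hypothetical helper, not part of the JDK: works on a region [start, end)
// of any CharSequence without allocating a copy of the characters.
final class CharRegions {
    private CharRegions() {}

    private static void checkBounds(CharSequence cs, int start, int end) {
        if (start < 0 || end > cs.length() || start > end) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
    }

    /** Returns true if cs[start, end) has the same chars as other. */
    static boolean regionEquals(CharSequence cs, int start, int end, CharSequence other) {
        checkBounds(cs, start, end);
        if (end - start != other.length()) {
            return false;
        }
        for (int i = 0; i < other.length(); i++) {
            if (cs.charAt(start + i) != other.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    /** Hash of cs[start, end), using the formula documented for String.hashCode(). */
    static int regionHash(CharSequence cs, int start, int end) {
        checkBounds(cs, start, end);
        int h = 0;
        for (int i = start; i < end; i++) {
            h = 31 * h + cs.charAt(i);
        }
        return h;
    }
}
```

For the String-to-String comparison case String.regionMatches() already avoids
the copy; the sketch above only adds the general CharSequence case and the
hashing part.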
Maybe this space could be filled with the addition of a new public immutable
CharSequence subtype - like Rope: http://ahmadsoft.org/ropes/doc/index.html
On Friday, June 22, 2012 03:15:40 PM Mike Duigou wrote:
> I've made a test implementation of subSequence() utilizing an inner class
> with offset and count fields to try to understand all the parts that would
> be impacted. My observations thus far:
> - The specification of the subSequence() method is currently too specific.
>   It says that the result is a substring(). This would no longer be true.
>   Hopefully nobody assumed that this meant they could cast the result to
>   String. I know, why would you if you can just call substring() instead?
>   I've learned to assume that somebody somewhere always does the most
>   unexpected thing.
> - The CharSequences returned by subSequence() would follow only the general
>   CharSequence rules for equals()/hashCode(). Any current usages of the
>   result of subSequence() for equals() or hashing, even though it's not
>   advised, would break. We could add equals() and hashCode()
>   implementations to the CharSequence returned but they would probably be
>   expensive.
> - In general I wonder if parsers will be satisfied with a CharSequence that
>   only implements identity equals().
> - I also worry about applications that currently use subSequence() and
>   which will fail when the result is not a String instance, as
>   String.equals() will return false for all CharSequences that aren't
>   Strings, i.e.:
>
>     CharSequence token = line.subSequence(start, end);
>     if (keyword.equals(token)) ...
>
>   This would now fail.
> At this point I wonder if this is a feature worth pursuing.
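The keyword check described above can be reproduced with a small sketch. It
assumes a CharBuffer wrapped around the line as a stand-in for a non-String
CharSequence (that is my choice for the example, not what the test
implementation used), and the class and method names are made up.
String.contentEquals(CharSequence) is shown as one comparison that does not
depend on the runtime class of the token.

```java
import java.nio.CharBuffer;

// Hypothetical demo class; only String, CharSequence and CharBuffer are JDK types.
class KeywordMatch {
    // True only if token is itself a String with the same characters.
    static boolean viaEquals(String keyword, CharSequence token) {
        return keyword.equals(token);
    }

    // Compares character content, whatever the class of token is.
    static boolean viaContentEquals(String keyword, CharSequence token) {
        return keyword.contentEquals(token);
    }

    public static void main(String[] args) {
        String line = "if (x) return;";
        // Current behavior: subSequence() hands back a String.
        CharSequence asString = line.subSequence(0, 2);
        // Stand-in for a view that is not a String (assumption for this demo).
        CharSequence asView = CharBuffer.wrap(line, 0, 2);

        System.out.println(viaEquals("if", asString));
        System.out.println(viaEquals("if", asView));
        System.out.println(viaContentEquals("if", asView));
    }
}
```

Code written with contentEquals() would keep working if the returned class
changed; code written with equals() would silently stop matching.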
> On Jun 3 2012, at 13:44 , Peter Levart wrote:
> > On Thursday, May 31, 2012 03:22:35 AM mike.duigou at oracle.com wrote:
> >> Changeset: 2c773daa825d
> >> Author: mduigou
> >> Date: 2012-05-17 10:06 -0700
> >> URL: http://hg.openjdk.java.net/jdk8/tl/jdk/rev/2c773daa825d
> >> 6924259: Remove offset and count fields from java.lang.String
> >> Summary: Removes the use of shared character array buffers by String
> >> along with the two fields needed to support the use of shared buffers.
> > Wow, that's quite a change.
> > So .substring() is not O(1) any more?
> > Doesn't this have an impact on the performance of parsers and such that
> > rely on the performance characteristics of .substring()?
> > Have you considered then implementing .subSequence() not in terms of just
> > delegating to .substring() but returning a special CharSequence view over
> > the chars of the sub-sequence?
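For reference, the kind of CharSequence view discussed in this thread might
look roughly like the sketch below. This is not the test implementation
mentioned above; the class name SubSequenceView is hypothetical, and only the
offset and count fields follow the description. It adds content-based
equals()/hashCode() between views, which costs O(n) per call, and it still
does not make String.equals(view) return true.

```java
// Hypothetical sketch of an immutable view over part of a String.
final class SubSequenceView implements CharSequence {
    private final String source;
    private final int offset; // index of the first char in source
    private final int count;  // number of chars in the view

    SubSequenceView(String source, int start, int end) {
        if (start < 0 || end > source.length() || start > end) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
        this.source = source;
        this.offset = start;
        this.count = end - start;
    }

    @Override public int length() { return count; }

    @Override public char charAt(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        return source.charAt(offset + index);
    }

    // O(1): another view over the same String, no characters copied.
    @Override public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > count || start > end) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
        return new SubSequenceView(source, offset + start, offset + end);
    }

    // Copies the characters; this is the only place a copy is made.
    @Override public String toString() {
        return source.substring(offset, offset + count);
    }

    // Content-based equality, but only against other views.
    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SubSequenceView)) return false;
        SubSequenceView other = (SubSequenceView) o;
        if (count != other.count) return false;
        for (int i = 0; i < count; i++) {
            if (charAt(i) != other.charAt(i)) return false;
        }
        return true;
    }

    // Same formula as documented for String.hashCode(), computed on each call.
    @Override public int hashCode() {
        int h = 0;
        for (int i = 0; i < count; i++) {
            h = 31 * h + charAt(i);
        }
        return h;
    }
}
```

Two views with the same characters are equal to each other and hash like the
equivalent String, but "world".equals(view) remains false, which is exactly
the compatibility problem raised above.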