question regarding the java.lang.String design
Xiaobin.Lu at Sun.COM
Fri Jan 30 22:20:06 PST 2009
I was ignoring the fact that substring could use the offset & count for
sharing purposes. I am wondering whether we should have a flag like
"isCharArrayShared" which would be set to true only for those strings
returned from a substring call. That way, many other methods in
String could skip loading the offset & count fields, which are mostly
set to 0 and val.length anyway (val is the character array).
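
To make the flag idea concrete, here is a minimal sketch in Java. The class
SketchString and all of its methods are hypothetical stand-ins for
illustration only; this is not the JDK's String code, and whether the extra
branch actually beats two field loads would need measuring.

```java
import java.util.Arrays;

// Hypothetical stand-in for java.lang.String; NOT the JDK implementation.
final class SketchString {
    private final char[] val;
    private final int offset;
    private final int count;
    // The proposed flag: true only for instances created by substring().
    private final boolean isCharArrayShared;

    SketchString(String s) {
        this.val = s.toCharArray();
        this.offset = 0;
        this.count = val.length;
        this.isCharArrayShared = false;
    }

    // Used only by substring(): shares the parent's array instead of copying.
    private SketchString(char[] val, int offset, int count) {
        this.val = val;
        this.offset = offset;
        this.count = count;
        this.isCharArrayShared = true;
    }

    SketchString substring(int begin, int end) {
        if (begin < 0 || end > length() || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SketchString(val, offset + begin, end - begin);
    }

    boolean isShared() {
        return isCharArrayShared;
    }

    int length() {
        // Unshared strings never need to read the count field.
        return isCharArrayShared ? count : val.length;
    }

    char charAt(int i) {
        if (i < 0 || i >= length()) {
            throw new StringIndexOutOfBoundsException(i);
        }
        // Unshared strings never need to read the offset field.
        return isCharArrayShared ? val[offset + i] : val[i];
    }

    boolean contentEquals(SketchString other) {
        // Fast path: neither side shares, so whole arrays can be compared.
        if (!isCharArrayShared && !other.isCharArrayShared) {
            return Arrays.equals(val, other.val);
        }
        // Slow path: at least one side is a window into a larger array.
        int n = length();
        if (n != other.length()) {
            return false;
        }
        for (int i = 0; i < n; i++) {
            if (charAt(i) != other.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    @Override
    public String toString() {
        return new String(val, isCharArrayShared ? offset : 0, length());
    }
}
```

With this layout, new SketchString("hello world").substring(6, 11) shares the
parent's array and takes the offset/count path, while strings built directly
from text always take the fast path.
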
David Holmes - Sun Microsystems wrote:
> Hi Xiaobin,
> As you've probably gleaned by now the count and offset fields are to
> allow sharing of the underlying char[] - which is a safe thing to do
> exactly because a string is immutable. I've often thought this
> particular optimization was under-utilized.
> As others have said optimization of strings has been a recurring theme
> for many years now - there was even a paper on it at last year's ACM
> OOPSLA conference. IBM Research's Tokyo labs do a lot in this area -
> see for example "RT0750 A Quantitative Analysis of Space Waste from
> Java Strings and its Elimination at GC Time".
> I've occasionally thought that with all the duplicate strings that
> readily occur in Java it might be an option to have a few large tables
> of "text" containing all the characters, and then to define a String
> as one or more pairs of indices into these tables. But that's as far
> as I've thought about it :)
> David Holmes
> Xiaobin Lu said the following on 01/31/09 04:42:
>> Resend the email to hotspot-dev at openjdk.java.net.
>> Xiaobin Lu wrote:
>>> Hi folks,
>>> While looking at the java.lang.String implementation, I noticed
>>> that it has "offset" and "count" fields. For the
>>> offset field, I only found two places which set that field, but I
>>> believe they can be gotten rid of too. The two places are
>>> String(StringBuffer buffer) & String(StringBuilder builder).
>>> My question is: if String is immutable, why do we need to carry
>>> these two fields? String could be more compact without these two
>>> fields. The equals method could be implemented more efficiently by
>>> just calling java.util.Arrays.equals(v1, v2), which is intrinsified on
>>> x86 at least.
>>> Another crazy thought is that we could compact the character array to
>>> a byte array if we don't have any characters other than ASCII (which
>>> we might indicate with a boolean flag).
>>> I'd appreciate your insight on this.
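
The last two suggestions in the original message (array-based equals and a
byte array for ASCII-only content) can be sketched together as follows. This
is only an illustration under stated assumptions: the class CompactText and
its methods are hypothetical names, and nothing here reflects how the JDK
actually lays out String.

```java
import java.util.Arrays;

// Hypothetical illustration of the "byte array plus boolean flag" idea.
final class CompactText {
    private final byte[] bytes;      // used only when asciiOnly is true
    private final char[] chars;      // used only when asciiOnly is false
    private final boolean asciiOnly; // the flag suggested in the message

    CompactText(String s) {
        boolean ascii = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                ascii = false;
                break;
            }
        }
        asciiOnly = ascii;
        if (ascii) {
            byte[] b = new byte[s.length()];
            for (int i = 0; i < b.length; i++) {
                b[i] = (byte) s.charAt(i); // lossless: every char is <= 0x7F
            }
            bytes = b;
            chars = null;
        } else {
            bytes = null;
            chars = s.toCharArray();
        }
    }

    boolean isAsciiOnly() {
        return asciiOnly;
    }

    int length() {
        return asciiOnly ? bytes.length : chars.length;
    }

    char charAt(int i) {
        // ASCII bytes are in 0..127, so the widening cast is safe.
        return asciiOnly ? (char) bytes[i] : chars[i];
    }

    // Payload size in bytes, ignoring object and array headers.
    int storageBytes() {
        return asciiOnly ? bytes.length : 2 * chars.length;
    }

    boolean contentEquals(CompactText other) {
        // ASCII-only content is always stored as bytes, so two instances
        // with different flags cannot hold the same characters.
        if (asciiOnly != other.asciiOnly) {
            return false;
        }
        // With no offset/count, equality is a single whole-array comparison.
        return asciiOnly
                ? Arrays.equals(bytes, other.bytes)
                : Arrays.equals(chars, other.chars);
    }
}
```

For ASCII-only text the payload is one byte per character instead of two, at
the cost of a branch on the flag in every accessor.
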