JEP 254: Compact Strings - length limits

Tim Ellison t.p.ellison at
Tue Sep 6 21:18:48 UTC 2016

On 06/09/16 19:04, Xueming Shen wrote:
> On 9/6/16, 10:09 AM, Tim Ellison wrote:
>> Has it been noted that while JEP 254 reduces the space occupied by one
>> byte per character strings, moving from a char[] to byte[]
>> representation universally means that the maximum length of a UTF-16
>> (two bytes per char) string is now halved?

Hey Sherman,

> Yes, it's a known "limit" given the nature of the approach. It is
> not considered to be an "incompatible change", because the max length
> the String class and the corresponding buffer/builder classes can
> support is really an implementation detail, not a spec requirement.

Don't confuse spec compliance with compatibility.  Of course, the JEP
should not break the formally specified behavior of String etc., but the
goal was to ensure that the implementation be compatible with prior
behavior. As you know, there are many places where compatible behavior
beyond the spec is important to maintain.

> The conclusion from the discussion back then was this is something we
> can trade off for the benefits we gain from the approach. 

Out of curiosity, where was that?  I did search for previous discussion
of this topic but didn't see it -- it may be just my poor search-fu.

> Do we have a real use case that is impacted by this change?

People stash all sorts of things in (immutable) Strings. Reducing the
limits in JDK9 seems like a regression.  Was there any consideration to
using the older Java 8 StringCoding APIs for UTF-16 strings (already
highly perf tuned) and adding additional methods for compact strings
rather than rewriting everything as byte[]'s?
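
To make the arithmetic behind the reduced limit concrete, here is a minimal
sketch. It assumes the backing array can hold at most Integer.MAX_VALUE
elements (real JVMs typically allow slightly fewer), and the class and method
names are hypothetical, chosen only for illustration; this is not the JDK's
actual code.

```java
// Illustrative only: compares the theoretical maximum String length under a
// char[] backing store with a byte[] backing store.
class StringLengthLimits {
    // Assumption: arrays are int-indexed, so this is the upper bound on
    // element count. Actual VM limits are usually a few elements lower.
    static final int MAX_ARRAY_LENGTH = Integer.MAX_VALUE;

    // With char[], each array element holds exactly one UTF-16 code unit.
    static int maxCharsWithCharArray() {
        return MAX_ARRAY_LENGTH;
    }

    // With byte[], each char occupies bytesPerChar elements:
    // 1 for one-byte-per-character strings, 2 for UTF-16 strings.
    static int maxCharsWithByteArray(int bytesPerChar) {
        return MAX_ARRAY_LENGTH / bytesPerChar;
    }

    public static void main(String[] args) {
        System.out.println("char[] backing:          " + maxCharsWithCharArray());
        System.out.println("byte[] backing, 1 byte:  " + maxCharsWithByteArray(1));
        System.out.println("byte[] backing, UTF-16:  " + maxCharsWithByteArray(2));
    }
}
```

Under these assumptions a UTF-16 string in a byte[] tops out at 1073741823
chars (Integer.MAX_VALUE / 2), while one-byte-per-character strings keep the
old ceiling.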


>> Since the goal is "preserving full compatibility", this has been missed
>> by failing to allow for UTF-16 strings of length greater than
>> Integer.MAX_VALUE / 2.
>> Regards,
>> Tim
