per at bothner.com
Mon Nov 14 01:28:36 UTC 2016
On 11/13/2016 04:21 AM, Zenaan Harkness wrote:
> Although grapheme indexing is probably more generally useful for
> multi-lingual UI.
Quite possibly. However, a code-point can be represented as an unboxed
int. A grapheme is a variable-length sequence of code-points, so in
general it requires memory allocation. You cannot store it in a
register or even a fixed number of registers, unless you use an indirect
substring representation (base string, start offset, end offset), which
has its own problems.
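
To illustrate the "unboxed int" point, here is a minimal Java sketch of a loop that visits each code-point of a String as a primitive int, with no per-element allocation. The class and method names (CodePointLoop, countLetters) are made up for this example; String.codePointAt, Character.charCount and Character.isLetter(int) are standard java.lang methods.

```java
class CodePointLoop {
    // Counts the code-points for which Character.isLetter is true.
    // Each code-point lives in a primitive int; nothing is allocated per element.
    public static int countLetters(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // full code-point, even if it spans two chars
            if (Character.isLetter(cp)) {
                n++;
            }
            i += Character.charCount(cp);   // advance by 1 or 2 UTF-16 code-units
        }
        return n;
    }

    public static void main(String[] args) {
        // "a", U+1D400 (written as a surrogate pair), a space, and "1"
        System.out.println(countLetters("a\uD835\uDC00 1")); // prints 2
    }
}
```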
You can always build a grapheme-based API on top of a code-point API,
but not vice versa. You can of course do the same on top of a UTF-16
code-unit API, but it's more error-prone and unnatural: at least
code-points have some natural semantic meaning; code-units do not.
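
As a rough sketch of layering a grapheme view on top of a code-point API, the Java below groups each code-point with any combining marks that follow it. That grouping rule is a deliberate simplification assumed for this example; it is not the full Unicode segmentation algorithm (UAX #29) and will mis-split things like emoji ZWJ sequences and Hangul jamo. The names GraphemeSketch, isCombiningMark and clusters are hypothetical. Note that each cluster comes back as a newly allocated String, which is the allocation cost mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

class GraphemeSketch {
    // Assumed simplification: only general categories Mn, Me and Mc extend a cluster.
    public static boolean isCombiningMark(int cp) {
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
            || type == Character.ENCLOSING_MARK
            || type == Character.COMBINING_SPACING_MARK;
    }

    // Splits s into clusters: one code-point plus any combining marks after it.
    public static List<String> clusters(String s) {
        List<String> out = new ArrayList<>();
        int[] cps = s.codePoints().toArray();
        int start = 0;
        for (int i = 1; i <= cps.length; i++) {
            // Close the current cluster at end of input or at a non-combining code-point.
            if (i == cps.length || !isCombiningMark(cps[i])) {
                out.add(new String(cps, start, i - start));
                start = i;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 'e' + U+0301 COMBINING ACUTE ACCENT + 'x': three code-points, two clusters
        System.out.println(clusters("e\u0301x").size()); // prints 2
    }
}
```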
> "CharSequence" is deceptive. Should be called CodePointSequence or
> something else again... "char" is -so- overloaded in Java in particular.
java.lang.CharSequence is *not* a sequence of code-points.
It's a sequence of UTF-16 code-units, just like java.lang.String.
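
A small Java example makes the difference visible. The class and method names (CodeUnitDemo, codeUnits, codePoints) are invented for illustration; CharSequence.length, charAt and codePoints are the standard methods.

```java
class CodeUnitDemo {
    // length() on a CharSequence counts UTF-16 code-units.
    public static int codeUnits(CharSequence cs) {
        return cs.length();
    }

    // codePoints() pairs up surrogates, so this counts code-points.
    public static int codePoints(CharSequence cs) {
        return (int) cs.codePoints().count();
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, written as its surrogate pair
        CharSequence cs = "a\uD83D\uDE00";
        System.out.println(codeUnits(cs));   // prints 3
        System.out.println(codePoints(cs));  // prints 2
        // charAt(1) returns half of a surrogate pair, not a whole character
        System.out.println(Character.isHighSurrogate(cs.charAt(1))); // prints true
    }
}
```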
per at bothner.com http://per.bothner.com/