<i18n dev> Codereview request for 7096080: UTF8 update and new CESU-8 charset
xueming.shen at oracle.com
Thu Sep 29 15:27:46 PDT 2011
On 09/29/2011 02:16 PM, Ulf Zibis wrote:
> Please use spaces with ternary operators: Lines 155, 216
> For brevity you could use sr instead of srcRemaining, consistent with sa, sp, sl.
> 420 // returns -1 if there is malformed byte(s) and the
> 420 // returns -1 if there is/are malformed byte(s) and the
> 466 sp -=3;
> There should be a space: sp -= 3;
Webrev has been updated accordingly.
> 280 if (Character.isSurrogate(c))
> 281 return malformedForLength(src, sp, dst, dp, 3);
> Shouldn't we return cr.length() = 1 to allow the remaining 2 bytes to be
> interpreted again?
Actually I don't know the answer. My reading of D93a/D93b suggests that we
should interpret it as a whole: the bytes are in the well-formed byte pattern
range listed in Table 3.7, and are "ill-formed" only because they encode a
surrogate value rather than a scalar value, so I would interpret the whole
3 bytes as a maximal subpart. Given that D93a/b is "best practices for Using
U+FFFD", either way should be fine. We do have
on the list, so maybe they can share their opinion on what the behavior
should be in this case, from the Standard's point of view?
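
To make the two options concrete, here is a minimal sketch of how the choice of malformed length changes the number of U+FFFD replacements. It is a toy decoder, not the code in the webrev: the class name SurrogateMalformedLength and the method decode are hypothetical, it only handles ASCII and 3-byte sequences, and it does not check for overlong forms.

```java
final class SurrogateMalformedLength {
    // Toy decoder (assumption: illustrative only, not the JDK implementation).
    // surrogateMalformedLength is the number of bytes reported as malformed
    // when a well-formed 3-byte pattern decodes to a surrogate code unit.
    static String decode(byte[] in, int surrogateMalformedLength) {
        StringBuilder out = new StringBuilder();
        int sp = 0;
        while (sp < in.length) {
            int b1 = in[sp] & 0xff;
            if (b1 < 0x80) {                       // ASCII
                out.append((char) b1);
                sp++;
                continue;
            }
            if ((b1 & 0xf0) == 0xe0 && sp + 2 < in.length) {
                int b2 = in[sp + 1] & 0xff;
                int b3 = in[sp + 2] & 0xff;
                if ((b2 & 0xc0) == 0x80 && (b3 & 0xc0) == 0x80) {
                    char c = (char) (((b1 & 0x0f) << 12)
                                   | ((b2 & 0x3f) << 6)
                                   |  (b3 & 0x3f));
                    if (Character.isSurrogate(c)) {
                        // One replacement, then skip the chosen malformed length.
                        out.append('\uFFFD');
                        sp += surrogateMalformedLength;
                    } else {
                        out.append(c);
                        sp += 3;
                    }
                    continue;
                }
            }
            // Anything else (stray continuation byte, unsupported lead byte):
            // treat a single byte as malformed.
            out.append('\uFFFD');
            sp++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 'A', then ED A0 80 (the 3-byte form of U+D800), then 'B'
        byte[] in = {'A', (byte) 0xED, (byte) 0xA0, (byte) 0x80, 'B'};
        System.out.println(decode(in, 3).length()); // whole sequence as one unit
        System.out.println(decode(in, 1).length()); // lead byte only, rest re-read
    }
}
```

With length 3 the whole sequence yields a single U+FFFD. With length 1 the two trailing bytes are re-read as stray continuation bytes, so each of them produces its own U+FFFD.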
> On 29.09.2011 05:27, Xueming Shen wrote:
>> On 9/28/2011 3:44 PM, Ulf Zibis wrote:
>>> 5. IMHO charset CESU-8 should be hosted in extended-charsets,
>>> otherwise it should be added to java.nio.StandardCharsets
>> We have lots of charsets provided via the "standard charset provider"
>> (in rt.jar) but not listed in StandardCharsets.
> Yes, but the reason to add CESU-8 to StandardCharsets was the
> supposed demand to treat all Unicode charsets equivalently.
> Otherwise there is no obstacle to host CESU-8 in extended-charsets.
> IMHO, CESU-8 addresses corner case compatibility issues, but not
> "standard" requirements.
Putting CESU-8 into the "standard charset provider" (which is only an
implementation detail) does not mean it is a "standard" requirement; it just
means it is bundled into rt.jar. The reason I put it there is to keep it
together with UTF-8, on the assumption that you might need it around when
using the updated UTF-8, which no longer handles those 3/6-byte
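
For context on the 3/6-byte forms mentioned above, here is a small sketch contrasting standard UTF-8 with a CESU-8 style encoding of a supplementary character. The class Cesu8Sketch and its methods encodeCesu8 and hex are hypothetical helpers written for illustration, not the charset implementation from the webrev. The sketch encodes each UTF-16 code unit on its own, so a surrogate pair becomes two 3-byte sequences.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class Cesu8Sketch {
    // Hand-written CESU-8 style encoder (assumption: illustrative only).
    // Every UTF-16 code unit, including each half of a surrogate pair,
    // is encoded as if it were a BMP character.
    static byte[] encodeCesu8(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.write(c);
            } else if (c < 0x800) {
                out.write(0xc0 | (c >> 6));
                out.write(0x80 | (c & 0x3f));
            } else {
                out.write(0xe0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3f));
                out.write(0x80 | (c & 0x3f));
            }
        }
        return out.toByteArray();
    }

    // Formats bytes as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+10400 is represented in UTF-16 as the surrogate pair D801 DC00.
        String s = new String(Character.toChars(0x10400));
        System.out.println("UTF-8:  " + hex(s.getBytes(StandardCharsets.UTF_8)));
        System.out.println("CESU-8: " + hex(encodeCesu8(s)));
    }
}
```

For U+10400 the UTF-8 form is the 4-byte sequence F0 90 90 80, while the CESU-8 style form is the 6-byte sequence ED A0 81 ED B0 80, that is, one 3-byte sequence per surrogate.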