<i18n dev> Codereview request for 7096080: UTF8 update and new CESU-8 charset

Ulf Zibis Ulf.Zibis at gmx.de
Sun Oct 2 02:52:35 PDT 2011

On 02.10.2011 08:29, Xueming Shen wrote:
> http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf
> Go to section 3.9, Unicode Encoding Forms, or simply search for D93.
> On 10/1/2011 2:21 PM, Ulf Zibis wrote:
>> On 30.09.2011 22:46, Xueming Shen wrote:
>>> On 09/30/2011 07:09 AM, Ulf Zibis wrote:
>>>>> (1) new byte[]{(byte)0xE1, (byte)0x80, (byte)0x42} ---> CoderResult.malformedForLength(1)
>>>>> It appears the Unicode Standard now explicitly recommends returning malformed length 2,
>>>>> which is what UTF-8 does now, for this scenario.
>>>> My idea was that, with malformed length 1, a subsequent call to the decode loop
>>>> would return another malformed length 1, guaranteeing 2 replacement chars in the output
>>>> string. (I'm not sure whether that is expected in this corner case.)
>>> The Unicode Standard's "best practices" (D93a/b) recommend returning 2 in this case.
OK, I got it:
E1 80 42 --> malformed length 2 --> 1 replacement --> FFFD 0042
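
The case above can be checked with a small sketch like the following. The class and method names (MalformedLengthDemo, decodeReporting, decodeReplacing) are made up for illustration and are not part of the patch; only standard java.nio.charset calls are used. It assumes a JDK whose UTF-8 decoder already contains this change, in which case I would expect a malformed length of 2 and the output FFFD 0042.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK or of the patch under review.
class MalformedLengthDemo {

    // Decode with REPORT so the decoder stops and hands back the CoderResult
    // for the first malformed sequence instead of substituting U+FFFD.
    static CoderResult decodeReporting(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes), CharBuffer.allocate(8), true);
    }

    // new String(byte[], Charset) always replaces malformed input with the
    // charset's default replacement, which is U+FFFD for UTF-8.
    static String decodeReplacing(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // E1 starts a 3-byte sequence, 80 is a valid continuation byte,
        // 42 ('B') is not, so the sequence is cut short after two bytes.
        byte[] input = {(byte) 0xE1, (byte) 0x80, (byte) 0x42};

        CoderResult result = decodeReporting(input);
        System.out.println("malformed=" + result.isMalformed()
                + " length=" + result.length());

        // Print the decoded chars as hex code units.
        for (char c : decodeReplacing(input).toCharArray()) {
            System.out.printf("%04X ", (int) c);
        }
        System.out.println();
    }
}
```

With malformed length 2, the two bytes E1 80 are consumed together and replaced once, and 42 is then decoded normally as 'B'.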

Because it could be difficult for others to find the right documents later, it
would be *very nice* to add this link to the source code of UTF_8.java, either as javadoc or as a
plain comment.

