<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi again,<br>
<br>
On 30.09.2011 00:27, Xueming Shen wrote:
<blockquote cite="mid:4E84F0E2.6010503@oracle.com" type="cite">
On 09/29/2011 02:16 PM, Ulf Zibis wrote:
<blockquote cite="mid:4E84E025.3010105@gmx.de" type="cite"> <br>
280 if (Character.isSurrogate(c))<br>
281 return malformedForLength(src, sp,
dst, dp, 3);<br>
Shouldn't we return cr.length() = 1 to allow the remaining 2 bytes to
be interpreted again?<br>
<br>
</blockquote>
</blockquote>
Forget it! If c is a surrogate, b1 is ED, b2 is in the range A0..BF and b3 is
in the range 80..BF. Neither b2 nor b3 can be the first byte of a well-formed
sequence, so re-interpreting them gains nothing.<br>
<br>
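A minimal sketch to double-check the claim above. It walks every sequence ED A0..BF 80..BF and confirms that each decodes to a surrogate and that neither trailing byte could start a well-formed sequence. The class and the helpers canBeLeadByte and decode3 are hypothetical names for illustration, not code from the patch; the lead-byte range 00..7F, C2..F4 is taken from the well-formed byte sequence table in the Unicode Standard.<br>

```java
class SurrogateBytes {
    // Hypothetical helper: true if b could be the first byte of a
    // well-formed UTF-8 sequence (00..7F or C2..F4).
    static boolean canBeLeadByte(int b) {
        b &= 0xFF;
        return b <= 0x7F || (b >= 0xC2 && b <= 0xF4);
    }

    // Hypothetical helper: combines the payload bits of a three-byte
    // pattern without validating it.
    static char decode3(int b1, int b2, int b3) {
        return (char) (((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F));
    }

    // Checks all 32 * 64 byte sequences that encode a surrogate.
    static boolean claimHolds() {
        for (int b2 = 0xA0; b2 <= 0xBF; b2++) {
            for (int b3 = 0x80; b3 <= 0xBF; b3++) {
                char c = decode3(0xED, b2, b3);
                if (!Character.isSurrogate(c) || canBeLeadByte(b2) || canBeLeadByte(b3)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("claim holds: " + claimHolds());
    }
}
```

Since both trailing bytes are continuation bytes, reporting a malformed length of 1 would only make the decoder report b2 and b3 as malformed again on the following passes.<br>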
<br>
<blockquote cite="mid:4E84F0E2.6010503@oracle.com" type="cite">
Actually I don't know the answer. My reading of D93a/D93b suggests
that we might<br>
interpret it as a whole, given that the bytes are actually in the
well-formed byte pattern range<br>
listed in Table 3.7, but are "ill-formed" simply because they encode a
surrogate value, not a scalar<br>
value, so I would interpret the whole 3 bytes as a maximal
subpart. Given D93a/b is<br>
"best practices for Using U+FFFD", either way should be fine. We
do have Unicode experts<br>
on the list, so maybe they can share their opinion on what the
"desired"/recommended<br>
behavior is in this case, from the Standard's point of view?<br>
</blockquote>
<br>
At line 102 you could insert:<br>
// [E0] [A0..BF]<br>
// [E1..EF] [80..BF]<br>
<br>
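To spell out what the two comment lines describe, here is a small sketch of a second-byte check for three-byte sequences. It assumes the comments document the second-byte ranges accepted after a lead byte in E0..EF; the class and the method badSecondByte are hypothetical names and not the code at line 102.<br>

```java
class ThreeByteCheck {
    // Hypothetical helper, assuming b1 is already known to be in E0..EF.
    // Returns true if b2 falls outside the ranges from the comment:
    //   [E0]     [A0..BF]
    //   [E1..EF] [80..BF]
    static boolean badSecondByte(int b1, int b2) {
        b1 &= 0xFF;
        b2 &= 0xFF;
        if (b1 == 0xE0) {
            // E0 80..9F would be an overlong encoding.
            return b2 < 0xA0 || b2 > 0xBF;
        }
        return b2 < 0x80 || b2 > 0xBF;
    }

    public static void main(String[] args) {
        System.out.println(badSecondByte(0xE0, 0x9F));
        System.out.println(badSecondByte(0xED, 0xA0));
    }
}
```

Note that ED A0..BF passes this range check, which is why a separate surrogate test such as the one at lines 280/281 is still needed.<br>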
-Ulf<br>
<br>
</body>
</html>