Maybe codec bug in MS1252, i.e., encoding Cp1252
xueming.shen at oracle.com
Fri Sep 2 12:50:27 PDT 2011
On 09/02/2011 02:14 AM, Eric Liang wrote:
> On 09/02/2011 04:04 AM, Xueming Shen wrote:
>> These 5 code points are "undefined" characters in Cp1252. (The first one
>> should be 0x81, not 0x83, since 0x83<->u_0192 is defined and works
>> correctly in the Cp1252 charset.) The mapping table you referred to is a
>> "bestfit" type mapping table, which tries to provide a mapping
>> between the local encoding and the Unicode character set even for
>> characters that do not exist in the local encoding. Personally I don't think
>> it's a good idea in most use scenarios. All other official (from
>> or un-official mapping tables clearly mark these code points "undefined"
>> or "unused", for example
>> btw, code below is incorrect, or it does not work the way you might
>> String name1 = new String( new String("兆源").getBytes("UTF-8"),
>> String name2 = new String( name1.getBytes("Cp1252"), "UTF-8");
>> new String("兆源").getBytes("UTF-8") encodes your 2 Chinese characters
>> from UTF-16 to UTF-8 bytes. It does not make sense to then decode these
>> UTF-8 bytes back to UTF-16 (which the String object uses) by using Cp1252.
>> The same goes for the second attempt.
>> What are you trying to achieve? Decoding/encoding between UTF-8 bytes and
>> Cp1252 bytes? It's not going to be a round-trip conversion for those
>> non-ASCII characters.
> Thanks Sherman for your explanation.
> The problem occurred when I was using JDBC with MySQL. The former
> application stored UTF-8 data in a database with the default configuration
> (encoding latin1); reading the data back and decoding it in PHP works fine.
> But I failed in Java when reading the data. From the document(
> ), latin1 in MySQL corresponds with Cp1252 in JAVA, so I found the
> cause, and I deem the guy here also encountered this problem (
> http://forums.mysql.com/read.php?39,228068,228068#msg-228068 ).
> Data in latin1 (in Java) can be converted to UTF-8 freely and vice
> versa. According to Wikipedia, Cp1252 is a superset of ISO_8859-1, so
> I think it is natural to expect the same of Cp1252 as of latin1,
> though it does not work now.
> However, YMMV. Would you mind giving some suggestions on this? Thanks
> in advance.
Windows-1252 (cp1252) is a superset of ISO 8859-1. ISO 8859-1 is normally
referred to as latin-1. What we have in the Java charset repository is
ISO-8859-1. The difference between ISO 8859-1 and ISO-8859-1 (without and
with the dash) is the C0 and C1 area: ISO-8859-1 has the C0 and C1
defined, ISO 8859-1 does not.
So in your above workaround, you'd better use ISO-8859-1 instead of cp1252.
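
A minimal sketch of why ISO-8859-1 is the safer choice for that kind of
workaround, assuming a JDK in which the "windows-1252" charset is available.
The class and method names (CharsetRoundTrip, roundTrips, isUndefinedIn,
recover) are made up for illustration and are not part of any API:

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class CharsetRoundTrip {
    static final Charset UTF8 = Charset.forName("UTF-8");
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode the bytes with cs, encode the result with cs again, and
    // report whether the original bytes come back unchanged.
    static boolean roundTrips(byte[] bytes, Charset cs) {
        String decoded = new String(bytes, cs);
        return Arrays.equals(bytes, decoded.getBytes(cs));
    }

    // True if the single byte b decodes to the replacement character
    // U+FFFD, i.e. the charset has no mapping for it.
    static boolean isUndefinedIn(Charset cs, int b) {
        String decoded = new String(new byte[] { (byte) b }, cs);
        return decoded.equals("\uFFFD");
    }

    // Hypothetical repair step: the string was produced by decoding UTF-8
    // bytes as ISO-8859-1, so get the bytes back and decode them properly.
    static String recover(String misdecoded) {
        return new String(misdecoded.getBytes(LATIN1), UTF8);
    }

    public static void main(String[] args) {
        // The two characters from the example, written as escapes so the
        // source file encoding does not matter.
        byte[] utf8 = "\u5146\u6E90".getBytes(UTF8);

        System.out.println("ISO-8859-1 round trip: " + roundTrips(utf8, LATIN1));
        System.out.println("Cp1252 round trip:     " + roundTrips(utf8, CP1252));

        // The five code points discussed above, plus 0x83 for comparison.
        int[] candidates = { 0x81, 0x83, 0x8D, 0x8F, 0x90, 0x9D };
        for (int b : candidates) {
            System.out.println(String.format("0x%02X undefined in Cp1252: %b",
                    b, isUndefinedIn(CP1252, b)));
        }

        String misdecoded = new String(utf8, LATIN1);
        System.out.println("recovered: " + recover(misdecoded));
    }
}
```

The UTF-8 form of the second character is E6 BA 90, and 0x90 is one of the
undefined code points in Cp1252, so the Cp1252 decode loses that byte.
ISO-8859-1 maps all 256 byte values, so the bytes survive the round trip.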
I know little about JDBC + MySQL, so I am probably not the one to give
suggestions on this topic.
By simply reading the description of the problem you are facing, I guess
you'd better set your client-side encoding/charset correctly to utf-8 or
gbk to receive results in Chinese