Codereview needed for HKSCS2008 support in JDK7

Xueming Shen Xueming.Shen at Sun.COM
Fri Feb 26 00:10:25 UTC 2010

Alan, Martin, Masayoshi and anyone interested, please help comment and 

6911753: NSN wants to add Big5 HKSCS-2004 support
6902790: Converting/displaying HKSCS characters issue on Vista and Windows7
6218752: Update HKSCS and GB18030 converters for Unicode 4.1


Background Info:
HK gov

HKSCS Versions:

0. HKSCS-2008
         The HKSCS-2008 is an updated version of the Hong Kong Supplementary
         Character Set-2004 (HKSCS-2004) published in May 2005. It includes
         5,009 characters of which 68 are newly added. The HKSCS-2008 is
         technically aligned with the ISO/IEC 10646:2003 and its Amendments 1 to 6
         published in October 2009 by the International Organization for
         Standardization (ISO).
1. HKSCS-2004
         denotes the 123 characters that are newly included in the 
2. HKSCS-2001
         denotes the 116 characters that are newly included in the 
3. HKSCS-1999
         denotes characters that are included since the first version
         of the HKSCS that was released in 1999, which contains 4,702 characters.

* HKSCS-2004 and later use Unicode 4.1 code-point/mapping.

JDK currently has two versions of the HKSCS charset in its charset repository.
    1. Big5_HKSCS is built on HKSCS-2001 and is used as the default charset
       for the Solaris zh_HK locale (there is no indication that Solaris will
       move on to a new version anytime soon).
    2. MS950_HKSCS is built on a mixed HKSCS2001/1999 version.
       Windows XP claims it is based on 2001, but its mapping table suggests
       it is actually a pre-2001 version that does not use supplementary
       characters at all; our implementation matches what XP has.
       Vista (and later) now "moves on" to HKSCS2004; it has native support
       (in Unicode only) for HKSCS2004 in its zh_HK locale. This is where the
       upgrade requests come from.
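
For anyone who wants to see which of these charsets a given JDK exposes, here is a minimal sketch. It assumes the public charset names "Big5-HKSCS" and "x-MS950-HKSCS"; the class name HkscsProbe is hypothetical, and the code only reports availability. It says nothing about which HKSCS version backs the charset.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class HkscsProbe {
    // Public charset names assumed here; availability depends on the JDK build.
    static final String[] NAMES = { "Big5-HKSCS", "x-MS950-HKSCS" };

    // Maps each assumed name to whether this JVM can load a charset for it.
    static Map<String, Boolean> probe() {
        Map<String, Boolean> result = new LinkedHashMap<String, Boolean>();
        for (String name : NAMES) {
            result.put(name, Charset.isSupported(name));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Boolean> e : probe().entrySet()) {
            System.out.println(e.getKey()
                + (e.getValue() ? " is available" : " is not available"));
        }
    }
}
```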


(1) Support HKSCS2008 in JDK7 (instead of the requested HKSCS2004)
   The good thing about HKSCS2008 is that it only adds 68 NEW characters to
   the 2004 version, so it does not have any compatibility issue: you only
   get 68 more mappings, which would be "unmappable" in 2004. I don't see any
   reason to do HKSCS2004 when the latest version is out already. (HK promised
   this would be the "last" version of HKSCS done in the Big5 encoding; from
   now on, they will ONLY add new characters with Unicode code points.)

   -Big5_HKSCS and MS950_HKSCS charsets are now based on HKSCS2008.
   -MS950_HKSCS is going to be the default charset for the zh_HK locale on
    Vista and beyond.
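
One way to observe the "unmappable" vs. mapped difference from the outside is to ask a charset's encoder whether it can encode a given code point. The sketch below is only an illustration: the class and method names are hypothetical, the default charset name "Big5-HKSCS" is an assumption, and the code point has to be supplied by the caller.

```java
import java.nio.charset.Charset;

public class MappableCheck {
    // Returns true if the charset's encoder has a mapping for the code point.
    static boolean isMappable(Charset cs, int codePoint) {
        // Build the String from the code point so that supplementary
        // characters (above U+FFFF) are passed as a proper surrogate pair.
        String s = new String(Character.toChars(codePoint));
        return cs.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        // Usage: java MappableCheck [charsetName] [hexCodePoint]
        String name = args.length > 0 ? args[0] : "Big5-HKSCS"; // assumed name
        int cp = args.length > 1 ? Integer.parseInt(args[1], 16) : 0x4E00;
        if (!Charset.isSupported(name)) {
            System.out.println(name + " is not available in this JDK");
            return;
        }
        boolean ok = isMappable(Charset.forName(name), cp);
        System.out.printf("U+%04X in %s: %s%n", cp, name,
                          ok ? "mappable" : "unmappable");
    }
}
```

Running this against an old and a new build with the same code point should show whether a mapping was added.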

(2) Charset Big5_HKSCS_2001 is HKSCS-2001 based (it has exactly the same
    mapping table as the current Big5_HKSCS, which is a 2001-based
    implementation). This is going to be the default "hkscs" charset for the
    Solaris zh_HK locale (as explained above, Solaris has no plan to upgrade
    for now).

(3) Charset MS950_HKSCS_XP is the mixed-2001/1999 based hkscs for Windows XP;
    it has the same mapping as the current MS950_HKSCS. This charset is going
    to be the default "hkscs" charset for Windows XP.

(4) We also have
     So to make life easy,
    -removed CharToByte/ByteToCharHKSCS/HKSCS_2001
    -CharToByte/ByteToCharBig5/MS950_HKSCS are now based on HKSCS2008
    (a big bonus for users :-)   )

(5) Update the b2c/c2b mapping at sun/nio/cs/mapping to correspond to the
    changes in the charsets.

(6) Make corresponding changes in configuration files


Below are changes that are not directly HKSCS related, but since the HKSCS
charsets are built on top of the Big5 charset, I included them in this change
as well. (These changes are at the bottom of the webrev page. I also have a
separate webrev for them per Martin's request, it's here

(7) Migrate the Big5 charset to the "new" mapping-based implementation
    (generate the source from the mapping table during build time)
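
To illustrate the general idea of deriving lookup tables from a mapping file, here is a rough sketch. It is not the actual build tool: the line format ("0x<bytes> 0x<char>" with '#' comments), the class name and the sample entries are assumptions for illustration, and it handles only BMP characters and ignores round-trip exceptions.

```java
import java.util.Arrays;

public class MappingTableSketch {
    static final char UNMAPPABLE = '\uFFFD';

    // Builds a byte-to-char table from lines of the assumed form
    // "0x<bytes> 0x<char>"; '#' starts a comment, blank lines are skipped.
    static char[] buildB2C(String[] lines) {
        char[] b2c = new char[0x10000];
        Arrays.fill(b2c, UNMAPPABLE);
        for (String raw : lines) {
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\\s+");
            int bytes = Integer.decode(fields[0]);
            int ch = Integer.decode(fields[1]);
            b2c[bytes] = (char) ch;
        }
        return b2c;
    }

    // Inverts the b2c table; the first byte value seen for a char wins.
    static int[] buildC2B(char[] b2c) {
        int[] c2b = new int[0x10000];
        Arrays.fill(c2b, -1);
        for (int bytes = 0; bytes < b2c.length; bytes++) {
            char ch = b2c[bytes];
            if (ch != UNMAPPABLE && c2b[ch] == -1) {
                c2b[ch] = bytes;
            }
        }
        return c2b;
    }

    public static void main(String[] args) {
        String[] table = {
            "# illustrative entries only",
            "0xA440 0x4E00",
            "0xA441 0x4E59",
        };
        char[] b2c = buildB2C(table);
        int[] c2b = buildC2B(b2c);
        System.out.printf("0xA440 -> U+%04X%n", (int) b2c[0xA440]);
        System.out.printf("U+4E59 -> 0x%04X%n", c2b[0x4E59]);
    }
}
```

A build-time generator would emit tables like these as source instead of computing them at startup.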

(8) Adjusted Big5_Solaris to use the new Big5 charset, building the
    Big5_Solaris tables on top of the Big5 tables (which should make the
    coding faster, at the price of a "little" extra runtime memory to hold
    its own tables)
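
The trade-off in (8) can be pictured with a small sketch. It assumes the derived charset takes a private copy of the base table and patches in its own entries; the class name, method name and sample values are hypothetical and are not taken from the real Big5_Solaris code.

```java
public class OverlayTableSketch {
    // Returns a private copy of the base table with the overrides applied.
    // Each override is a { byteValue, charValue } pair.
    static char[] overlay(char[] base, int[][] overrides) {
        char[] derived = base.clone(); // extra memory: one full table copy
        for (int[] o : overrides) {
            derived[o[0]] = (char) o[1];
        }
        return derived;
    }

    public static void main(String[] args) {
        char[] base = new char[0x10000];
        base[0xA440] = 0x4E00;                    // illustrative base entry
        int[][] extras = { { 0xF9D6, 0x7881 } };  // illustrative override
        char[] derived = overlay(base, extras);
        // Lookups in the derived table are single array reads, with no
        // fallback step to the base charset.
        System.out.printf("base    0xF9D6 -> U+%04X%n", (int) base[0xF9D6]);
        System.out.printf("derived 0xF9D6 -> U+%04X%n", (int) derived[0xF9D6]);
    }
}
```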

(9) House-cleaning in make/tools/src/build/tools/charsetmapping (rename,
    move some pieces around)

So now the change is about 50+ files:-)


More information about the core-libs-dev mailing list