Codereview request for 6653797: Reimplement JDK charset repository charsets.jar
xueming.shen at oracle.com
Mon Jul 16 17:13:22 UTC 2012
On 7/16/2012 9:57 AM, Ulf Zibis wrote:
> Hi Sherman,
> as I just said for 7183053, I can't look into the details at the moment,
> as I do not have a suitable environment installed.
> All I can see, looks reasonable.
> Regarding part 4 of bug 6653797, there is still an existing adaptor
> from my side, if desired.
The sun.io has been removed. That will be an alternative if we hear any
> Just one comment: I think it should be possible to share the mapping
> data partly across charsets, so that charsets.jar would shrink even
> further?
> On 16.07.2012 00:12, Xueming Shen wrote:
>> This changeset includes the migration of our JIS0201/0208/0212 based
>> double-byte charsets to the new mapping based implementation. This is
>> a left-over of the effort we put into JDK7 to re-implement most of our
>> charsets in charsets.jar to (1) improve performance, (2) reduce the
>> storage footprint, and (3) ease the maintenance burden.
>> Notes of the implementation:
>> (1) jis0201/0208/0212 and their variants are now generated from the
>> mapping tables at build time. (See the new *.map, *.nr and *.c2b
>> tables.)
>> (2) EUC_JP/LINUX_OPEN, SJIS, PCK, ISO2022_JP and its variants now use
>> these new jis0201/0208/0212 charsets.
>> (3) Those in red (in the webrev) are the old implementation. Since no
>> charset uses them anymore, I removed them from the repository.
>> (4) There are two approaches for PCK and SJIS. PCK.java_v1 and
>> SJIS.java_v1 are the ones that follow the old implementation, which
>> decodes/encodes based on the jis0201/0208 (and variants) mappings via
>> Ken's algorithm. This is known to be slow and buggy (the algorithm
>> does not take care of illegal sjis code points, see #6653797
>> and http://cr.openjdk.java.net/~sherman/6653797/Sjis2Jis.java).
>> So in this changeset, I generated the direct mapping tables for sjis
>> and pck and use
>> the "general" DoubleByte base class for these two charsets. This
>> results in much
>> faster decoding/encoding and correct mapping for all code points. The
>> of this approach is that it adds about 50k uncompressed size to the
>> But given this change actually removes about 300K from rt.jar, we
>> still get
>> a net saving of 250K, so I decided to go with this approach for better
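
To make the trade-off in (4) concrete, here is a minimal Java sketch. It
assumes that "Ken's algorithm" refers to the widely published arithmetic
Shift_JIS to JIS X 0208 conversion; it is not the code from the webrev or
from Sjis2Jis.java. The class name, the helper names and the three-entry
table are hypothetical and only illustrate the idea. The arithmetic
version happily converts an illegal trail byte such as 0x7F, while a
direct table lookup reports it as unmappable.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; all names here are hypothetical.
class SjisSketch {

    // Arithmetic Shift_JIS -> JIS X 0208 conversion (assumed to be what
    // "Ken's algorithm" means). It performs no validity checks at all.
    static int sjisToJis(int b1, int b2) {
        boolean adjust = b2 < 0x9F;                 // first or second row of the pair
        int rowOffset = b1 < 0xA0 ? 0x70 : 0xB0;    // lead byte range 0x81-0x9F vs 0xE0-0xEF
        int cellOffset = adjust ? (b2 > 0x7F ? 0x20 : 0x1F) : 0x7E;
        int j1 = ((b1 - rowOffset) << 1) - (adjust ? 1 : 0);
        int j2 = b2 - cellOffset;
        return (j1 << 8) | j2;
    }

    // Legal byte ranges for the JIS X 0208 part of Shift_JIS
    // (vendor extensions use further lead bytes, ignored here).
    static boolean isLegalSjisPair(int b1, int b2) {
        boolean lead = (b1 >= 0x81 && b1 <= 0x9F) || (b1 >= 0xE0 && b1 <= 0xEF);
        boolean trail = (b2 >= 0x40 && b2 <= 0x7E) || (b2 >= 0x80 && b2 <= 0xFC);
        return lead && trail;
    }

    static final char UNMAPPABLE = '\uFFFD';

    // Stand-in for a generated direct mapping table: Shift_JIS pair -> char.
    // A real table would be generated from the mapping files at build time.
    static final Map<Integer, Character> DIRECT = new HashMap<>();
    static {
        DIRECT.put(0x8140, '\u3000'); // IDEOGRAPHIC SPACE
        DIRECT.put(0x82A0, '\u3042'); // HIRAGANA LETTER A
        DIRECT.put(0x889F, '\u4E9C'); // CJK ideograph (first kanji in JIS X 0208)
    }

    // Direct lookup: anything not in the table is unmappable.
    static char decodeDirect(int b1, int b2) {
        return DIRECT.getOrDefault((b1 << 8) | b2, UNMAPPABLE);
    }

    public static void main(String[] args) {
        // Legal pair 0x82A0 converts to JIS 0x2422.
        System.out.println(Integer.toHexString(sjisToJis(0x82, 0xA0)));
        // Illegal pair 0x817F silently collides with legal pair 0x8180.
        System.out.println(Integer.toHexString(sjisToJis(0x81, 0x7F)));
        System.out.println(Integer.toHexString(sjisToJis(0x81, 0x80)));
        // The table-based lookup rejects the illegal pair instead.
        System.out.println(decodeDirect(0x81, 0x7F) == UNMAPPABLE);
    }
}
```

The arithmetic version needs an extra validity check (such as
isLegalSjisPair above) in front of every conversion, plus a second lookup
in the jis0208 table. The direct table pays for its correctness and speed
with storage, which matches the size figures quoted above.
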
>> There appear to be a lot of files (80+) in the webrev, but that number
>> includes the
>> removed old implementation and the tests I put in to guarantee the
>> de/encoding result from the old and new implementations (those OLD...
>> cases). The change is actually not that big :-) So please help review.
>> I can then
>> put this multi-year effort to rest.
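
The old-versus-new regression tests mentioned above can be pictured
roughly as follows. This is a hedged sketch, not the tests from the
webrev: the class and method names are hypothetical, it checks the
decoding direction only, and it exhaustively tries every one-byte and
two-byte input rather than whatever inputs the real tests use.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Illustrative sketch only; all names here are hypothetical.
class CharsetCompare {

    // Decode strictly; return null when the input is malformed or unmappable.
    static String decodeOrNull(Charset cs, byte[] bytes) {
        try {
            return cs.newDecoder()
                     .onMalformedInput(CodingErrorAction.REPORT)
                     .onUnmappableCharacter(CodingErrorAction.REPORT)
                     .decode(ByteBuffer.wrap(bytes))
                     .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // True when both charsets agree on the result, including on rejection.
    static boolean sameDecoding(Charset oldCs, Charset newCs, byte[] bytes) {
        return Objects.equals(decodeOrNull(oldCs, bytes), decodeOrNull(newCs, bytes));
    }

    // Count every one-byte and two-byte input on which the two charsets differ.
    static int countDecodeMismatches(Charset oldCs, Charset newCs) {
        int mismatches = 0;
        for (int b1 = 0; b1 < 0x100; b1++) {
            if (!sameDecoding(oldCs, newCs, new byte[] {(byte) b1})) {
                mismatches++;
            }
            for (int b2 = 0; b2 < 0x100; b2++) {
                if (!sameDecoding(oldCs, newCs, new byte[] {(byte) b1, (byte) b2})) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Stand-in charsets; a real check would pass the old and new
        // implementation of the same double-byte charset.
        int n = countDecodeMismatches(StandardCharsets.ISO_8859_1,
                                      StandardCharsets.US_ASCII);
        System.out.println("mismatches: " + n);
    }
}
```

A result of zero for the old and new implementation of the same charset
would show that the two agree on every short input, including the ones
both reject.
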