<i18n dev> RFR: 8235834 IBM-943 charset encoder needs updating
naoto.sato at oracle.com
naoto.sato at oracle.com
Thu Feb 20 17:51:22 UTC 2020
Thanks for the explanation. Here are comments to your fix:
- Can you add some comments about the gist of the change to IBM943.c2b?
Summarizing the explanation below would be fine.
For the test case:
- Please wrap the @bug line appropriately.
- Those byte array/Strings can be "static final" outside the test method.
- Looks like the mapping is exactly the same with IBM943C for those code
points. Is that correct? If so, would you please add some comment as such?
- Typo: "round-tip" -> "round-trip"
On 2/20/20 2:47 AM, Ichiroh Takiguchi wrote:
> Hello Naoto.
> I appreciate your comment.
> The definition has not changed recently.
> Applying the change will prevent the characters which are used on
> Japanese PC from being converted to "?".
> I checked IBM-943 definition and changelog.
> Definitions and creation date are as follows:
> (All definitions are valid)
> 03AF34B0.TPMAP120: May 20 1997 （Current OpenJDK)
> 34B003AF.RPMAP130: May 20 1997 (Proposed change, upward compatible)
> 34B003AF.RPMAP14A: Jul 29 1998 (Same as 34B003AF.RPMAP130)
> 34B003AF.RPMAP15A: Jan 8 2003 (Additionally, 13 characters are changed)
> According to IBM943.map, OpenJDK JDK refers 03AF34B0.TPMAP120.
> 03AF34B0.TPMAP120 just has B2C conversion table only, no C2B definition.
> 34B003AF.RPMAP130 which has C2B definition was released on same date.
> I assume C2B definition was not implemented at that time.
> C2B definition for 34B003AF.RPMAP130 and 34B003AF.RPMAP14A are same,
> only replacement character is changed.
> 34B003AF.RPMAP15A is the latest, but it's almost same as MS932.
> If 34B003AF.RPMAP15A is applied, 03AF34B0.TPMAP14A is also required.
> I'd like to add C2B definition without B2C definition because of
> I don't want to apply 03AF34B0.TPMAP14A B2C definition.
> So I'd like to apply 34B003AF.RPMAP130 definition.
> Ichiroh Takiguchi
> On 2020-02-19 07:33, naoto.sato at oracle.com wrote:
>> Hi Takiguchi-san,
>> Can you please elaborate the rationale for the change? It looks like
>> IBM943 chaset hasn't changed for a long time, at least from JDK8. Has
>> the mapping definition recently changed?
>> On 2/17/20 10:23 AM, Ichiroh Takiguchi wrote:
>>> Could you review the fix ?
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8235834
>>> Change: https://cr.openjdk.java.net/~itakiguchi/8235834/webrev.00/
>>> IBM-943 is for IBM Japanese PC encoding.
>>> MS932 is for Microsoft Japanese Windows encoding.
>>> IBM-943 charset encoder does not contain 5 compatible entries
>>> compared to MS932 charset.
>>> Ichiroh Takiguchi
>>> IBM Japan, Ltd.
More information about the i18n-dev