Adding new IBM extended charsets
enasser at in.ibm.com
Tue Jul 24 08:56:50 UTC 2018
Thank you Martin, Sherman and Alan for your valuable inputs.
I have done some initial analysis of ICU4J. There are some
compatibility issues between the ICU4J charsets and the JDK charsets, but
I am more concerned about performance, as the JDK optimizations do not
exist in that implementation. I think we need to work with the ICU4J
community to resolve those issues before we remove those charsets from
the JDK.
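
One way to look for the kind of compatibility differences mentioned above is to encode every BMP character with two charset implementations and report the first difference. The following is only a minimal sketch: the class name CharsetCompare and the method firstEncodingMismatch are hypothetical, and it uses two built-in JDK charsets as stand-ins so that it runs without ICU4J. For a real comparison, one argument would be the JDK charset and the other the ICU4J charset with the same name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK or ICU4J.
public class CharsetCompare {

    // Returns the first BMP code point whose encoded bytes differ between
    // the two charsets, or -1 if none differ. Charset.encode replaces
    // unmappable characters, so replacement bytes are compared as well.
    // Assumes both charsets support encoding.
    static int firstEncodingMismatch(Charset a, Charset b) {
        for (int c = 0; c <= 0xFFFF; c++) {
            if (Character.isSurrogate((char) c)) {
                continue; // a lone surrogate is not a valid character
            }
            String s = String.valueOf((char) c);
            ByteBuffer ba = a.encode(s);
            ByteBuffer bb = b.encode(s);
            if (!ba.equals(bb)) {
                return c;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Stand-in charsets; replace with the JDK and ICU4J variants.
        int c = firstEncodingMismatch(StandardCharsets.US_ASCII,
                                      StandardCharsets.ISO_8859_1);
        System.out.println(c < 0 ? "no mismatch"
                                 : "first mismatch at U+" + Integer.toHexString(c));
    }
}
```

This only covers encoding of single BMP characters. Decoding, supplementary characters and throughput would need separate checks.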
The primary reason we are interested in contributing the charsets to
openjdk is so that Java users of all locales get a seamless experience
when they move between openjdk and other implementations. I agree it is
good from a footprint and maintenance perspective if we are able to reduce
the number
I believe the maintenance effort on the charsets is usually low as we
hardly make any changes to the charsets once developed. Also, the charsets
are usually independent of each other and hence usually will not affect
Java users unless they are used. As more members of my team
would like to actively participate in the openjdk community, I hope
maintenance of any issues reported on IBM charsets will not be a problem
going forward. As we discussed before, the footprint issue can be avoided
if we enable the IBM charsets on an as-needed basis with a build flag.
As you advised, we can enable the IBM charsets only for the AIX platform
by default, and users can enable them on other platforms as needed. If
all of you agree, we can start working on moving all IBM charsets from
jdk.charsets to a different module, jdk.ibm.charsets, and enable them only
for the AIX platform by default. We can consider removing them from the
JDK in future if the community finds them to be an overhead or not adding
value.
From: Alan Bateman <Alan.Bateman at oracle.com>
To: Xueming Shen <xueming.shen at oracle.com>, Nasser Ebrahim
<enasser at in.ibm.com>
Cc: core-libs-dev at openjdk.java.net
Date: 07/19/2018 03:44 PM
Subject: Re: Adding new IBM extended charsets
On 19/07/2018 08:27, Xueming Shen wrote:
> Hi Nasser,
> From openjdk's perspective it would be preferred to direct the developer
> to use the charset
> implementation provided by IBM, or the reliable third party that has
> the appropriate knowledge,
> experience and resource to support/maintain those charsets such as the
> icu4j charset
> project. I have been pulling the data from that huge icu-charset-data
> file and implement/maintain
> them based on my best knowledge, but I'm sure engineers from IBM or
> the icu project probably
> can do a much better job to implement/maintain/update those charsets
> going forward.
> As first step we can separate those IBM charsets from the jdk.charset
> into a separate package
> somewhere and configure them to be built into java.base and
> jdk.charsets, for aix platform only.
> Then we can further discuss the best way to handle/distribute those
> charsets that are not needed
> for the java.base module (for vm startup). As I said, it would be
> ideal if we can remove them from the
> openjdk repo/binaries completely and direct the developer/user to use
> the icu4j charset provider
> for those encodings, when needed. But given the possible compatibility
> concern, we might want to
> phase this work out gradually in next major release.
I agree and in terms of phasing then I don't think it would be too
disruptive if the EBCDIC charsets were dropped from jdk.charsets in JDK
12, at least on the mainstream platforms. As we've established in this
thread, the ICU4J project does seem to publish its charset provider to
Maven so there are alternatives for applications that really need these
Nasser - do you do any testing with the ICU4J charsets? I quickly tried
62.1 and it seems to work fine on the class path. I didn't check for any
compatibility differences or compare the performance but maybe you have.
It's a bit awkward to test this provider as an automatic module due to
the unusual naming of these JAR files. They may not have looked at
modules yet but the ability to link the icu4j.charsets module into a
run-time image seems something that people may want to do in the future.
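
A quick class path check of the kind described above could look like the following sketch. It uses only the standard java.nio.charset API. The class name CharsetCheck and the method describe are hypothetical, and the default names in main are just examples. Whether a given name resolves, and to which implementation class, depends on the run-time image and on any charset providers found on the class path.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of the JDK or ICU4J.
public class CharsetCheck {

    // Reports whether a charset name resolves in this run-time and, if so,
    // its canonical name and the class that implements it.
    static String describe(String name) {
        if (!Charset.isSupported(name)) {
            return name + ": not available";
        }
        Charset cs = Charset.forName(name);
        return name + ": " + cs.name() + " (" + cs.getClass().getName() + ")";
    }

    public static void main(String[] args) {
        // Example names only; pass the charsets of interest as arguments.
        String[] names = args.length > 0 ? args : new String[] {"UTF-8", "IBM1047"};
        for (String n : names) {
            System.out.println(describe(n));
        }
    }
}
```

Running this once with and once without a third-party charset provider on the class path shows which names are only available through that provider.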