Adding new IBM extended charsets
xueming.shen at oracle.com
Thu Jul 19 07:27:37 UTC 2018
From OpenJDK's perspective, it would be preferable to direct developers
to use the charset implementations provided by IBM, or by a reliable
third party that has the experience and resources to support and
maintain those charsets, such as the ICU project. I have been pulling
the data from that huge icu-charset-data file and implementing and
maintaining the charsets to the best of my knowledge, but I'm sure
engineers from IBM or the ICU project can do a much better job of
implementing, maintaining and updating them.
As a first step, we can separate those IBM charsets from jdk.charsets
into a separate package somewhere and configure them to be built into
java.base and jdk.charsets for the AIX platform only.
Then we can further discuss the best way to handle and distribute those
charsets that are not needed for the java.base module (for VM startup).
As I said, it would be ideal if we could remove them from the OpenJDK
repo/binaries completely and direct developers and users to the ICU4J
charset provider for those encodings when needed. But given the possible
compatibility concerns, we might want to phase this work out gradually
in the next major release.
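To illustrate what this means for application code: rather than assuming
that a given IBM charset ships with the JDK, code can probe at run time
whether any installed provider supports it. The following is only a
minimal sketch using java.nio.charset.Charset. The class name
CharsetProbe and its helper methods are made up for illustration, and
IBM1047 is just an example name. As far as I know, additional providers
such as the ICU4J one are located through the
java.nio.charset.spi.CharsetProvider service mechanism once they are on
the class path or module path, so the lookup code itself would not need
to change.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of the JDK or ICU4J.
public class CharsetProbe {

    // Returns the charset if some installed provider supports the name,
    // or an empty Optional if the name is unsupported or illegal.
    static Optional<Charset> lookup(String name) {
        try {
            return Charset.isSupported(name)
                    ? Optional.of(Charset.forName(name))
                    : Optional.empty();
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException (malformed name).
            return Optional.empty();
        }
    }

    // Lists canonical names of available charsets that start with the
    // given prefix, compared case-insensitively.
    static List<String> availableWithPrefix(String prefix) {
        List<String> names = new ArrayList<>();
        for (String name : Charset.availableCharsets().keySet()) {
            if (name.regionMatches(true, 0, prefix, 0, prefix.length())) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Example name only; the result depends on the JDK build and on
        // which charset providers are deployed.
        String wanted = "IBM1047";
        Optional<Charset> cs = lookup(wanted);
        if (cs.isPresent()) {
            System.out.println(wanted + " is available as " + cs.get().name());
        } else {
            System.out.println(wanted + " is not available; "
                    + "a third-party charset provider may be needed");
        }
        System.out.println("IBM charsets visible to this runtime: "
                + availableWithPrefix("IBM").size());
    }
}
```

The output differs between builds and deployments, which is the point:
the application only depends on the lookup succeeding, not on where the
charset implementation comes from.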
On 7/17/18, 6:48 AM, Nasser Ebrahim wrote:
> Hi Alan,
> Thank you for your input. I would like to clarify that not all the
> IBM charsets (IBMXXXX) in jdk.charsets are IBM platform-specific
> charsets. For example, only 43 of the 72 IBMXXXX charsets in
> jdk.charsets are EBCDIC or IBM platform-specific charsets. Similarly,
> many charsets in the list of 75 charsets which we would like to
> contribute are not EBCDIC charsets.
> I feel we should have a standard guideline for the extended charsets.
> If we are keeping the extended charsets in the JDK, then we may want
> to consider including all ICU/IANA-approved charsets in the JDK.
> Otherwise, we may want to keep only the standard charsets in the JDK
> and remove all the extended charsets, so that all extended charsets
> can be taken from third-party libraries like ICU4J.
> If we decide to keep the extended charsets, then maybe we can
> classify the extended charsets as ASCII and EBCDIC, with the
> corresponding modules named jdk.ascii.charset and jdk.ebcdic.charset.
> Then, depending on the platform, we can consider including either one
> of the charset modules or both.
> Please advise.
> Thank you,
> Nasser Ebrahim
> From: Alan Bateman <Alan.Bateman at oracle.com>
> To: Nasser Ebrahim <enasser at in.ibm.com>, Xueming Shen
> <xueming.shen at oracle.com>, core-libs-dev at openjdk.java.net
> Date: 07/09/2018 01:25 AM
> Subject: Re: Adding new IBM extended charsets
> On 06/07/2018 14:56, Nasser Ebrahim wrote:
> > :
> > I understood your preferred option is 3 [Remove all extended charsets
> > from JDK (keep only default charsets) and use the extended charsets
> > from a third party like ICU4J]. Just to confirm: did you mean we need
> > to keep only the standard charsets in the JDK, remove all the extended
> > charsets from the JDK and use them from ICU4J, or did you mean to
> > apply that only to the new charsets? I think it is better to keep
> > things consistent: either take all extended charsets from ICU4J or
> > maintain all extended charsets within the JDK. Keeping some extended
> > charsets within the JDK and using ICU4J for other charsets may confuse
> > Java users.
> I think the suggestion in Sherman's mail is to drop the 70 or so IBM
> charsets from jdk.charsets. This will reduce the size of jdk.charsets
> and eliminate the need to maintain these charsets (at least on non-AIX
> builds). If developers need these charsets, say when connecting to a
> database on an IBM system, then they can deploy the ICU4J provider on
> the class path or module path.
> I don't think the suggestion impacts the 11 IBM charsets in java.base on
> non-AIX builds or the non-IBM charsets in jdk.charsets. There may be
> opportunities to drop some of these but that can be looked at separately.
> Also I don't think the suggestion impacts the additional 12 IBM charsets
> that are included in the AIX build of java.base at this time. From the
> review threads, it seems there are supported locales on AIX that map to
> these charsets so this is why they are in java.base.