<i18n dev> [8] Review request for JEP 127: Improve Locale Data Packaging and Adopt Unicode CLDR Data

Naoto Sato naoto.sato at oracle.com
Tue Aug 14 14:04:09 PDT 2012

On 8/14/12 11:43 AM, Steven R. Loomis wrote:
> On Tue, Aug 14, 2012 at 10:53 AM, Naoto Sato <naoto.sato at oracle.com>
> wrote:
>     Hi Steven,
>     I'll leave the discussion of the parser implementation to
>     Masayoshi, but one of the main reasons we used the existing
>     internal parser was the adaptation work that would be required
>     to port CLDR's parser into the JDK. On this point, I had a brief
>     chat with Yoshito a while ago, and he mentioned that someone was
>     working on a filter mechanism for the CLDR tools that could emit
>     JDK-style format. That may be promising and worth considering
>     for a future JDK release.
> Thanks. Is there a need to put the parser itself (and data) into the
> JDK? I noticed some discussion of licensing. Would it not be sufficient
> to ensure that the CLDR tools could generate JDK format data, and check
> the output of that into the JDK? Then you wouldn't need the XML itself
> in the JDK (other than, of course, storing it for your own archival
> purposes).

I think the main reason to include the XML data and the converter tool
in the JDK build tree is integrity. Otherwise, different CLDR data could
lead to different output from the Java APIs. The JDK has bundled one
particular version of the Unicode Character Database with each JDK
release for the same reason.
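
As a rough illustration of the integrity concern (a sketch only, not what
the JDK build actually does): if a build records a checksum for the exact
data file it converts, any other copy of the data can be detected before
it changes the API output. The class name PinnedDataCheck, its methods,
and the sample XML strings below are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: checks that a data file matches a pinned checksum.
public class PinnedDataCheck {

    /** Returns the lowercase hex SHA-256 digest of the given bytes. */
    public static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                // Mask to 0..255 so negative bytes format as two hex digits.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** True if the data's digest equals the pinned (expected) digest. */
    public static boolean matchesPinned(byte[] data, String expectedHex) {
        return sha256Hex(data).equalsIgnoreCase(expectedHex);
    }

    public static void main(String[] args) {
        // Placeholder content standing in for a bundled data file.
        byte[] bundled = "<ldml>bundled</ldml>".getBytes(StandardCharsets.UTF_8);
        byte[] other = "<ldml>newer</ldml>".getBytes(StandardCharsets.UTF_8);

        // In a real build the pinned digest would be checked in next to the data.
        String pinned = sha256Hex(bundled);

        System.out.println("bundled matches: " + matchesPinned(bundled, pinned));
        System.out.println("other matches:   " + matchesPinned(other, pinned));
    }
}
```

Keeping the data itself in the tree makes such a check unnecessary, since
every build of a given JDK source sees the same input by construction.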


> Thanks,
> Steven
>     Naoto
>     On 8/13/12 10:25 PM, Steven R. Loomis wrote:
>         Naoto,
>            okay, thought I was done for the night, but just two more
>         things..
>         - again on the "talk to us" category.. Sun already wrote one LDML
>         converter, and contributed to another. They're part of the CLDR
>         toolset and work with OOo and Solaris data.
>         - also, it appears that the new converter doesn't handle aliases
>         at all, or parentLocales. You're guaranteed to get the wrong
>         answer.
>         - Some of the processing (such as for Norwegian, and in other
>         places) seems to be very .. hardcoded and fragile.
>         - Are you aware of the fact that CLDR 22 is nearly released? Has
>         there been any testing with the interim data, or any plans to do
>         so?
>         I think the summary again is, talk to us.  Where "us" is the
>         CLDR technical committee.
>         Regards,
>         Steven
>         On Mon, Aug 13, 2012 at 9:58 PM, Steven R. Loomis
>         <srl at icu-project.org> wrote:
>             Hello,
>                Some questions,
>                - Is there a reason that a new parser was written, rather
>             than leveraging the existing CLDR tools (which are themselves
>             written in Java)?  (I've already suggested discussion with the
>             CLDR-TC.. I know I've been personally more than a bit sparse,
>             but, you know where we 'live')
>                - It's incorrect to specifically open, for example,
>             common/supplemental/numberingSystems.xml
>             ( NUMBERING_SOURCE_FILE ). You should not rely on the specific
>             file organization. See Appendix C of TR35; however, I filed
>             http://unicode.org/cldr/trac/ticket/5189 to clarify the
>             situation.
>             More later when I get a chance, but definitely good work here.
>             Steven
>             -----
>             Subject:
>             To: i18n-dev <i18n-dev at openjdk.java.net>, Java Core Libs
>             <core-libs-dev at openjdk.java.net>,
>             build-dev at openjdk.java.net
>             Message-ID: <4FFC93CF.40105 at oracle.com>
>             Content-Type: text/plain; charset=UTF-8; format=flowed
>             Hello,
>             Please review the JDK8 changes for JEP 127: Improve Locale Data
>             Packaging and Adopt Unicode CLDR Data
>             (http://openjdk.java.net/jeps/127). The webrev is located at:
>             http://cr.openjdk.java.net/~naoto/6336885/webrev.00/
