RFR 8176841: Additional Unicode Language-Tag Extensions
naoto.sato at oracle.com
Tue Nov 21 01:45:50 UTC 2017
Hi Stephen, please see my comments below.
On 11/20/17 5:27 AM, Stephen Colebourne wrote:
> I've had a longer think about how to integrate this. It's very tricky,
> as the Unicode extensions create complex conflicts.
> In general, my view is that Locale is too complex and overloaded with
> different levels of meaning. Perhaps a different class should have
> been added for more complex localization settings. It seems like an
> area ripe for puzzlers/security issues, as Unicode extensions are going
> to be hugely undertested. (Most developers think of a Locale as
> language+territory, and will only test for that, if they test at all).
> It will be all too easy for text to appear on web pages or reports in
> an unexpected calendar system or time-zone as a result of this change.
> It also means that numbers might be output in an unexpected numbering
> system or currency.
> The tension is between what the developer has thought of and tested
> for, and what a user might pass in. The mechanism for passing in these
> extensions is partly subtle, being via the default locale. Although
> even via an explicit locale, I suspect very few developers will be
> sanitizing a Locale input from a user.
> AFAICT, Unicode extensions are not widely used at present. I can only
> find use of "ca" in Chronology.ofLocale() and Calendar.getInstance().
> So this is the key moment for deciding how they are used.
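The existing "ca" handling mentioned above can be observed with a minimal sketch like the following. The class and method names are illustrative only; the language tags are arbitrary examples, and the printed values depend on the JDK's locale data.

```java
import java.time.chrono.Chronology;
import java.util.Calendar;
import java.util.Locale;

// Illustrative helper (not from the webrev): shows how the "ca" extension
// is picked up by Chronology.ofLocale() and Calendar.getInstance().
class CalendarExtensionDemo {

    // Returns the ID of the chronology selected for the given language tag.
    static String chronologyId(String languageTag) {
        return Chronology.ofLocale(Locale.forLanguageTag(languageTag)).getId();
    }

    // Returns the calendar type of the Calendar created for the given language tag.
    static String calendarType(String languageTag) {
        return Calendar.getInstance(Locale.forLanguageTag(languageTag)).getCalendarType();
    }

    public static void main(String[] args) {
        System.out.println(chronologyId("en-US"));                // no "ca" extension
        System.out.println(chronologyId("en-US-u-ca-buddhist"));  // "ca" requests the Buddhist calendar
        System.out.println(calendarType("en-US-u-ca-buddhist"));
    }
}
```

Code that only ever tests with `en-US` would never exercise the second and third calls, which is the testing gap described above.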
> We are discussing six Unicode extensions: "ca" (calendar system), "tz"
> (time-zone), "cu" (currency), "nu" (numbering system), "fw" (first day
> of week) and "rg" (region override).
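As a rough illustration of how these six keys travel inside a Locale, the sketch below parses a language tag that carries all of them and reads the keys back. The tag values (japanese, gbp, mon, arab, gbzzzz, jptyo) are arbitrary examples, and the helper name is hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Illustrative helper: lists the Unicode extension keys and types carried by a Locale.
class UnicodeExtensionKeys {

    // Collects every Unicode locale keyword of the locale into a sorted map.
    static Map<String, String> extensionsOf(Locale locale) {
        Map<String, String> result = new TreeMap<>();
        for (String key : locale.getUnicodeLocaleKeys()) {
            result.put(key, locale.getUnicodeLocaleType(key));
        }
        return result;
    }

    public static void main(String[] args) {
        Locale locale = Locale.forLanguageTag(
                "en-US-u-ca-japanese-cu-gbp-fw-mon-nu-arab-rg-gbzzzz-tz-jptyo");
        // Language and country look like plain en-US; the extensions ride along silently.
        System.out.println(locale.getLanguage() + "_" + locale.getCountry());
        extensionsOf(locale).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```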
> Chronology, ZoneId, TimeZone and Currency ("ca", "tz", "cu") are not
> really aspects of formatting at all, they are explicit aspects of the
> associated value. For example, a ZonedDateTime explicitly has a chronology of
> IsoChronology and a specific time zone. A Money class (Java will have
> one at some point, probably after value types are added) explicitly
> has a Currency.
> You *really, really, really* don't want a Unicode extension locale to
> be changing the format of a Money object - turning $300 to £300 just
> because the user had -u-cu-gbp in their locale. The same applies to
> the formatting of dates and times - the chronology and time-zone are
> part of the value being formatted, not the format.
> This just leaves "nu" (numbering system) where there is a case for the
> Unicode extension to be picked up directly. But doing so would cause a
> LocalDate to be output as ????-??-?? (where ? is the digit for another
> numbering system). While I think that should be supported, should it
> be picked up automatically via the locale? No, it should be explicitly
> selected by using DateTimeFormatter.withDecimalStyle() - which the
> webrev does do.
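The explicit selection described above might look like the following sketch. It assumes Arabic-Indic digits (zero digit U+0660) purely as an example numbering system; the class and method names are illustrative.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DecimalStyle;

// Illustrative only: the numbering system is chosen explicitly by the developer,
// not picked up implicitly from a locale's "nu" extension.
class ExplicitDecimalStyleDemo {

    // Formats the date as ISO yyyy-MM-dd using Arabic-Indic digits.
    static String formatWithArabicIndicDigits(LocalDate date) {
        DecimalStyle arabicIndic = DecimalStyle.STANDARD.withZeroDigit('\u0660');
        return DateTimeFormatter.ISO_LOCAL_DATE
                .withDecimalStyle(arabicIndic)
                .format(date);
    }

    public static void main(String[] args) {
        System.out.println(formatWithArabicIndicDigits(LocalDate.of(2017, 11, 21)));
    }
}
```

The separators stay as ASCII hyphens; only the digits change, which matches the ????-??-?? shape described above.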
> The distinction here is that DateTimeFormatter (and our theoretical
> MoneyFormatter) are formatting the whole state of the value object,
> whereas DateFormatter and NumberFormatter essentially just format a
> single number, using extra information alongside to help.
> So, what to do? Some of these are new changes, some just need testing:
> 1) The changes to `Calendar.getInstance()` are not documented wrt
> "tz". Currently it says it results in the "default time zone", which
> isn't true anymore. I think this may be a behavioural compatibility
Explicitly document that it will use the time zone in locale.
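A small sketch of the behaviour to be documented, assuming a JDK that includes this change (10 or later, as far as I can tell) and that the short ID "jptyo" maps to Asia/Tokyo in the bundled CLDR data. The class and method names are illustrative.

```java
import java.util.Calendar;
import java.util.Locale;

// Illustrative only: shows which time zone Calendar.getInstance(Locale) ends up with.
class CalendarTimeZoneDemo {

    // Returns the ID of the time zone used by the Calendar created for the tag.
    static String zoneIdFor(String languageTag) {
        Locale locale = Locale.forLanguageTag(languageTag);
        return Calendar.getInstance(locale).getTimeZone().getID();
    }

    public static void main(String[] args) {
        // Without "tz": the default time zone of the JVM.
        System.out.println(zoneIdFor("en-US"));
        // With "tz": the zone named by the extension, if the JDK honors it.
        System.out.println(zoneIdFor("en-US-u-tz-jptyo"));
    }
}
```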
> 2) `DecimalStyle.ofLocale(Locale)` should use "nu" but does not.
Document it in the javadoc.
> 3) `DateTimeFormatter.localizedBy(Locale)` should use "ca" to call
> `withChronology`, `tz` to call `withZoneId` and `nu` to call
> `withDecimalStyle`. This is a change to the CSR.
Besides that "nu" needs to be spec'ed out, isn't calling withXXXX() an
> 4) The phrase "Unicode extensions in the locale are ignored." would
> apply to lots of methods across the JDK. In particular for this
> webrev, it would need to be added to a lot more methods, such as the
> default locale methods. Perhaps the phrase could be "If the locale has
> unicode extensions, they are not used", which is slightly easier to
> read. Or perhaps it shouldn't be added at all in most cases (ie.
> unless unicode extensions are mentioned, assume that they are not
Removed the phrase.
> 5) The withLocale(Locale) change should say "The locale is stored as
> passed in, without further processing. If the locale has unicode
> extensions, they may be used later in text processing. To set the
> chronology, time-zone and decimal style from unicode extensions, see
> 6) The "rg" extension (and no other extensions) should be used when
> looking up data to output these:
> - DateTimeFormatterBuilder.appendText()
> - DateTimeFormatterBuilder.appendLocalizedOffset()
> - DateTimeFormatterBuilder.appendZoneText()
> - DateTimeFormatterBuilder.appendChronologyText()
> - DateTimeFormatterBuilder.appendLocalized()
> - DateTimeFormatterBuilder.getLocalizedDateTimePattern()
> This may be the case, but tests should ensure it.
Will need to add tests.
> 7) WeekBasedFieldPrinterParser should use "fw"/"rg", which it already
> does via WeekFields.of(Locale)
Not sure what this means. Where is the file located?
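For reference, the WeekFields.of(Locale) path mentioned in item 7 can be exercised with a sketch like this one. It assumes a JDK in which WeekFields honors "fw" (10 or later, to my understanding); the class and method names are illustrative.

```java
import java.time.DayOfWeek;
import java.time.temporal.WeekFields;
import java.util.Locale;

// Illustrative only: reads the first day of week that WeekFields derives from a locale.
class FirstDayOfWeekDemo {

    // Returns the first day of the week for the given language tag.
    static DayOfWeek firstDayOfWeek(String languageTag) {
        return WeekFields.of(Locale.forLanguageTag(languageTag)).getFirstDayOfWeek();
    }

    public static void main(String[] args) {
        System.out.println(firstDayOfWeek("en-US"));          // region default
        System.out.println(firstDayOfWeek("en-US-u-fw-mon")); // overridden by "fw"
    }
}
```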
> 8) For clarity of future code, two additional methods would be useful:
> - DateTimeFormatterBuilder.toFormatterIsoRoot()
> - DateTimeFormatterBuilder.toFormatterIso(Locale)
> These would set the chronology of the resulting DTF to be
> IsoChronology.INSTANCE. The "root" variant would use `Locale.ROOT`.
This would be a nice addition.
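Until such methods exist, the proposed behaviour can be approximated with static helpers. The sketch below is hypothetical: toFormatterIso and toFormatterIsoRoot are not JDK methods, and the helper class is only one possible reading of the proposal in item 8.

```java
import java.time.LocalDate;
import java.time.chrono.IsoChronology;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

// Hypothetical helpers mirroring the proposed builder methods.
class IsoFormatterHelpers {

    // Builds a formatter for the locale and pins its chronology to ISO.
    static DateTimeFormatter toFormatterIso(DateTimeFormatterBuilder builder, Locale locale) {
        return builder.toFormatter(locale).withChronology(IsoChronology.INSTANCE);
    }

    // Same as above, using Locale.ROOT.
    static DateTimeFormatter toFormatterIsoRoot(DateTimeFormatterBuilder builder) {
        return toFormatterIso(builder, Locale.ROOT);
    }

    public static void main(String[] args) {
        DateTimeFormatter f = toFormatterIsoRoot(
                new DateTimeFormatterBuilder().appendPattern("uuuu-MM-dd"));
        System.out.println(f.format(LocalDate.of(2017, 11, 21)));
    }
}
```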
> 9) Consider more generally what happens if TimeZone.getDefault() does
> not match the "tz" extension of Locale.getDefault().
This should be considered case by case, and documented on each occasion
as to which would be honored.
> 10) Consider how localizedBy(Locale) operates. Is it the same as
> or does it only set the chronology if "ca" is found, and only set
> time-zone if "tz" is found and only set decimal style if "nu" is
IIRC, localizedBy() was added so that withLocale() would behave as it
does now. I think localizedBy() should also have the same effect as
withLocale() if the specified locale does not contain any
calendar/timezone/numberingSystem extensions. Otherwise, say,
localizedBy(Locale.JAPAN) would be a no-op.
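The difference between the two methods can be checked with a sketch along these lines. It assumes a JDK that ships localizedBy() (10 or later); the pattern and language tag are arbitrary, and the class and method names are illustrative.

```java
import java.time.chrono.Chronology;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Illustrative only: compares the override chronology after withLocale() and localizedBy().
class LocalizedByDemo {

    // A base formatter with no override chronology.
    static DateTimeFormatter base() {
        return DateTimeFormatter.ofPattern("uuuu-MM-dd", Locale.ROOT);
    }

    // withLocale() only stores the locale; the override chronology is left untouched.
    static Chronology chronologyAfterWithLocale(Locale locale) {
        return base().withLocale(locale).getChronology();
    }

    // localizedBy() stores the locale and applies the "ca" extension, if present.
    static Chronology chronologyAfterLocalizedBy(Locale locale) {
        return base().localizedBy(locale).getChronology();
    }

    public static void main(String[] args) {
        Locale japanese = Locale.forLanguageTag("ja-JP-u-ca-japanese");
        System.out.println(chronologyAfterWithLocale(japanese));
        System.out.println(chronologyAfterLocalizedBy(japanese));
        // With no extensions, the locale itself is still applied.
        System.out.println(base().localizedBy(Locale.JAPAN).getLocale());
    }
}
```

What localizedBy() does to the chronology and decimal style for a locale without extensions has, as far as I know, varied between JDK releases, so the sketch only inspects the locale in that case.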
As we discussed in the previous email, we don't have ZoneId.ofLocale() yet.
> Perhaps this webrev should be broken into two or more parts, as it is
> very large?
> In summary, going forward it should be recognised that there is a
> separation/difference between formatters of full values
> (Money/Temporal) and old-style formatters of single numbers
> (DateFormat, NumberFormat). Some extensions, "ca", "tz", "cu", are
> part of the value in Money/Temporal, and so can't be overridden by the
> locale. Methods like Chronology.ofLocale or ZoneId.of(Locale) are good
> places to use the extensions, as they are explicit about what happens.
Here is the spec diff from the previous one. Please let me know your
> On 15 November 2017 at 23:31, Naoto Sato <naoto.sato at oracle.com> wrote:
>> Thanks, Lance. Corrected as suggested.
>> Also, I inserted "@since 10."
>> On 11/15/17 3:06 PM, Lance Andersen wrote:
>>> Hi Naoto
>>> localizedBy, I would suggest changing:
>>> - “If the new locale contains “ca”…” to “if the new locale contains the
>>> - “Unlike withLocale method” to “Unlike the withLocale method”
>>>> On Nov 15, 2017, at 5:36 PM, Naoto Sato <naoto.sato at oracle.com
>>>> <mailto:naoto.sato at oracle.com>> wrote: