[10] RFR 8176841: Additional Unicode Language-Tag Extensions

Stephen Colebourne scolebourne at joda.org
Mon Nov 20 13:27:38 UTC 2017

I've had a longer think about how to integrate this. Its very tricky,
as the unicode extensions create complex conflicts.

In general, my view is that Locale is too complex and overloaded with
different levels of meaning. Perhaps a different class should have
been added for more complex localization settings. It seems like an
are rife for puzzlers/security issues as unicode extensions are going
to be hugely undertested. (Most developers think of a Locale as
language+territory, and will only test for that, if they test at all).
It will be all too easy for text to appear on web pages or reports in
an unexpected calendar system or time-zone as a result of this change.
It also means that numbers might be output in an unexpected numbering
system or currency.

The tension is between what the developer has thought of and tested
for, and what a user might pass in. The mechanism for passing in these
extensions is partly subtle, being via the default locale. Although
even via an explicit locale, I suspect very few developers will be
sanitizing a Locale input from a user.

AFAICT, unicode extensions are not widely used at present. I can only
find use of "ca" in Chronology.ofLocale() and Calendar.getInstance().
So this is the key moment for deciding how they are used.

We are discussing 6 unicode extensions: "ca" (calendar system), "tz"
(time-zone), "cu" (currency), "nu" (numbering system), "fw" (first day
of week) and "rg" (region override).

Chronology, ZoneId, TimeZone and Currency ("ca", "tz", "cu") are not
really aspects of formatting at all, they are explicit aspects of the
associated value. eg. a ZonedDateTime explicitly has a chronology of
IsoChronology and a specific time zone. A Money class (Java will have
one at some point, probably after value types are added) explicitly
has a Currency.

You *really, really, really* don't want a unicode extension locale to
be changing the format of a Money object - turning $300 to £300 just
because the user had -u-cu-gbp in their locale. The same applies to
the formatting of dates and times - the chronology and time-zone are
part of the value being formatted, not the format.

This just leaves "nu" (numbering system) where there is a case for the
unicode extension to be picked up directly. But doing so would cause a
LocalDate to be output as ????-??-?? (where ? is the digit for another
numbering system), While I think that should be supported, should it
be picked up automatically via the locale? No, it should be explicitly
selected by using DateTimeFormatter.withDecimalStyle() - which the
webrev does do.

The distinction here is that DateTimeFormatter (and our theoretical
MoneyFormatter) are formatting the whole state of the value object,
whereas DateFormatter and NumberFormatter essentially just format a
single number, using extra information alongside to help.

So, what to do? Some of these are new changes, some just need testing:

1) The changes to `Calendar.getInstance()` are not documented wrt
"tz". Currently it says it results in the "default time zone", which
isn't true anymore. I think this may be a behavioural compatibility

2) `DecimalStyle.ofLocale(Locale)` should use "nu" but does not.

3) `DateTimeFormatter.localizedBy(Locale)` should use "ca" to call
`withChronology`, `tz` to call `withZoneId` and `nu` to call
`withDecimalStyle`. This is a change to the CSR.

4) The phrase "Unicode extensions in the locale are ignored." would
apply to lots of methods across the JDK. In particular for this
webrev, it would need to be added to a lot more methods, such as the
default locale methods. Perhaps the phrase could be "If the locale has
unicode extensions, they are not used", which is slightly easier to
read. Or perhaps it shouldn't be added at all in most cases (ie.
unless unicode extensions are mentioned, assume that they are not

5) The withLocale(Locale) change should say "The locale is stored as
passed in, without further processing. If the locale has unicode
extensions, they may be used later in text processing. To set the
chronology, time-zone and decimal style from unicode extensions, see

6) The "rg" extension (and no other extensions) should be used when
looking up data to output these:
- DateTimeFormatterBuilder.appendText()
- DateTimeFormatterBuilder.appendLocalizedOffset()
- DateTimeFormatterBuilder.appendZoneText()
- DateTimeFormatterBuilder.appendChronologyText()
- DateTimeFormatterBuilder.appendLocalized()
- DateTimeFormatterBuilder.getLocalizedDateTimePattern()
This may be the case, but tests should ensure it.

7) WeekBasedFieldPrinterParser should use "fw"/"rg", which it already
does via WeekFields.of(Locale)

8) For clarity of future code, two additional methods would be useful:
- DateTimeFormatterBuilder.toFormatterIsoRoot()
- DateTimeFormatterBuilder.toFormatterIso(Locale)
These would set the chronology of the resulting DTF to be
IsoChronology.INSTANCE. The "root" variant would use `Locale.ROOT`.

9) Consider more generally what happens if TimeZone.getDefault() does
not match the "tz" extension of Locale.getDefault().

10) Consider how localizedBy(Locale) operates. Is it the same as
or does it only set the chronology if "ca" if found, and only set
time-zone if "tz" is found and only set decimal style if "nu" is

Perhaps this webrev should be broken into two or more parts, as it is
very large?

In summary, going forward it should be recognised that there is a
separation/difference between formatters of full values
(Money/Temporal) and old-style formatters of single numbers
(DateFormat, NumberFormat). Some extensions, "ca", "tz", "cu", are
part of the value in Money/Temporal, and so can't be overridden by the
locale. Methods like Chronology.ofLocale or ZoneId.of(Locale) are good
places to use the extensions, as they are explicit about what happens.


On 15 November 2017 at 23:31, Naoto Sato <naoto.sato at oracle.com> wrote:
> Thanks, Lance. Corrected as suggested.
> http://cr.openjdk.java.net/~naoto/8191349/specdiff.02/overview-summary.html
> Also, I inserted "@since 10."
> Naoto
> On 11/15/17 3:06 PM, Lance Andersen wrote:
>> Hi Naoto
>> localizedBy, i would suggest changing:
>> -   “If the new locale contains “ca”…”  to “if the new locale contains the
>> “ca”…”
>> - “Unlike withLocale method” to Unlike the withLocale method”
>> Best
>> Lance
>>> On Nov 15, 2017, at 5:36 PM, Naoto Sato <naoto.sato at oracle.com
>>> <mailto:naoto.sato at oracle.com>> wrote:
>>> http://cr.openjdk.java.net/~naoto/8191349/specdiff.01/overview-summary.html
>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>> <http://oracle.com/us/design/oracle-email-sig-198324.gif><http://oracle.com/us/design/oracle-email-sig-198324.gif>
>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance Andersen|
>> Principal Member of Technical Staff | +1.781.442.2037
>> Oracle Java Engineering
>> 1 Network Drive
>> Burlington, MA 01803
>> Lance.Andersen at oracle.com <mailto:Lance.Andersen at oracle.com>

More information about the core-libs-dev mailing list