[loc-en-dev] -u- extension API - necessary updates?
y.umaoka at gmail.com
Wed Jun 30 13:30:24 PDT 2010
In the Locale Enhancement repository, we have the following proposed APIs
supporting the -u- extension:
public Set<String> getUnicodeLocaleKeys()
public String getUnicodeLocaleType(String key)
public Builder setUnicodeLocaleKeyword(String key, String type)
The following Unicode locale extension features were not in our scope
last year:
1. a type represented by multiple subtags
2. a key without a type
3. attributes (keyless subtags)
Supporting 1 does not appear to require any changes to the proposal.
A Unicode locale extension keyword may have a type represented by
multiple subtags. For example, "en-u-vt-0061-0065" is valid as defined by
the current LDML specification (See
However, this does not mean that a keyword may have multiple types. In
this example, 0061 and 0065 are not two different types; instead,
"0061-0065" as a whole is one type. Thus, getUnicodeLocaleType("vt") can
simply return "0061-0065". To set the type using Builder,
setUnicodeLocaleKeyword("vt", "0061-0065") is sufficient.
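To make the multi-subtag point concrete, here is a minimal parsing sketch in Java. UnicodeKeywordParser and parseKeywords are hypothetical names, not part of the proposed API. The sketch assumes the input is only the subtags that follow "u-", that a key is exactly two characters, and that every longer subtag after a key belongs to that key's type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the proposed API: parses the subtags that follow
// "u-" into key -> type pairs. Assumes a key is exactly 2 characters long and
// any longer subtag after a key belongs to that key's type.
final class UnicodeKeywordParser {
    static Map<String, String> parseKeywords(String body) {
        Map<String, String> keywords = new LinkedHashMap<>();
        String currentKey = null;
        StringBuilder type = new StringBuilder();
        for (String subtag : body.split("-")) {
            if (subtag.length() == 2) {
                // A new key starts; store the previous key with its type.
                if (currentKey != null) {
                    keywords.put(currentKey, type.toString());
                }
                currentKey = subtag;
                type.setLength(0);
            } else if (currentKey != null && !subtag.isEmpty()) {
                // Multiple type subtags are joined back into ONE type string.
                if (type.length() > 0) {
                    type.append('-');
                }
                type.append(subtag);
            }
            // Subtags before the first key (attributes) are skipped here.
        }
        if (currentKey != null) {
            keywords.put(currentKey, type.toString());
        }
        return keywords;
    }
}
```

With this sketch, parseKeywords("vt-0061-0065") yields a single entry, "vt" mapped to "0061-0065", which is why a single-valued getter per key is enough.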
Supporting 2 involves a minor conflict with the current proposal.
Assume we have a Locale represented by the pseudo language tag
"en-u-aa-bb-ccc". getUnicodeLocaleKeys() will return a set containing
"aa" and "bb". getUnicodeLocaleType(String key) currently returns null
when the input key is not available and a non-empty type string when
the key is available. We could use the empty string "" to represent a
typeless keyword; that is, getUnicodeLocaleType("aa") would return ""
in this example.
The remaining question is the Builder API,
setUnicodeLocaleKeyword(String key, String type). Currently, an empty
type string indicates that the keyword itself is removed from the current
state, and a null type throws NPE. We could change the API to use null
for deletion instead of the empty string. For example, if a Builder
internally represents "en-u-aa-bb-ccc", setUnicodeLocaleKeyword("aa",
null) would remove the typeless keyword "aa", changing the internal
representation to "en-u-bb-ccc". Also, setUnicodeLocaleKeyword("dd", "")
would append a typeless keyword "dd" to the internal state (that is,
"en-u-aa-bb-ccc-dd").
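The null/empty-string semantics proposed above can be modelled with a small Java sketch. UnicodeKeywordModel and toExtensionString() are hypothetical names used only for illustration; the model tracks just the "u" extension (the "en" language subtag is left out) and assumes keywords are serialized in alphabetical key order.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of the proposed semantics (not an existing API):
//   type == null -> remove the keyword
//   type == ""   -> add or keep a typeless keyword
//   otherwise    -> set the keyword to that type
final class UnicodeKeywordModel {
    // TreeMap keeps keys in alphabetical (canonical) order.
    private final Map<String, String> keywords = new TreeMap<>();

    UnicodeKeywordModel setUnicodeLocaleKeyword(String key, String type) {
        if (key == null) {
            throw new NullPointerException("key");
        }
        if (type == null) {
            keywords.remove(key);
        } else {
            keywords.put(key, type);
        }
        return this;
    }

    Set<String> getUnicodeLocaleKeys() {
        return Collections.unmodifiableSet(keywords.keySet());
    }

    // Returns "" for a typeless keyword and null when the key is absent.
    String getUnicodeLocaleType(String key) {
        return keywords.get(key);
    }

    // Serializes as "u-key[-type]..." in alphabetical key order.
    String toExtensionString() {
        StringBuilder sb = new StringBuilder("u");
        for (Map.Entry<String, String> e : keywords.entrySet()) {
            sb.append('-').append(e.getKey());
            if (!e.getValue().isEmpty()) {
                sb.append('-').append(e.getValue());
            }
        }
        return sb.toString();
    }
}
```

Starting from "aa" set to "" and "bb" set to "ccc" (that is, "u-aa-bb-ccc"), setUnicodeLocaleKeyword("dd", "") gives "u-aa-bb-ccc-dd", and a following setUnicodeLocaleKeyword("aa", null) gives "u-bb-ccc-dd".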
Note that, in the current design, calling a setXXX method with an empty
string removes that field from the Builder. If we really want to change
the semantics of the empty string and null in setUnicodeLocaleKeyword,
the same policy should be applied consistently to the other setters (for
example, setLanguage(null) to remove the language field, instead of
setLanguage("")).
For supporting 3, we could treat an attribute as a keyless keyword. But
that makes getUnicodeLocaleKeys()/getUnicodeLocaleType(String key) a
little awkward. Technically, we could still design them that way
(getUnicodeLocaleKeys() would include an empty string in the returned
set, and getUnicodeLocaleType("") would return the attribute subtags). I
think adding an extra API dedicated to attributes is cleaner:
public Set<String> getUnicodeLocaleAttributes()
The same idea applies to Builder. APIs dedicated to adding/removing
Unicode locale attributes, like the ones below, could be added:
public Builder addUnicodeLocaleAttribute(String attribute)
public Builder removeUnicodeLocaleAttribute(String attribute)
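A minimal sketch of the dedicated attribute API might look like the following. The class name UnicodeAttributeModel is hypothetical; the method names mirror the proposal, and storing attributes in a TreeSet (so they come back in alphabetical order and duplicates collapse) is an assumption of the sketch, not something decided in this thread.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: attributes are kept apart from keywords, so the
// keyword getters never need a "keyless" placeholder entry.
final class UnicodeAttributeModel {
    // Sorted storage is an assumption; a null attribute throws NPE here.
    private final Set<String> attributes = new TreeSet<>();

    UnicodeAttributeModel addUnicodeLocaleAttribute(String attribute) {
        attributes.add(attribute);
        return this;
    }

    UnicodeAttributeModel removeUnicodeLocaleAttribute(String attribute) {
        attributes.remove(attribute);
        return this;
    }

    // Read-only view, like the unmodifiable set returned for keys.
    Set<String> getUnicodeLocaleAttributes() {
        return Collections.unmodifiableSet(attributes);
    }
}
```

Because keywords and attributes live in separate collections, getUnicodeLocaleKeys() never has to contain an empty-string key just to reach the attributes.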
Another possibility is to set multiple attributes as a whole:
public Builder setUnicodeLocaleAttributes(String attributes)
For example, to set the attributes "abc" and "def", call
setUnicodeLocaleAttributes("abc-def"). If we go with this approach, we
do not need a "remove" method. One tricky part is that the order of
attributes does not matter, so semantically "abc-def" and "def-abc" are
the same. Since we do not want to introduce unnecessary variations, we
should clearly state that the order of attributes is not preserved.
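For the set-as-a-whole variant, here is a sketch of how the attribute string could be normalized so that input order is not preserved. AttributeString, parseAttributes, and canonical are hypothetical helpers; sorting alphabetically is an assumption chosen only so that equivalent inputs compare equal.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the "set all attributes at once" variant.
// Input order is deliberately discarded: attributes go into a sorted set,
// so "abc-def" and "def-abc" produce the same state.
final class AttributeString {
    static Set<String> parseAttributes(String attributes) {
        Set<String> result = new TreeSet<>();
        if (!attributes.isEmpty()) {
            result.addAll(Arrays.asList(attributes.split("-")));
        }
        return result;
    }

    // One canonical spelling per attribute set (empty input -> "").
    static String canonical(String attributes) {
        return String.join("-", parseAttributes(attributes));
    }
}
```

Under this sketch, passing an empty string yields an empty set, which would clear all attributes; that is one way to see why this variant needs no separate remove method.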
Another question related to this: Set<String> vs. List<String>.
Currently, getUnicodeLocaleKeys() returns a Set<String> (actually, an
unmodifiable set). Semantically, the order of keywords does not matter:
"u-ca-japanese-cu-jpy" is equivalent to "u-cu-jpy-ca-japanese". But we
do use a canonical order (alphabetical order of keys) when a Locale is
converted to a language tag. From this point of view, List<String> might
be more appropriate. This also applies to attributes. If we agree to
support Unicode locale attributes with dedicated APIs like the above, we
should decide whether the collection of attributes should be represented
by a Set or a List.
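The trade-off can be illustrated with a small sketch. KeywordCanonicalizer and its methods are hypothetical; it assumes each key has a single-subtag type, and it uses a sorted map purely to show that keyword order can be normalized on output while equality stays order-insensitive. Whether the real API should return a Set or a List remains the open question above; the sketch does not settle it.

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: keywords are kept in a sorted map, so two inputs that
// differ only in keyword order compare equal, yet iteration always follows
// the alphabetical key order used when building a language tag.
// Assumes each key has exactly one single-subtag type.
final class KeywordCanonicalizer {
    static SortedMap<String, String> parse(String body) {
        String[] subtags = body.split("-");
        SortedMap<String, String> keywords = new TreeMap<>();
        for (int i = 0; i + 1 < subtags.length; i += 2) {
            keywords.put(subtags[i], subtags[i + 1]);
        }
        return Collections.unmodifiableSortedMap(keywords);
    }

    // Rebuilds the extension in canonical (alphabetical key) order.
    static String toCanonicalString(String body) {
        StringBuilder sb = new StringBuilder("u");
        for (Map.Entry<String, String> e : parse(body).entrySet()) {
            sb.append('-').append(e.getKey()).append('-').append(e.getValue());
        }
        return sb.toString();
    }
}
```

Both "cu-jpy-ca-japanese" and "ca-japanese-cu-jpy" canonicalize to "u-ca-japanese-cu-jpy" in this sketch.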
Overall, supporting the full specification of the Unicode locale
extension does not look too bad. Some may ask why we would add APIs
dedicated to things that are not yet used. We could defer adding the
"attribute" APIs, in which case attributes could only be set via
Builder.setExtension('u', "...."). But the necessary API addition is
quite minimal, and with these APIs the design looks more complete.
Therefore, if we are going to include any 'u' extension-specific APIs,
I want to do it completely, including attribute support.