Unicode script support in Regex and Character class
Ulf.Zibis at gmx.de
Tue Apr 27 01:03:04 UTC 2010
On 27.04.2010 00:01, Xueming Shen wrote:
> Ulf Zibis wrote:
>> I would like to see the full names redundantly in the aliases map.
>> Needs only ~100 * (4 + 4) bytes in HashMap<String, Character>.
> This is an implementation detail; we can defer the difference for now.
I said that with the alternative in mind of making UnicodeScript a
_normal class_, in case saving the redundant internal hash map matters.
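To make the suggestion concrete, here is a minimal sketch of a single lookup map that holds the full names next to the four-letter codes. The enum Script, the class ScriptNames and the method forName are hypothetical names chosen for illustration only, not the proposed API, and only three scripts are shown.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the proposed enum; only three constants are shown.
enum Script { LATIN, GREEK, CYRILLIC }

final class ScriptNames {
    // One map holding both the four-letter ISO 15924 codes and the full names.
    private static final Map<String, Script> NAMES = new HashMap<>();
    static {
        NAMES.put("LATN", Script.LATIN);
        NAMES.put("GREK", Script.GREEK);
        NAMES.put("CYRL", Script.CYRILLIC);
        for (Script s : Script.values()) {
            NAMES.put(s.name(), s); // the redundant full-name entries
        }
    }

    // Case-insensitive lookup by alias or by full name.
    static Script forName(String name) {
        Script s = NAMES.get(name.toUpperCase(Locale.ENGLISH));
        if (s == null) {
            throw new IllegalArgumentException("unknown script name: " + name);
        }
        return s;
    }
}
```

With the redundant entries, a caller does not need to know whether it holds an alias or a full name: forName("Latn") and forName("latin") resolve to the same constant.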
>> I think there should be some more words in the javadoc about the
>> correlation/use case/advantage of UnicodeScript against UnicodeBlock.
> Martin raised the same comment. But I still believe
> j.l.C.UnicodeScript simply defines the syntax of the Unicode script name
> in the Java libraries, it does not try to interpret/implement anything
> further at the semantics level. It just serves as an ID for the
> Unicode script name, so it'd be better to leave the semantics
> definition/explanation to the TR#24.
Yes, for the semantic definition/explanation of Unicode script names,
users should refer to TR#24.
But they might like to be briefly informed about the different
semantics, use cases and disadvantages of UnicodeBlock.
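A short example of the kind of hint the javadoc could give. This is only a sketch, assuming an of(int) lookup is available on both classes (as in the JDK 7 API); the class name ScriptVersusBlock and its helper methods are made up for illustration.

```java
final class ScriptVersusBlock {
    // True if both code points lie in the same contiguous block range.
    static boolean sameBlock(int a, int b) {
        return Character.UnicodeBlock.of(a) == Character.UnicodeBlock.of(b);
    }

    // True if both code points carry the same script property.
    static boolean sameScript(int a, int b) {
        return Character.UnicodeScript.of(a) == Character.UnicodeScript.of(b);
    }

    public static void main(String[] args) {
        int latinA = 'A';      // U+0041, Basic Latin block
        int eAcute = 0x00E9;   // U+00E9, Latin-1 Supplement block
        int times  = 0x00D7;   // U+00D7, Latin-1 Supplement block

        // 'A' and U+00E9 are in different blocks but share a script.
        System.out.println(sameBlock(latinA, eAcute) + " " + sameScript(latinA, eAcute));
        // U+00E9 and U+00D7 share a block but not a script.
        System.out.println(sameBlock(eAcute, times) + " " + sameScript(eAcute, times));
    }
}
```

The point for the reader: a block is just a code point range, so one block can mix letters of a script with symbols that belong to COMMON, while one script can be spread over several blocks.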
>> I would like to have the 3 special cases INHERITED, COMMON and
>> UNKNOWN together at the beginning or end of the enum list.
> Why? The current list is generated by the script from Scripts.txt,
> so it is in the order in which the names appear in Scripts.txt.
> Is there any particular reason they should be listed differently?
> We do have the links at the beginning already. I don't see any
> advantage in putting them physically together.
Someone might find it useful to write, for example,
    if (script.compareTo(UnicodeScript.LATIN) < 0)
to filter out the special cases easily.
Same might be considered for SURROGATE, PRIVATE_USE, UNASSIGNED.
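The ordering-based filter can be sketched as follows. The enum OrderedScript is hypothetical and assumes the three special values are declared first; it is not the order generated from Scripts.txt. Enum constants cannot be compared with '<' in Java, so the sketch uses compareTo, which follows declaration order, and shows an order-independent set as an alternative.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical ordering: the three special values are declared first.
enum OrderedScript { COMMON, INHERITED, UNKNOWN, LATIN, GREEK, CYRILLIC }

final class SpecialFilter {
    // Relies on declaration order: everything before LATIN is a special case.
    static boolean isSpecial(OrderedScript s) {
        return s.compareTo(OrderedScript.LATIN) < 0;
    }

    // Alternative that does not depend on declaration order.
    private static final Set<OrderedScript> SPECIAL = EnumSet.of(
            OrderedScript.COMMON, OrderedScript.INHERITED, OrderedScript.UNKNOWN);

    static boolean isSpecialBySet(OrderedScript s) {
        return SPECIAL.contains(s);
    }
}
```

The compareTo variant is the shorter one, but it breaks silently if a later version inserts a constant in the wrong place; the set variant works with any ordering, including the generated one.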