Unicode script support in Regex and Character class

Xueming Shen xueming.shen at oracle.com
Thu Apr 22 08:01:20 UTC 2010


Here is the webrev of the proposal to add Unicode script support in 
regex and j.l.Character.


and the corresponding blenderrev


Please comment on the APIs before I submit the CCC, especially:

(1) the use of an enum for j.l.Character.UnicodeScript (instead of the
    traditional j.l.Character.Subset)
(2) the piggyback method j.l.Character.getName() :-)
(3) the syntax for script constructs. In addition to the "normal"
    \p{InScriptName} and \P{InScriptName} for script support, I'm also
    adding
      \p{script=ScriptName} \P{script=ScriptName}  for the new script support
      \p{block=BlockName} \P{block=BlockName}  for the "existing" block
      \p{general_category=CategoryName} \P{general_category=CategoryName}  for the "existing" gc
    Perl recently also started to accept this Unicode-style
    \p{propName=propValue} syntax. It opens the door for future
    expansion, for example \p{name=XYZ} :-)
(4) and of course, the wording.
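
To make items (1) to (3) concrete, here is a minimal sketch of how the constructs might be exercised. It assumes the spellings found in JDK 7 and later (Character.UnicodeScript.of(int), Character.getName(int), and the script=, block= and general_category= keywords in java.util.regex.Pattern); the webrev under review may differ in detail. The class and helper names are made up for illustration only.

```java
import java.util.regex.Pattern;

// Illustrative only: class and method names are hypothetical.
class ScriptSyntaxSketch {

    // True if the whole input matches the given regex.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // (1) enum-based script lookup for a single code point.
    static Character.UnicodeScript scriptOf(int codePoint) {
        return Character.UnicodeScript.of(codePoint);
    }

    // (2) the "piggyback" name lookup.
    static String nameOf(int codePoint) {
        return Character.getName(codePoint);
    }

    public static void main(String[] args) {
        String greek = "\u03b1\u03b2\u03b3"; // Greek small alpha, beta, gamma

        // (3) property=value style for script, block and general category.
        System.out.println(matches("\\p{script=Greek}+", greek));        // true
        System.out.println(matches("\\p{block=Greek}+", greek));         // true
        System.out.println(matches("\\p{general_category=Lu}+", "ABC")); // true
        System.out.println(matches("\\p{general_category=Lu}+", "abc")); // false

        // \P{...} negates the property: digits are not in the Latin script.
        System.out.println(matches("\\P{script=Latin}+", "123"));        // true
        System.out.println(matches("\\P{script=Latin}+", "abc"));        // false

        System.out.println(scriptOf('A'));  // LATIN
        System.out.println(nameOf('A'));    // LATIN CAPITAL LETTER A
    }
}
```

The enum form lets callers compare scripts with == and switch on them, which is the main practical difference from a Subset-style constant.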

