RFR JDK-8147531, To add named character construct \N{...} to support Unicode name property

Xueming Shen xueming.shen at oracle.com
Fri Jan 22 03:44:12 UTC 2016

On 1/19/16 11:43 AM, Martin Buchholz wrote:
> Many years ago I considered implementing this cool feature.
> I thought that few would find it worth the cost - it would be hard to
> keep the cost low if this feature is used only rarely.  You might want
> an expiring cache of character name mappings, and the JDK doesn't have
> such a thing yet.

As a matter of fact, the compressed data file is about 130k in the file
system. The inflated runtime data for the name string table is about
700k. The cp->name lookup table is about 160k and the name->cp lookup
mapping is about 400k+ (a little more space might still be cut from the
homemade hashmap...). So the runtime cost is about 1.2mb for this "cool"
feature. Yes, it's a little bigger than the zt_tw charset, but
considering that you get a round-trip mapping between all the codepoints
and their names, 1.3mb might not be that expensive, given that a
"normal" picture now takes a couple of mb of memory.
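
To make the round-trip mapping above concrete, here is a minimal sketch.
It assumes a JDK 9 or later runtime, where Character.codePointOf(String)
is available next to Character.getName(int); the class name
NameRoundTrip and its helper methods are hypothetical and are not taken
from the webrev.

```java
public class NameRoundTrip {

    // cp -> name direction. Character.getName returns null for
    // unassigned code points.
    static String nameOf(int codePoint) {
        return Character.getName(codePoint);
    }

    // name -> cp direction (assumes JDK 9+). Throws
    // IllegalArgumentException if the name is not a known character name.
    static int codePointOf(String name) {
        return Character.codePointOf(name);
    }

    public static void main(String[] args) {
        int cp = 0x263A;
        String name = nameOf(cp);
        int back = codePointOf(name);
        // Print the code point, its name, and the code point recovered
        // from that name.
        System.out.printf("U+%04X -> %s -> U+%04X%n", cp, name, back);
    }
}
```

The two helpers correspond to the two lookup directions whose sizes are
listed above (cp->name and name->cp).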

How about you help take a look to see if we can squeeze out more space?
I really need a reviewer :-)


> (I haven't actually reviewed the implementation)

> On Mon, Jan 18, 2016 at 11:52 PM, Xueming Shen <xueming.shen at oracle.com> wrote:
>> Hi,
>> Please help review the change to add \N support in regex.
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8147531
>> webrev: http://cr.openjdk.java.net/~sherman/8147531/webrev
>> This is one of the items we were planning to address via JEP111
>> http://openjdk.java.net/jeps/111
>> https://bugs.openjdk.java.net/browse/JDK-8046101
>> Some of the constructs have already been added in earlier releases. I'm
>> planning to address the rest separately, as individual RFEs.
>> Thanks,
>> Sherman
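
For readers unfamiliar with the construct under review, the following is
a minimal sketch of how \N{...} can be used in a pattern. It assumes a
JDK 9 or later runtime, where java.util.regex.Pattern accepts
\N{name}; the class and method names are hypothetical and chosen only
for illustration.

```java
import java.util.regex.Pattern;

public class NamedCharRegex {

    // True if the input consists of one or more 'a' characters, with the
    // character written by its Unicode name instead of a literal.
    static boolean isAllSmallA(String input) {
        return Pattern.matches("\\N{LATIN SMALL LETTER A}+", input);
    }

    // True if the input contains U+263A, referenced by name.
    static boolean containsSmiley(String input) {
        return Pattern.compile("\\N{WHITE SMILING FACE}").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(isAllSmallA("aaa"));
        System.out.println(isAllSmallA("aab"));
        System.out.println(containsSmiley("hello \u263A"));
    }
}
```

Each \N{...} in the pattern has to be resolved through the name->cp
mapping when the pattern is compiled, so an unknown name is expected to
be rejected as a pattern syntax error at that point.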
