Unicode script support in Regex and Character class
Ulf.Zibis at gmx.de
Tue Apr 27 01:36:28 UTC 2010
On 24.04.2010 01:09, Xueming Shen wrote:
> I changed the data file "format" a bit, so now the overall uniName.dat
> is less than 88k (the last version was 122+k), but
> I can no longer use cpLen as the capacity for the hashmap. I'm now
> using a hardcoded 20000 for Unicode 5.2.
Again: is 88k the compressed or the uncompressed size?
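On the capacity question, a minimal sketch, assuming the number of names is stored in a (hypothetical) header field of uniName.dat instead of being hard-coded: with HashMap's default load factor of 0.75, an initial capacity of ceil(n / 0.75) lets n entries be inserted without a rehash. The class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class NameMapSizing {
    // HashMap's default load factor; the table is resized once
    // size exceeds (table capacity * load factor).
    private static final float LOAD_FACTOR = 0.75f;

    /** Initial capacity that lets expectedEntries mappings be added without a rehash. */
    static int initialCapacityFor(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / (double) LOAD_FACTOR);
    }

    public static void main(String[] args) {
        // Hypothetical: nameCount would be read from the data file's header.
        int nameCount = 20000;
        int capacity = initialCapacityFor(nameCount);
        Map<Integer, String> names = new HashMap<>(capacity, LOAD_FACTOR);
        System.out.println(capacity + " " + names.isEmpty()); // 26667 true
    }
}
```

HashMap rounds the requested capacity up to a power of two internally (here 32768), so the resize threshold ends up at or above the entry count.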
>> -- Is it faster to first copy the whole data into a byte[] and then
>> use ByteBuffer.getInt etc., rather than reading directly via a DataInputStream?
>> -- You could create one very long String with the whole data and then
>> use substring for the individual strings, which could share the same
>> backing char[].
>> -- I don't think it's a good idea to hold the whole data in memory,
>> especially as String objects; additionally, the backing char[]s
>> occupy twice the space of a byte[].
>> -- The big new byte[total] and, later, the huge number of String
>> objects could result in an OOM error on a small VM heap.
>> -- As a compromise, you could put the cp->nameOff pointers in a
>> separate, uncompressed data file and hold only this in memory (or
>> access it via a DirectByteBuffer), and read the string data from a
>> separate file only on request from Character.getName(int codePoint).
>> Optionally, a PreHashMap could cache individually loaded strings.
>> -- Anyway, having DirectByteBuffer access on deflated data would be a
>> performance/footprint gain.
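To make the first quoted question concrete, here is a small sketch that decodes the same big-endian ints both ways. ByteBuffer is big-endian by default and DataInputStream.readInt also reads big-endian, so both yield identical values; the choice is between the extra full byte[] copy and per-call stream overhead, which would have to be measured. The class name and sample bytes are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

public class IntReaders {
    // Variant 1: whole data already copied into a byte[], decoded via ByteBuffer.
    static int[] viaByteBuffer(byte[] data) {
        ByteBuffer bb = ByteBuffer.wrap(data); // big-endian by default
        int[] out = new int[data.length / 4];
        for (int i = 0; i < out.length; i++) {
            out[i] = bb.getInt();
        }
        return out;
    }

    // Variant 2: decoded through a DataInputStream (also big-endian).
    static int[] viaDataInputStream(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int[] out = new int[data.length / 4];
            for (int i = 0; i < out.length; i++) {
                out[i] = in.readInt();
            }
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0, 0, 0x4E, 0x20, 0, 0x01, 0, 0};
        System.out.println(Arrays.toString(viaByteBuffer(data)));      // [20000, 65536]
        System.out.println(Arrays.toString(viaDataInputStream(data))); // [20000, 65536]
    }
}
```

In a real loader the DataInputStream would wrap the resource stream itself, so the full byte[] copy would not be needed at all.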
> Sorry, I don't think I fully understand your points here.
See above; I'll try to address the others tomorrow.
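The compromise from the quoted list could look roughly like the following sketch. Assumptions, none of them from the thread: the index is a sorted int[] of code points plus an int[] of byte offsets (kept on the heap here rather than in a DirectByteBuffer), the string data sits uncompressed as US-ASCII in a separate file so it can be read positionally, and a plain HashMap stands in for PreHashMap. All class and field names are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class LazyNameTable {
    private final int[] codePoints; // sorted code points that have a name
    private final int[] offsets;    // name i occupies bytes offsets[i] .. offsets[i+1]-1
    private final Path namesFile;   // hypothetical separate string-data file
    private final Map<Integer, String> cache = new HashMap<>(); // stand-in for PreHashMap

    LazyNameTable(int[] codePoints, int[] offsets, Path namesFile) {
        this.codePoints = codePoints;
        this.offsets = offsets;
        this.namesFile = namesFile;
    }

    /** Returns the name of cp, reading it from disk on first request; null if unnamed. */
    synchronized String getName(int cp) throws IOException {
        String cached = cache.get(cp);
        if (cached != null) return cached;
        int i = Arrays.binarySearch(codePoints, cp);
        if (i < 0) return null;
        int start = offsets[i];
        ByteBuffer buf = ByteBuffer.allocate(offsets[i + 1] - start);
        try (FileChannel ch = FileChannel.open(namesFile, StandardOpenOption.READ)) {
            while (buf.hasRemaining()) { // positional read; may need several calls
                if (ch.read(buf, start + buf.position()) < 0) {
                    throw new IOException("truncated names file");
                }
            }
        }
        String name = new String(buf.array(), StandardCharsets.US_ASCII);
        cache.put(cp, name);
        return name;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny demo data file: names stored back to back, no separators.
        String[] names = {"LATIN CAPITAL LETTER A", "LATIN CAPITAL LETTER B"};
        int[] cps = {0x41, 0x42};
        int[] offs = new int[names.length + 1];
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < names.length; i++) {
            offs[i] = sb.length();
            sb.append(names[i]);
        }
        offs[names.length] = sb.length();
        Path file = Files.createTempFile("uniName", ".dat");
        Files.write(file, sb.toString().getBytes(StandardCharsets.US_ASCII));

        LazyNameTable table = new LazyNameTable(cps, offs, file);
        System.out.println(table.getName(0x42)); // LATIN CAPITAL LETTER B
        System.out.println(table.getName(0x43)); // null
        Files.delete(file);
    }
}
```

Only the two int arrays stay resident; each name costs one positional read the first time it is asked for. Opening the channel per lookup keeps the sketch short; a real implementation would more likely keep it open.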
More information about the core-libs-dev mailing list