A few char-fiddling optimizations for Pattern.java

Ulf Zibis Ulf.Zibis at gmx.de
Fri Apr 16 08:48:04 UTC 2010

On 16.04.2010 02:52, Martin Buchholz wrote:
> On Thu, Apr 15, 2010 at 17:23, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
>>> Hi Xueming and Ulf,
>>> Please review a few more optimizations to be added to the
>>> current crop:
>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/Pattern-opt/
>> Have you tried to disassemble? :
> No.  Have you?

Could you write a main() loop that triggers the isSurrogate case often 
enough for me to examine the disassembly?
I don't know Pattern.java well enough to write it myself.
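
Something along the following lines might serve as a starting point. It is only a sketch: SurrogateLoop and run are made-up names, and whether this pattern actually reaches the isSurrogate branch in the webrev is an assumption that has not been verified against the Pattern.java source.

```java
import java.util.regex.Pattern;

// Hypothetical driver: compiles and matches a pattern containing a
// supplementary character many times, so that HotSpot gets a chance to
// compile the code paths that deal with surrogates.
final class SurrogateLoop {
    static int run(int iterations) {
        // U+1D11E (musical G clef) is stored as a surrogate pair in a String.
        String clef = new String(Character.toChars(0x1D11E));
        String regex = "[" + clef + "a]+";
        String input = "a" + clef + "a";
        int matches = 0;
        for (int i = 0; i < iterations; i++) {
            // Compile inside the loop so the pattern parser is exercised too.
            if (Pattern.compile(regex).matcher(input).matches()) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(run(100_000));
    }
}
```

Running it with a disassembly-printing VM option would then show what the JIT makes of the check, provided the branch is hit at all.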

On second thought, I think a code point in the surrogate range should be 
rare, as it is more or less invalid. So why check for it first?
What is the purpose of the method?
- isNotValidOrSurrogateBMPCodePoint --> includes negative and values > 
- needsSurrogateTreatment --> only includes surrogate and valid 
supplementary code points
For the 1st case I would code:
     return !Character.isBMPCodePoint(cp) || Character.isSurrogate((char)cp);
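
To make the difference between the two readings concrete, here is a rough sketch of both. The class name CodePointChecks is hypothetical, the method names are simply the two candidates listed above, and the spelling Character.isBmpCodePoint is the one found in the released java.lang.Character API. The second method is one possible interpretation of "only surrogates and valid supplementary code points", not code taken from the webrev.

```java
// Hypothetical helper class illustrating the two possible semantics.
final class CodePointChecks {
    // Case 1: true for everything that is not a plain BMP character:
    // negative values, values above 0xFFFF, and surrogate code units.
    static boolean isNotValidOrSurrogateBMPCodePoint(int cp) {
        return !Character.isBmpCodePoint(cp) || Character.isSurrogate((char) cp);
    }

    // Case 2: true only for surrogates and valid supplementary code points.
    // The isBmpCodePoint guard keeps the (char) cast from turning an
    // out-of-range value such as 0x11D800 into an apparent surrogate.
    static boolean needsSurrogateTreatment(int cp) {
        return (Character.isBmpCodePoint(cp) && Character.isSurrogate((char) cp))
                || Character.isSupplementaryCodePoint(cp);
    }
}
```

The two differ only for negative values and values beyond Character.MAX_CODE_POINT: case 1 reports them, case 2 does not.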

>>     private static final boolean isSupplementary(int cp) {
>>          return Character.isSurrogate((char)cp) ||
>> //  in this case method should be named isSupplementaryOrHigher():
>> //                cp >= Character.MIN_SUPPLEMENTARY_CODE_POINT);
>>                  Character.isSupplementaryCodePoint(cp);
>>     }
>> Maybe HotSpot would compile to same code.
> That seems unlikely to me.

Once HotSpot has determined that cp is less than MIN_SURROGATE, why should 
it still test cp >= MIN_SUPPLEMENTARY_CODE_POINT afterwards?

>> Anyway I think, such functionality should belong to class Character.
> I disagree.

I'm thinking about:
     boolean isBMPCodePoint(int cp, boolean validateSurrogate) {...}

>> And don't forget the 8-space indentation rule for line continuation. ;-)
>> Use Character.codePointCount(...) instead ;-)  :
> Good point!  I deleted countCodePoints,
> especially since it is not used in performance-critical code.

I was wondering, since it was you who optimized 
Character.codePointCount(...) a few days ago.
So why rewrite it in another place?
If countCodePoints has enough call sites, a convenience method could reduce 
the bytecode size, but it would decrease interpreter performance:
     private int countCodePoints(CharSequence chars) {
         return Character.codePointCount(chars, 0, chars.length());
     }
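
For illustration, a self-contained version of such a wrapper might look like this. The class name CountDemo is made up, and the method is static here only so that it can be called without an instance; Character.codePointCount(CharSequence, int, int) is the existing library method.

```java
// Hypothetical holder class for the convenience wrapper discussed above.
final class CountDemo {
    // Counts code points over the whole sequence; a surrogate pair
    // counts as one code point.
    static int countCodePoints(CharSequence chars) {
        return Character.codePointCount(chars, 0, chars.length());
    }
}
```

For a string such as "a" followed by U+1D11E and "b", length() is 4 while countCodePoints gives 3.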

