<i18n dev> RFR: 8225061: Performance regression in Regex

Claes Redestad claes.redestad at oracle.com
Sat Jun 1 00:58:59 UTC 2019

Hi Naoto,

thanks for reviewing!


On 2019-06-01 02:23, naoto.sato at oracle.com wrote:
> Hi Claes,
> Looks good to me. Thanks for catching this so quickly!
> Naoto
> On 5/31/19 5:13 PM, Claes Redestad wrote:
>> Hi,
>> recent Unicode 12.1 updates caused a noticeable regression in regex
>> matching performance.
>> Quoting Naoto:
>> "The regression was caused by the call to Grapheme.nextBoundary() in
>> NFCCharProperty.match() method, which got slower with the fix to
>> JDK-8221431 / JDK-8222978 (Unicode 12.1 / Grapheme 12.0 support). The
>> purpose of issuing nextBoundary() is to detect whether or not to call the
>> (much more heavyweight) Normalizer.normalize(). Since this fast check
>> does not require fully fledged boundary detection, including stateful
>> segmentation check such as Emoji sequence, simply checking the break
>> possibility between two code points as before should suffice. Suggested
>> fix is to bring back the isBoundary(cp1, cp2) method from the previous
>> revision in Grapheme.java, and issue it only from
>> NFCCharProperty.match() method for the fast check."
>> Bug:    https://bugs.openjdk.java.net/browse/JDK-8225061
>> Webrev: http://cr.openjdk.java.net/~redestad/8225061/open.01/
>> While narrowing this down, I created a couple of microbenchmarks and
>> experimented with a sequence of optimizations that got the regression of
>> using the heavier nextBoundary() check down from about 300x to just
>> about 2x as costly as before JDK-8221431. These improvements were then
>> bypassed by reverting to isBoundary in some micros, but they still help a
>> lot in other cases that have taken a hit from making the grapheme logic
>> more complete/correct, so I'd like to leave them in.
>> Testing: tier1-3, verified a 300x speedup in the complex
>> Pattern.CANON_EQ micro, and a 2x speedup on the simpler Grapheme/\\b{g}
>> micro.
>> Thanks!
>> /Claes
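
For readers who want to exercise the two code paths mentioned in the testing note, here is a minimal sketch using only the public java.util.regex API. The class and method names (GraphemeRegexSketch, canonEqMatches, graphemeClusters) are made up for illustration and are not the microbenchmarks from the webrev; the internal Grapheme and NFCCharProperty classes are not called directly.

```java
import java.util.regex.Pattern;

// Illustrative only: touches the Pattern.CANON_EQ and \b{g} paths
// discussed in this thread through the public API.
public class GraphemeRegexSketch {

    // True if the whole input matches the regex under canonical equivalence.
    static boolean canonEqMatches(String regex, String input) {
        return Pattern.compile(regex, Pattern.CANON_EQ).matcher(input).matches();
    }

    // Splits text at extended grapheme cluster boundaries using \b{g} (Java 9+).
    static String[] graphemeClusters(String text) {
        return Pattern.compile("\\b{g}").split(text);
    }

    public static void main(String[] args) {
        String decomposed = "a\u030A";  // 'a' followed by COMBINING RING ABOVE
        String precomposed = "\u00E5";  // LATIN SMALL LETTER A WITH RING ABOVE

        // With CANON_EQ the two forms are treated as equivalent.
        System.out.println(canonEqMatches(decomposed, precomposed));

        // Without the flag, the pattern is compared code point by code point.
        System.out.println(Pattern.matches(decomposed, precomposed));

        // The base letter plus combining mark stays in one cluster.
        System.out.println(graphemeClusters("a\u030Ab").length);
    }
}
```

To observe the performance difference rather than just the behaviour, the same calls would need to be wrapped in a proper benchmark harness such as JMH.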
