6990617: Regular expression doesn't match if unicode character next to a digit.
gary.adams at oracle.com
Mon Dec 12 15:05:05 UTC 2011
I think you want to add 6990617 to the @bug tag in RegExTest.
On 12/10/11 01:05 AM, Stephen Flores wrote:
> Please review the following webrev (includes new test to demonstrate the bug):
> for bug: 6990617 Regular expression doesn't match if unicode character next to
> a digit.
> A DESCRIPTION OF THE PROBLEM :
> Characters can be written as octal escapes: \\0 + octal number.
> For instance, one could write:
> Pattern p = Pattern.compile("\\011some text\\012");
> Matcher m = p.matcher("\tsome text\n");
> System.out.println(m.find()); // yields "true"
> However, if we want to match a string with a digit next to
> the escaped character, it doesn't match (whether we "quote"
> the regular expression or not). Note the "1" next to the tab
> character (octal 011).
> Pattern p = Pattern.compile("\\011\\Q1some text\\E\\012");
> Matcher m = p.matcher("\t1some text\n");
> System.out.println(m.find()); // yields "false"
> This happens because Pattern accepts either \\0011 or \\011 for
> the same character. From the javadoc:
> \0nn The character with octal value 0nn (0 <= n <= 7)
> \0mnn The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
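> Pattern.RemoveQEQuoting() does NOT handle this situation correctly.
> The existing implementation simply copies all ASCII.isAlnum()
> characters when handling a quote, so the quoted "1" ends up directly
> after \\011 and the two are read together as the octal escape \\0111.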
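The ambiguity described above can be reproduced without any \Q...\E quoting. The following is a minimal sketch, not part of the webrev; the class name OctalAmbiguityDemo and the helper found() are made up for illustration. It only uses Pattern.compile and Matcher.find, and it avoids the quoted form itself, since the result of that form depends on whether the JDK in use already contains the fix.

```java
import java.util.regex.Pattern;

// Illustrative demo class (hypothetical name, not from the webrev).
final class OctalAmbiguityDemo {
    // True if the regex is found somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // \0111 is read as ONE octal escape (octal 111 = 'I'),
        // not as a tab (\011) followed by the digit '1'.
        System.out.println(found("\\0111some text\\012", "Isome text\n"));
        System.out.println(found("\\0111some text\\012", "\t1some text\n"));

        // Writing the digit as \x31 ends the octal escape after \011.
        System.out.println(found("\\011\\x31some text\\012", "\t1some text\n"));
    }
}
```

The first and third calls should report a match and the second should not, which is the same misparse the quoted "1" runs into once the quoting has been stripped.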
> Description of fix:
> In the method Pattern.RemoveQEQuoting any ASCII digit at the
> beginning of a quote will now be prefixed by "\x3" (not just \
> because this would be a backref). 0x30 is the ASCII code for '0',
> so a leading "1" becomes "\x31" and can no longer extend a
> preceding octal escape.
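The idea of the fix can be sketched outside the JDK as a small string rewriter. This is a simplified illustration under stated assumptions, not the actual Pattern.RemoveQEQuoting code: the class QuoteExpansionSketch and the method expandQuotes are hypothetical, the sketch works on a String rather than on Pattern's internal buffer, and it does not handle an escaped backslash in front of a Q (for example \\\\Q in a Java literal).

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the JDK implementation.
final class QuoteExpansionSketch {
    static boolean isAsciiAlnum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // Rewrites \Q...\E sections into individually escaped characters.
    static String expandQuotes(String regex) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < regex.length()) {
            int start = regex.indexOf("\\Q", i);
            if (start < 0) {                       // no more quoted sections
                out.append(regex, i, regex.length());
                break;
            }
            out.append(regex, i, start);           // copy the unquoted prefix
            int end = regex.indexOf("\\E", start + 2);
            int stop = (end < 0) ? regex.length() : end;
            for (int j = start + 2; j < stop; j++) {
                char c = regex.charAt(j);
                if (j == start + 2 && c >= '0' && c <= '9') {
                    // Leading digit: emit \x3N so it cannot extend a
                    // preceding octal escape or be read as a backref.
                    out.append("\\x3").append(c);
                } else if (c < 128 && !isAsciiAlnum(c)) {
                    out.append('\\').append(c);    // escape ASCII punctuation
                } else {
                    out.append(c);                 // alphanumerics copied as-is
                }
            }
            i = (end < 0) ? regex.length() : end + 2;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String expanded = expandQuotes("\\011\\Q1some text\\E\\012");
        System.out.println(expanded);
        System.out.println(Pattern.compile(expanded).matcher("\t1some text\n").find());
    }
}
```

With the pattern from the bug report, the quoted "1" is emitted as \x31, so the expanded pattern keeps \011 as a separate tab escape and should match the input from the report.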
More information about the core-libs-dev