RFR 8230365 : Pattern for a control-char matches non-control characters

Ivan Gerasimov ivan.gerasimov at oracle.com
Thu Aug 29 23:39:35 UTC 2019


In a regular expression pattern, a sequence of the form \\cx specifies 
the control character that corresponds to the name character x.

The current implementation has a few issues with that:
1)  It allows x to be any character at all, including non-printable ones;
2)  The resulting regexp may match a non-control character;
3)  The expression is case-sensitive, so, for example, \\cA differs from 
\\ca, although both should produce ctrl-A.
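
To illustrate issues 2) and 3), here is a small sketch of the arithmetic 
involved. It assumes the parser derives the code point as x ^ 64; the 
class and method names are hypothetical and are not part of 
java.util.regex.

```java
// Hypothetical illustration; not the actual java.util.regex code.
class ControlEscapeXor {
    // Assumption: the parser maps the name char x to the code point x ^ 64.
    static int codePointFor(char x) {
        return x ^ 64;
    }

    public static void main(String[] args) {
        // Print the code point each name char would map to, and whether
        // that code point is an ISO control character.
        for (char x : new char[] {'A', 'a', '1'}) {
            int cp = codePointFor(x);
            System.out.printf("\\c%c -> U+%04X, control: %b%n",
                    x, cp, Character.isISOControl(cp));
        }
    }
}
```

Under that assumption, 'A' maps to U+0001 (ctrl-A), while 'a' maps to 
U+0021 ('!'), which is not a control character.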

It is proposed to make parsing more strict and reject invalid values of 
x, and also clarify the documentation to explicitly list valid values of x.
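
As a rough sketch of what stricter parsing could look like (this is not 
the code from the webrev): the set of valid names below is an assumption 
made for the example, namely '@' through '_', the letters 'a' through 
'z' treated like their upper-case counterparts, and '?' for DEL. The 
class and method names are hypothetical.

```java
// Hypothetical sketch of stricter \cx handling; not the code from the webrev.
class StrictControlEscape {
    // Assumption: valid names are '@'..'_' (U+0000..U+001F), 'a'..'z'
    // (treated like 'A'..'Z'), and '?' (U+007F, DEL).
    static int controlChar(char x) {
        if (x >= 'a' && x <= 'z') {
            x = (char) (x - 'a' + 'A');   // fold case so \ca equals \cA
        }
        if ((x >= '@' && x <= '_') || x == '?') {
            return x ^ 64;                // always yields a control character
        }
        throw new IllegalArgumentException(
                "Illegal control escape sequence: \\c" + x);
    }

    public static void main(String[] args) {
        System.out.println(controlChar('A') == controlChar('a'));
        System.out.println(Character.isISOControl(controlChar('?')));
    }
}
```

With such a check, \\cA and \\ca would both yield ctrl-A, and a name such 
as '1' would be rejected when the pattern is parsed.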

If we agree on this proposal, then a CSR will probably need to be filed 
to capture the changes in the regexp parsing.

Would you please help review the fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230365
WEBREV: http://cr.openjdk.java.net/~igerasim/8230365/00/webrev/

With kind regards,
Ivan Gerasimov
