JDK-8215626 : Correct [^..&&..] intersection negation behaviour JDK8 vs JDK11 ??
Roger.Riggs at oracle.com
Mon Jan 7 15:22:15 UTC 2019
Did your investigation lead you to:
6609854: Regex does not match correctly for negative nested character
That might explain when the behavior changed and perhaps why.
On 01/03/2019 06:20 AM, Andrew Leonard wrote:
> I'm currently investigating bug JDK-8215626 and have found that the
> problem lies in Pattern's interpretation of the [^..&&..] negation when
> applied to "intersected" expressions. So I have simplified the bug example
> to a more extreme and obvious one:
> Input string: "1234 ABCDEFG !$%^& abcdefg"
> pattern RegEx: "[^[A-B]&&[^ef]]"
> Operation: pattern.matcher(input).replaceAll("");
> JDK8 output:
> 1234 CDEFG !$%^& abcdefg
> JDK11 output:
> AB
> So, from the "spec":
> A character class is a set of characters enclosed within square brackets.
> It specifies the characters that will successfully match a single
> character from a given input string.
> To create a single character class matching only the characters common to
> all of its nested classes, use &&, as in [0-9&&].
> To match all characters except those listed, insert the "^" metacharacter
> at the beginning of the character class.
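To see the two operators from the quoted spec in isolation, here is a minimal Java sketch (the class name and the `strip` helper are illustrative, not from the thread) that applies an intersection and a plain negation to a short string. The intersection is the "a through z, except for b and c" class from the `Pattern` javadoc. Neither pattern puts an outer `^` over a nested class, so both should behave the same on JDK8 and JDK11.

```java
import java.util.regex.Pattern;

public class ClassOperatorsDemo {
    // Delete every character of `input` matched by the single-character class `regex`.
    static String strip(String regex, String input) {
        return Pattern.compile(regex).matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Intersection: a through z except b and c, so a, d, e, f, g are removed.
        System.out.println(strip("[a-z&&[^bc]]", "abcdefg")); // bc
        // Negation of a flat class: every character except a, b, c is removed.
        System.out.println(strip("[^abc]", "abcdefg"));       // abc
    }
}
```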
> The way I read the "spec", the "^" negation negates the whole character
> class within the outer square brackets. Thus, in this example,
> "[^[A-B]&&[^ef]]" is equivalent to the negation of "[[A-B]&&[^ef]]",
> i.e. the negation of the intersection of the characters A, B and everything
> other than e, f,
> which is thus the negation of A, B;
> hence the operation above will remove every character in the input
> string other than A and B.
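Written as set algebra, the reading above looks like this. Here Σ (the set of all characters) and S (the intersection) are notation introduced for this sketch, not taken from the original message:

```latex
\[
\begin{aligned}
S &= \{A, B\} \cap (\Sigma \setminus \{e, f\}) = \{A, B\}
   && \text{since } A, B \notin \{e, f\},\\
\text{matched set} &= \Sigma \setminus S = \Sigma \setminus \{A, B\}.
\end{aligned}
\]
```

Since every matched character is replaced with the empty string, only the A and the B of the input survive.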
> Hence, JDK11 in my opinion meets the "spec". It looks as though JDK8 is
> applying the ^ negation to just [A-B] and then intersecting it with [^ef],
> which to me is the wrong interpretation of the "spec".
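To compare the candidate readings side by side, here is a hedged Java sketch (class, field and method names are made up for illustration) that evaluates three possible meanings of `[^[A-B]&&[^ef]]` on the input above using explicit predicates, and then runs the real pattern. Assuming the predicates model `[A-B]` and `[^ef]` faithfully, reading 3, where the leading `^` has no effect at all, reproduces the reported JDK8 output exactly. Negating only `[A-B]` (reading 2) would instead leave `ABef`. So the JDK8 result looks more like the `^` being dropped than being applied to `[A-B]` alone. Reading 1 is the interpretation argued for above; on a JDK11 runtime, `actual()` should agree with it.

```java
import java.util.function.IntPredicate;
import java.util.regex.Pattern;

public class NegatedIntersection {
    static final String INPUT = "1234 ABCDEFG !$%^& abcdefg";

    // [A-B] and [^ef] written as explicit predicates, so each reading can be
    // evaluated without depending on the JDK's regex parser.
    static final IntPredicate A_TO_B = c -> c >= 'A' && c <= 'B';
    static final IntPredicate NOT_E_F = c -> c != 'e' && c != 'f';

    // Delete every char the predicate accepts, mimicking replaceAll("")
    // with a single-character class.
    static String removeMatching(IntPredicate matches) {
        StringBuilder out = new StringBuilder();
        INPUT.chars()
             .filter(matches.negate())
             .forEach(c -> out.append((char) c));
        return out.toString();
    }

    // Reading 1: ^ negates the whole intersection.
    static String negateWhole() { return removeMatching(A_TO_B.and(NOT_E_F).negate()); }

    // Reading 2: ^ negates only [A-B], which is then intersected with [^ef].
    static String negateFirstOnly() { return removeMatching(A_TO_B.negate().and(NOT_E_F)); }

    // Reading 3: the leading ^ has no effect at all.
    static String ignoreNegation() { return removeMatching(A_TO_B.and(NOT_E_F)); }

    // What java.util.regex in the running JDK actually does.
    static String actual() {
        return Pattern.compile("[^[A-B]&&[^ef]]").matcher(INPUT).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(negateWhole());     // AB
        System.out.println(negateFirstOnly()); // ABef
        System.out.println(ignoreNegation());  // 1234 CDEFG !$%^& abcdefg
        System.out.println(actual());          // depends on the JDK version
    }
}
```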
> Your thoughts please?
> If JDK11 is correct and JDK8 is wrong, then the next question is: do we fix
> JDK8, given the obvious potential "behavioural" impacts on existing
> Andrew Leonard
> Java Runtimes Development
> IBM Hursley
> IBM United Kingdom Ltd
> Phone internal: 245913, external: 01962 815913
> internet email: andrew_m_leonard at uk.ibm.com
More information about the core-libs-dev