Codereview request for 7189363: Regex Pattern compilation buggy for special sequences

Paul Sandoz paul.sandoz at
Thu Aug 9 08:38:53 UTC 2012

Looks OK to me.


On Aug 8, 2012, at 11:58 PM, Xueming Shen <xueming.shen at> wrote:

> Hi
> It appears the optimization (flatten a binary tree structure into a branch/
> switch) we put into JDK6 for alternation operation [1] has problem if the
> first construct is a group "(...)" followed by a greedy/reluctant "once or not
> at all" quantifier. For example regex "(a)?bc|d" or "(a)??bc|d" can't be found
> for input "d", while "a?bc|d" or "a??bc|d" just works fine.
> The root cause is that expr() mistakenly to use the Branch node from the
> sub-expression (in above case, the branch node for "(a)?" instead of using
> its own Branch node (because the incorrect use of "prev instance of Branch",
> see the diff), which basically messes up the internal regex node tree [2]
> The webrev is at
> Thanks,
> -Sherman
> PS:
> I'm surprised this bug has not been noticed until now. One of the reasons is
> that the "messed up" regex node tree actually works for most of the cases
> for this particular regex, for example
> Pattern.compile("(a)?bc|d").matcher("d").matches()
> Pattern.compile("(a)?bc|d").matcher("bc").matches()
> Pattern.compile("(a)?bc|d").matcher("abc").matches()
> and even
> Pattern.compile("(a)?bc|de").matcher("de").find()
> all return true. The "find()" for single "d" fails only because the "minimum
> length" check (see [2] for details, if interested)
> [1]
> [2]

More information about the core-libs-dev mailing list