RFR: JDK-8027645: Pattern.split() with positive lookahead

Xueming Shen xueming.shen at oracle.com
Mon Nov 11 22:05:53 UTC 2013

Alan, Paul,

My apologies, it appears I forgot about my "fastpath" in String.split(String, int),
and the fact that its spec also duplicates most of the spec of Pattern.split().
The webrev has been updated to close the loophole.
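
A quick way to check that the two entry points agree is to run the same
arguments through both and compare the arrays. The following is only a
minimal sketch: the class and method names are made up for illustration, and
it assumes (as an implementation detail, not something the spec promises)
that a single literal character such as "," takes the fastpath while the
lookahead regex goes through Pattern.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the webrev.
public class SplitConsistencyCheck {

    // Returns true when String.split and Pattern.split produce the same
    // array for the given input, regex and limit.
    static boolean sameResult(String input, String regex, int limit) {
        String[] viaString = input.split(regex, limit);
        String[] viaPattern = Pattern.compile(regex).split(input, limit);
        return Arrays.equals(viaString, viaPattern);
    }

    public static void main(String[] args) {
        String[] inputs = { "", ",", ",a,b", "a,b,,", "AbcEfg" };
        // "," is assumed to hit the fastpath; the lookahead is not.
        String[] regexes = { ",", "(?=\\p{Lu})" };
        for (String regex : regexes) {
            for (String input : inputs) {
                System.out.println("regex=" + regex + " input=\"" + input
                        + "\" same=" + sameResult(input, regex, 0));
            }
        }
    }
}
```

The sketch deliberately compares the two results with each other instead of
hard-coding expected arrays, since the exact output for the edge cases is
what is under review here.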



On 11/07/2013 10:59 AM, Xueming Shen wrote:
> Hi,
> As suggested in the bug report [1], the spec of j.u.regex.Pattern.split()
> does not clearly specify the expected behavior when a zero-width match is
> found at the beginning of the input string (for example, whether or not an
> empty leading string should be included in the resulting array). Worse, the
> implementation is not consistent either (it differs between input cases,
> such as "Abc".split(...) vs "AbcEfg".split(...)).
> The spec is also unclear about the expected behavior when the length of
> the input string is 0 [2].
> As a reference, Perl's split() function has a clear, explicit spec for
> the above scenarios [3].
> So the proposed change here is to update the spec & impl of Pattern.split() to
> specify the above scenarios clearly, as Perl does:
> (1) A zero-length input sequence always results in a zero-length array
>     (instead of a String[] that contains only an empty string).
> (2) An empty leading substring is included at the beginning of the resulting
>     array when there is a positive-width match at the beginning of the input
>     sequence. A zero-width match at the beginning, however, never produces
>     such an empty leading substring.
> webrev:
> http://cr.openjdk.java.net/~sherman/8027645/webrev/
> Thanks!
> -Sherman
> [1] https://bugs.openjdk.java.net/browse/JDK-8027645
> [2] https://bugs.openjdk.java.net/browse/JDK-6559590
> [3] http://perldoc.perl.org/functions/split.html
> btw: the following Perl script was used to verify the Perl behavior
> ------------------
> $str = "AbcEfgHij";
> @substr = split(/(?=\p{Uppercase})/, $str);
> #$str = "abc efg  hij";
> #@substr = split(/ /, $str);
> print "split[sz=", scalar @substr, "]=[", join(",", @substr), "]\n";
> ------------------
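
For comparison, the Perl check quoted above can be approximated in Java. This
is a minimal sketch, not code from the webrev: the class and method names are
hypothetical, \p{Lu} stands in for Perl's \p{Uppercase}, and the result for
the lookahead case depends on running a JDK whose Pattern.split() follows
rule (2) of the proposal.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the webrev.
public class SplitLookaheadDemo {

    // Zero-width positive lookahead: matches the empty position in front
    // of every uppercase letter, including position 0 of "AbcEfgHij".
    static final Pattern BEFORE_UPPER = Pattern.compile("(?=\\p{Lu})");

    static String[] splitBeforeUpper(String input) {
        return BEFORE_UPPER.split(input);
    }

    // Mirrors the output format of the Perl print statement.
    static String describe(String[] parts) {
        return "split[sz=" + parts.length + "]=[" + String.join(",", parts) + "]";
    }

    public static void main(String[] args) {
        // Zero-width match at the beginning of the input.
        System.out.println(describe(splitBeforeUpper("AbcEfgHij")));
        // Positive-width match at the beginning of the input.
        System.out.println(describe(Pattern.compile(",").split(",a,b")));
        // The commented-out Perl case: a single space as the delimiter.
        System.out.println(describe(Pattern.compile(" ").split("abc efg  hij")));
    }
}
```

The first call exercises the zero-width case and the second the positive-width
case from rule (2), so the presence or absence of a leading empty string can
be compared side by side.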
