RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java

David Li david.x.li at oracle.com
Wed Mar 19 23:10:14 UTC 2014


This is an update from Xerces for file 
impl/xpath/regex/RangeToken.java.  For details, please refer to: 

Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/

Existing tests: JAXP SQE and unit tests passed.

Test cases were added for the typo fix in RangeToken.intersectRanges.  The 
code was also updated to fix a bug where regular expression intersection 
returns an incorrect value when the first range ends later than the second 
range.  See the example below.  Test cases have been added to cover the 
scenarios that the code changes affect.

new RegularExpression("(?[b-d]&[a-r])"); -> returns [b-d] (Correct)
new RegularExpression("(?[a-r]&[b-d])"); -> returns [b-de-r] (Incorrect)


P.S. Notes on bug fixes.
1) Line 404 removal of while loop.
This fixes a new bug where incorrect results are given when the first 
range ends later than the second range.  In the old code we got
(?[a-r]&[b-d]) -> returns [b-de-r]
By removing the while loop, we get [b-d].
This while loop looks like a copy-paste error from subtractRanges. In 
subtractRanges we need to keep the leftover portion from the first 
range, but this does not apply to intersection.
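
To illustrate why intersection should not keep any leftover from the 
first range, here is a minimal sketch of intersecting two sorted range 
lists.  It is not the Xerces code: the class name RangeIntersectSketch, 
the method intersect and the two-index approach are assumptions made for 
illustration.  Ranges are flat int arrays of inclusive endpoints, which 
is how the notes above refer to this.ranges.

```java
import java.util.Arrays;

// Illustrative sketch only; this is not the Xerces RangeToken code.
class RangeIntersectSketch {

    // Both inputs are sorted, non-overlapping range lists stored as
    // [begin0, end0, begin1, end1, ...] with inclusive endpoints.
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int n = 0;
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            int aEnd = a[i + 1];
            int bEnd = b[j + 1];
            // Only the overlapping part is emitted; nothing outside it.
            int begin = Math.max(a[i], b[j]);
            int end = Math.min(aEnd, bEnd);
            if (begin <= end) {
                out[n++] = begin;
                out[n++] = end;
            }
            // Advance whichever range ends first (both on a tie).
            if (aEnd <= bEnd) {
                i += 2;
            }
            if (bEnd <= aEnd) {
                j += 2;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        // [a-r] & [b-d]: prints [98, 100], the code points of 'b' and 'd'.
        System.out.println(Arrays.toString(
                intersect(new int[] {'a', 'r'}, new int[] {'b', 'd'})));
    }
}
```

With this sketch the operand order does not matter: [a-r]&[b-d] and 
[b-d]&[a-r] both give [b-d], and the tail e-r of the first range is 
never copied to the result.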

2) Line 388, addition of src2 += 2;
This code change affects anything of the form (?[a-r]&[b-eg-j]).  The 
code execution is diagrammed below.
o------------o  (src1)
   o--o o--o     (src2)
For the first match we get
o------------o  (src1)
   o--o          (src2)
Next we want to run src2+=2 to get the second pair of endpoints (since 
the first two endpoints are already used).  Notice how src1begin has 
been updated to this.ranges[src1] = src2end+1, which is directly from 
the code.
       o------o  (src1)
        o--o     (src2)
The src2+=2 statement was left out of the old code, and is added in this 
webrev.  If we leave out the src2+=2 at line 388, on the next iteration 
of the large while loop we will reach case "} else if (src2end < 
src1begin) {" which also executes "src2+=2".  This means the correct 
final result is generated, but on a later loop. We want to add the new 
code because it's better to have all associated variables updated in the 
same loop.  In addition, all the other conditions have similar src1 or 
src2 updates.
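
The effect described above can be reproduced with a small stand-alone 
sketch.  This is a simplified model written for this note, not the 
actual RangeToken.intersectRanges: the class Src2AdvanceSketch, the 
Result holder and the advanceSrc2 flag are hypothetical, and the case 
analysis only loosely follows the branches mentioned above.  The flag 
switches the src2 += 2 statement on or off so the loop counts can be 
compared.

```java
import java.util.Arrays;

// Simplified model for illustration; not the Xerces implementation.
class Src2AdvanceSketch {

    static final class Result {
        int[] ranges;      // intersection as [begin0, end0, ...]
        int iterations;    // passes through the main while loop
    }

    static Result intersect(int[] first, int[] second, boolean advanceSrc2) {
        int[] a = first.clone();   // src1begin is trimmed in place on a copy
        int[] out = new int[first.length + second.length];
        int n = 0;
        int src1 = 0;
        int src2 = 0;
        Result r = new Result();
        while (src1 < a.length && src2 < second.length) {
            r.iterations++;
            int src1begin = a[src1];
            int src1end = a[src1 + 1];
            int src2begin = second[src2];
            int src2end = second[src2 + 1];
            if (src1end < src2begin) {            // src1 lies before src2
                src1 += 2;
            } else if (src2end < src1begin) {     // src2 lies before src1
                src2 += 2;
            } else if (src2end < src1end) {       // src2 ends inside src1
                out[n++] = Math.max(src1begin, src2begin);
                out[n++] = src2end;
                a[src1] = src2end + 1;            // trim the used part of src1
                if (advanceSrc2) {
                    src2 += 2;                    // the statement under discussion
                }
            } else if (src2end == src1end) {      // both ranges end together
                out[n++] = Math.max(src1begin, src2begin);
                out[n++] = src1end;
                src1 += 2;
                src2 += 2;
            } else {                              // src1 ends inside src2
                out[n++] = Math.max(src1begin, src2begin);
                out[n++] = src1end;
                src1 += 2;
            }
        }
        r.ranges = Arrays.copyOf(out, n);
        return r;
    }

    public static void main(String[] args) {
        int[] src1 = {'a', 'r'};
        int[] src2 = {'b', 'e', 'g', 'j'};
        // Same ranges either way; prints 2 and then 4 loop iterations.
        System.out.println(intersect(src1, src2, true).iterations);
        System.out.println(intersect(src1, src2, false).iterations);
    }
}
```

For (?[a-r]&[b-eg-j]) both settings of the flag produce [b-eg-j] in this 
model.  Without the extra statement, each matched src2 pair costs one 
additional pass through the loop before the "src2end < src1begin" branch 
skips it.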
