RFR 8134512 : provide Alpha-Numeric (logical) Comparator
joe.darcy at oracle.com
Fri Jul 28 16:48:09 UTC 2017
A few comments.
I don't have a specific suggestion for a different name, but I don't
think "comparingNumerically" does an adequate job capturing the
described behavior of the method. Likewise, the summary of the methods'
behavior in the javadoc should have a more suggestive description of the
Please add comments describing what the "// Fullwidth characters" values
are in the test.
Interesting test cases would also include substrings for
Integer.MAX_VALUE, Integer.MAX_VALUE+1, Long.MAX_VALUE, Long.MAX_VALUE +
1, etc. Test cases of purely numerical and purely alphabetical inputs
would be warranted too.
While randomizing the array and sorting is a valid test, a more through
test would do pair-wise comparisons of each array element. While such
comparisons are quadratic with array length, the arrays here are small.
On 7/28/2017 9:03 AM, Ivan Gerasimov wrote:
> Hi Roger!
> On 7/24/17 7:42 AM, Roger Riggs wrote:
>> Hi Ivan,
>> Thanks for bringing this up again.
>> Some initial comments, not a full review.
>> Though the enhancement says that no consideration is given to sign
>> characters they
>> may produce puzzling results for strings like "-2000" and "-2001"
>> where the strings appear
>> to be signed numbers but the comparison will be for the "-" prefix
>> and the rest unsigned.
>> Including the word 'unsigned' in the description might help reinforce
>> the semantics.
> Yes, it's a good point. I'll add some words to make it certain we
> only recognize unsigned numbers.
> Otherwise, it would become confusing when comparing something like
> "2017-07-28" and "2017-07-29".
>> Also, I don't see test cases for strings like: "abc200123" and
>> "abc20000123" which share
>> a prefix part of which is numeric but differ in the number of
>> "leading" zeros after the prefix.
> Sure. Good addition to the test.
>> What about strings that include more than 1 numeric segment:
>> abc2003def0001 and abc02003def001
>> in the leading zero case?
> With the first comparator (which treats the numbers with more leading
> zeroes as greater ones), these strings would be sorted as
> "abc2003def0001" < "abc02003def001".
> The logic here is as following:
> 1) skip common prefix "abc",
> 2) find the numerical parts: "2003" and "02003",
> 3) after skipping leading zeroes, they are compared to be equal,
> 4) then, the string with more leading zeroes is said to be greater.
>> Though the test adds the @key randomness, it would be useful to use
>> the test library
>> mechanisms to report the seed and be able to run the test with a
>> seed, if case they fail.
>> (As might be provided by the test library jdk.test.lib.RandomFactory).
> Okay, I'll add this Random to the shuffling to make it reproducible.
>> 576: "the common prefix is skipped" is problematic when considering
>> leading zeros.
>> The common prefix may contain non-zero digits.
> Yes, of course.
> While scanning the string for the common prefix, we still keep track
> of the numeric part.
> Probably "skip" is a wrong word here.
>> 585: it is not clear whether the "different number of leading zeros"
>> is applied regardless
>> of the common prefix or only starting after the common prefix.
> When talking about leading zeroes, then the entire numerical substring
> is meant.
> Part of this substring can belong to the common prefix.
>> 550, 586: for many readers, it is easier to read 'for example' than
>> "e.g." or "i.e.".
> Thanks! I'll change it accordingly.
>> 562, 598: editorial: "is to compare" -> "compares"
>> 192: @param for param o is missing; (the param name "o" usually
>> refers to an object, not a string).
>> 200: Can skipLeadingZeros be coded to correctly work if cnt == 0;
>> it would be more robust
>> SkipLeading zeros works correctly only if pos is given the first
>> numeric digit in the subsequence
>> so the numStart1 and numStart2 must be first digit in each string.
> I don't think that skipLeadingZeros() is very specific that it always
> called for longer string, trying to reduce its size via skipping
> leading zeroes.
> It wouldn't make sense to call it with cnt == 0, and so we can avoid
> one initial comparing and save a couple of nanoseconds :)
> I added this prerequisite to the javadoc of the method, so hopefully
> it shouldn't cause much confusion.
>> Line 223: if strings typically have non-numeric prefixes, it might
>> perform slightly better
>> to detect the end of the common prefix and then scan back to find the
>> beginning of the numeric part.
>> (Avoiding checking every char for isDigit).
> On the other hand, we're saving on not calling String.charAt() for the
> second time :)
>> Line 224: If assigned for every digit, it will hold the last equal
>> digit in the common prefix, not the first digit.
> It will only be assigned when a non-digit is met.
> And since the index was just incremented @ Line 222, numStart1 will be
> set to the index of the first digit inside the common prefix.
> For example, if the common prefix is abc123, then the numStart1 will
> be updated to 3 when looking at the char 'c'. Last three cycles of
> the loop it won't be updated, since all the trailing chars are digits.
>> 239, 240: chars at o1(index) and o2(index) are already known to be
>> digits, can it be avoided checking them twice
>> by starting at index+1?
> Not quite necessarily. We can be here due to numLen1 > 0, and *only
> one* or *none* of the string remainders start with digits.
> In the later case we'll end up @ line 279.
> Here's the updated webrev:
> Given the further discussion this iteration is probably not the final,
> but it's better to have a checkpoint :)
> With kind regards,
>> $.02, Roger
>> On 7/19/2017 4:41 AM, Ivan Gerasimov wrote:
>>> It is a proposal to provide a String comparator, which will pay
>>> attention to the numbers embedded into the strings (should they
>>> This proposal was initially discussed back in 2014 and seemed to
>>> bring some interest from the community:
>>> In the latest webrev two methods are added to the public API:
>>> j.u.Comparator.comparingNumerically() and
>>> The regression test is extended to exercise this new comparator.
>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8134512
>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8134512/01/webrev/
>>> Comments, suggestions are very welcome!
More information about the core-libs-dev