New portion of improvements for Dual-Pivot Quicksort

Wed May 5 17:06:00 UTC 2010

```Josh,

Thank you very much for review.
I'll prepare updated version and send it with my comments soon.

Joshua Bloch wrote:
>
> Hi. I'm thrilled that you were able to eke so much more perforance out
> Pittsburgh. Here's the code review:
>
> I think the comment on lines 102-105 was too useful to delete:
>
>  102        ...  This
>  103      * method differs from the public {@code sort} method in that the
>  104      * {@code right} index is inclusive, and it does no range checking
>  105      * on {@code left} or {@code right}.
>
> The sentinel technique that you use in lines 118 - 136 is questionable:
> you are modifying a portion of the array outside the specified range of
> the call, which arguably violates the contract of the call, and could be
> observed in a multithreaded program. It's not beyond the realm of reason
> that it could break existing clients. I will discuss it with Doug Lea
> and let you know what he says.
>
> If we leave this optimization in, we should change the comments
> slightly. Appearing to comment-out code (other than entire assertions in
> inner loops) is never a good thing, hence the change to line 130 and the
> addition of the line that follows. Also I reworded the commennt in lines
> 122-125 for clarity. Changed lines are starred, added line has a "+":
>
>  118         // Use insertion sort on tiny arrays
>  119         if (length < INSERTION_SORT_THRESHOLD) {
>  120             if (left > 0) {
>  121                 /*
> *122                  * Temporarily set the value immediately preceding
> the range
> *123                  * to the minimum int value to serve as a sentinel.
> This
> *124                  * allows us to avoid the j >= left check on each
> iteration.
>  125                  */
>  126                 int before = a[left - 1]; a[left - 1] =
> Integer.MIN_VALUE;
>  127
>  128                 for (int j, i = left + 1; i <= right; i++) {
>  129                     int ai = a[i];
> *130                     for (j = i - 1; ai < a[j]; j--) {
> +NEW                         // assert j >= left;
>  131                         a[j + 1] = a[j];
>  132                     }
>  133                     a[j + 1] = ai;
>  134                 }
>  135                 a[left - 1] = before;
>  136             } else {
>
> The comment in line 155 is misleading, and should be replace by this:
>
> I'd reword the comment in lines 137-140 for clarity:
>
>  137                 /*
>  138                  * For case, when left == 0, traditional (without a
> sentinel)
>  139                  * insertion sort, optimized for server VM, is used.
>  140                  */
>
>
> 155         // Inexpensive approximation of length / 7
>
> The comment strategy for choosing sample points is subtle enough that it
> deserves bit more commentary. I propose replacing line 158 with this:
>
>     /*
>      * Sort five evenly spaced elements around (and including) the center
>      * element in the range. These elements will be used for pivot selection
>      * as described below. The choice for spacing these elements was
> empirically
>      * determined to work well on a wide variety of inputs.
>      */
>
> Lines 183 and 184 are unnecessary array accesses. Probably better is:
>
>  183         int pivot1 = ae2;
>  184         int pivot2 = ae4;
>
> This is essentially just a renaming of these two local variables, and
> presumably the compiler will act accordingly.
>
> I prefer the original wording:
>
>  197              * Pointer k is the first index of ?-part
>
> to the revised wording:
>
>  217              * Pointer k is the first index of not inspected ?-part.
>
> I'd revert this change.
>
> I'd change 190 from:
>
>  190         if (pivot1 < pivot2) {
>
> to
>
>  190         if (pivot1 != pivot2) {
>
> It's clearer (this is the "pivots differ" case), and at least as fast.
>
> The spacing lines 249 and 250 is a bit quirky:
>
>  249             dualPivotQuicksort(a, left,   less - 2);
>  250             dualPivotQuicksort(a, great + 2, right);
>
> I'd replace it with the more standard:
>
>  249             dualPivotQuicksort(a, left, less - 2);
>  250             dualPivotQuicksort(a, great + 2, right);
>
> Alternatively, you could make corresponding arguments line up:
>
>  249             dualPivotQuicksort(a, left,      less - 2);
>  250             dualPivotQuicksort(a, great + 2, right   );
>
> The policy change in line 256 is non-trivial:
>
> Old:
>
>  298         if (less < e1 && great > e5) {
>
> New:
>
>  256             if (great - less > 5 * seventh) {
>
> I previously experimented with the "new-style" policy (length-based,
> rather than endpoint-based), but didn't get good results. Perhaps the
> symmetric style combines well with the change in interval size (5/7 vs.
> 2/3). At any rate, it's simpler than the old-style and conforms to the
> comment, so if you're satisfied with the performance, I'm happy with it.
>
> In lines 362 and 363, you have the same "quirky spacing" as in linkes
> 249 and 250:
>
>  361             // Sort left and right parts recursively
>  362             dualPivotQuicksort(a, left,   less - 1);
>  363             dualPivotQuicksort(a, great + 1, right);
>
>     Regards,
>
>     Josh

```