# Replacement of Quicksort in java.util.Arrays with Dual-Pivot Quicksort: improvements

Mon Sep 14 15:07:30 UTC 2009

```Hi Team,

Thank you very much for your feedback and comments!
I collect here all hints and tips for improvement of
the new algorithm. New version of Quicksort is attached.
And now my investigations:

1. [Jon Bentley]
> Use the median of 2k+1 elements for pivots.
> We can choose a sample of five elements, sort them, and then
> partition around the 2nd and 4th sorted elements.

Yes, this helps, I can't say that we win a lot of time in general,
but it works really faster for input data like this 012012012...,
012301230..., 01234012340.... And more important thing: this
approach allows me to eliminate "div" parameter in the implementation.

As it was suggested, I take middle 5 elements with indexes: s, 2s, 3s,
4s, 5s (where s is length/6), sort them and take 2s and 4s elements
as pivots. Also I don't need to compare pivots because they are sorted.

2. [Jon Bentley]
> How about doing a test for pivot1 == pivot2, and then
> separately handling that special case? equal elements. Right now, the
> inner loop in the "// sorting" section compares every element both to
> pivot1 and pivot2.

Good catch! It reduces time of sorting equal elements (or almost equal)
from 347 to 228 and doesn't change time for other cases.

3. [Jon Bentley]
> Vladimir is currently using the first scheme that we tried.
> I wonder if it might be faster (or clearer) to try an invariant like
>         |  <  |  M   |   ?   |   M   |   >   |
> where M denotes middle elements with the property
>         Pivot1  <= M <= Pivot2
> Vladimir, do you think that you might be able to get a better invariant?

Sure, I tried other invariant like this:
|  < |  M  |  >  |  ?  |

but it is slower than current.

4.1 [Oleg Anashkin]
> First thing that came to mind - have you thought about extrapolating this
> approach to more pivots? If 2-pivot algorithm is faster than 1-pivot, then
> 3-pivot might be even faster, right? Can the number of pivots be chosen as
> a function of array size (to mitigate overhead)?

and

4.2 [Leonid Geller]
> As an observation, why not expand the new algorithm to N-Pivot
> where N = round(ln(array length)).
> This should lower the average sort cost even lower.

Here is the experimental results for several pivots.
It's easy to define recurrence relation and write code
to count the comparisons and swap:

Classic Quicksort: comps - 52254956
swaps - 26277985

Dual-Pivot Quicksort: comps - 52246553
swaps - 20977937

Quicksort with 3 pivots: comps - 54399943
swaps - 24224254

Quicksort with 4 pivots: comps - 57241615
swaps - 28659728

You can see that "3 pivots" is better than Classic Quicksort for
swaps only, "4 pivots" is the looser, and "Dual-Pivot Quicksort"
is real winner. If we have many pivots we spent a lot of time on
sorting them and dividing into more parts doesn't save. In case
with 2 pivots, we use only 1 comparison and 1/2 swaps to sort it out.

5. [R?mi Forax]
> The algorithm use two constants: 37 and 13,
> I think they should be declared as private static final.

The constants are: 27 (not 37) and 13, and I made it as
private static final int. (but 27 is changed to 17, see below)

6. [R?mi Forax]
> In the loop tagged "sorting" and "equals element", a[k] can be
> stored in a local variable to avoid to recompute it several time.

Adding a local variable doesn't save time, but it was done in the
line of other improvement (see below). Thank you for pointing to it!

I very often look at swap function and don't like it. I tried to
eliminate it and add these 3 lines into the code as inline, but
no effect. Then I looked from other side, we have a lot of code
such as:

for (int k=0; ...) {
if (a[k] < ...) {
swap(a, k, less++)
}
...
}

You can see that we retrieve a[k] many times, so I suggest doing it
like this without swap function:

for (int k=0; ...) {
int ak = a[k]

if (ak < ...) {
a[k] = a[less];
a[less++] = ak;
}
...
}

What we have achieve:

1. No swap function call
2. Two assignments instead of three
3. No local variable temp

This approach shows very nice results especially for server mode:
new: 100 / old: 126

8. [Jonathan Graehl]
> suggest this:
>  int third = len / div;
>  third=third>0?third:1;

Very elegant, but unfortunately I could not apply this,
because the schema of choosing pivot elements has been changed.

9. [Goktug Gokdogan]
> 1. Falling back-to insertion sort at < 17 instead of < 27.
> 2. (a[great] > pivot2) test is more likely to fail compared to
> (k < great) in the while loop, so exchange them (ie. use
> while (a[great] > pivot2 && k < great)).

I tried 17 as boundary for insertion sort: in some cases it is better,
in other no. But in average I don't see that time is increased.
I set up 17 in new version. Note that in my first version the value
was 17, but later it was changed to 25 -> 27.

while (a[great] > pivot2 && k < great)) - we win 3-4% ! Thank you!

------------------------------------------------------------------

I hope that I don't miss something what you suggested.
If you find missed items or have other comments/suggestions,
please, let me know. And the end I would like provide the
execution times for standard input data (all times should
be divided by 1000 to get seconds):

Client VM      Server VM

random         random
dpq: 16221     dpq: 19893
jdk: 20162     jdk: 24199

ascendant      ascendant
dpq: 5686      dpq: 7952
jdk: 9414      jdk: 11748

descendant     descendant
dpq: 5921      dpq: 7924
jdk: 9736      jdk: 11777

all equal      all equal
dpq: 261       dpq: 500
jdk: 734       jdk: 1140

repeated 50    repeated 50
dpq: 3450      dpq: 4239
jdk: 4207      jdk: 5874

organ pipes    organ pipes
dpq: 9364      dpq: 11451
jdk: 12858     jdk: 16534

random 01      random 01
dpq: 1474      dpq: 1423
jdk: 1747      jdk: 2591

010101         010101
dpq: 1092      dpq: 1052
jdk: 1140      jdk: 2076

012012         012012
dpq: 771       dpq: 1035
jdk: 1220      jdk: 1980

012301         012301
dpq: 1140      dpq: 1393
jdk: 1606      jdk: 2876

012340         012340
dpq: 1464      dpq: 1826
jdk: 1947      jdk: 3366

random 4       random 4
dpq: 1855      dpq: 2423
jdk: 2258      jdk: 3148

random 1000    random 1000
dpq: 8008      dpq: 9213
jdk: 8521      jdk: 10363

You can see that now Dual-Pivots Quicksort wins in all nominations
(in previous letter it looses in server mode in few cases).

Alan,
I'll send you updated Arrays.java little bit later.

Thank you,