New portion of improvements for Dual-Pivot Quicksort
Dmytro Sheyko
dmytro_sheyko at hotmail.com
Thu May 6 08:43:52 UTC 2010
Vladimir,
> If we consider case when the whole array is sorted,
> we can omit setting sentinel and restoring original value:
Let me define my point more precisely. We can omit setting sentinel not only in a particular case when the whole array is sorted.
During partitioning the whole array is split into small chunks. Elements within a chunks remains unsorted,
but every element in a certain chunk is greater than every element in preceding (lefter) chunks
and less than every element in following (righter) chunks.
E.g.
[...A...B...]C[...D...E...]F[...G...H...]
Elements C and F are pivots.
Elements D and E are greater that pivot C and hence greater than A and B, and less than pivot F and hence G and H.
Therefore during sorting any non-leftmost chunk we can rely on preceding chunk.
Did I miss something? Is this what you mean?
Thank you,
Dmytro Sheyko
> Date: Wed, 5 May 2010 21:23:37 +0400
> From: iaroslavski at mail.ru
> Subject: Re: New portion of improvements for Dual-Pivot Quicksort
> To: dmytro_sheyko at hotmail.com
> CC: core-libs-dev at openjdk.java.net
>
> Hi Dmytro,
>
> Thank you very much for suggestions! I checked it and here
> is my results:
>
> If we consider case when the whole array is sorted,
> we can omit setting sentinel and restoring original value:
>
> // int before = a[left - 1]; a[left - 1] = Integer.MIN_VALUE;
>
> for (int j, i = left + 1; i <= right; i++) {
> int ai = a[i];
> for (j = i - 1; /* j >= left && */ ai < a[j]; j--) {
> a[j + 1] = a[j];
> }
> a[j + 1] = ai;
> }
> // a[left - 1] = before;
>
> I checked this version and found that it works little bit faster
> (only 0.4-0.3%) with client VM, but loses more than 2.5% under
> server mode in compare with one-pivot version from JDK 6:
>
> client server
> original: 60.75% 48.20%
> no setting sentinel: 60.40% 50.88%
>
> If we add additional check which case is sorted, I think, we
> don't win at all.
>
> And if pivot candidates are not put back to array, it shows
> very poor results:
>
> client server
> original: 60.75% 48.20%
> no back candidates: 66.80% 53.95%
>
> I run one pivot version taken from JDK 6: I replace in Dual-Pivot
> implementation with all optimizations dual-pivot partitioning by one-pivot:
> but it remain still slower than DPQ. So, the main point of DPQ is not only
> in optimizations, but in dividing array into 3 parts instead of two.
>
> Thank you,
> Vladimir
>
> Dmytro Sheyko wrote:
> > Hi Vladimir,
> >
> > The trick with sentinel is quite cute. Going farther, I think it is not
> > always necessary to replace left outer element with the least possible value
> > (and restore it later after sorting) because after partitioning every
> > element in adjoining array part can play the role of sentinel.
> > The only exception is when the user requested to sort array partially
> > (not from the beginning). Thereby we should care about setting sentinel
> > explicitly in this exceptional case only and only once before sorting whole array.
> >
> > Also it seems to me that it is not necessary to put pivot candidates (5
> > elements that are used to choose pivots) back to array
> > because anyway they are to be sorted later and likely change their
> > positions.
> >
> > I am also interesting in theoretical rationale why dual pivot quicksort
> > is better than single pivot one. The last document that I have seen
> > (Last updated: September 22, 2009)
> > compared classic quicksort (where pivot is chosen arbitrarily) with
> > "classic" dual pivot quicksort (where pivots are also chosen arbitrarily).
> > As conclusion they both perform about 2*n*ln(n) key comparisons. However
> > jdk had improved quicksort: median-of-three and pseudomedian-of-nine
> > approaches were used. And median-of-three approach lead to 12/7*n*ln(n)
> > key comparisons. On the other hand, dual pivot quicksort is also improved:
> > pivots are chosen from 5 candidates, and hence it must perform less than
> > 2*n*ln(n) key comparisons.
> >
> > Regards,
> > Dmytro Sheyko
> >
> > > Date: Tue, 27 Apr 2010 01:50:08 +0400
> > > Subject: New portion of improvements for Dual-Pivot Quicksort
> > > From: vladimir.iaroslavski at googlemail.com
> > > To: core-libs-dev at openjdk.java.net
> > >
> > > Hello, everyone!
> > >
> > > I've investigated the implementation of the Dual-Pivot Quicksort
> > > which is used for sorting primitives and here is my result:
> > >
> > > http://cr.openjdk.java.net/~alanb/6947216/webrev.00
> > >
> > > New implementation of Dual-Pivot Quicksort is faster
> > > than previous one of 12% for client VM and few percents for
> > > server VM. Tests on Bentley's test suite (Win XP, JDK 7,
> > > build 1.7.0-ea-b84, n = 1000000) show geometry mean 0.88
> > > for client VM and 0.98-1.00 for server VM.
> > >
> > > In compare with sorting from JDK 6 by Jon L. Bentley and
> > > M. Douglas McIlroy's (with one pivot) the ratio is 0.61 / 0.50
> > > (client / server).
> > >
> > > See the execution time for sorting array of 2`000`000 int elements
> > > 50 times, client / server VM, in milliseconds:
> > >
> > > random
> > > new: 16723 18776
> > > jdk7: 17571 18975
> > > jdk6: 22241 26524
> > >
> > > ascendant
> > > new: 3541 4702
> > > jdk7: 4486 4669
> > > jdk6: 8667 7978
> > >
> > > descendant
> > > new: 3854 4907
> > > jdk7: 4942 5034
> > > jdk6: 8787 8243
> > >
> > > equal
> > > new: 234 281
> > > jdk7: 291 230
> > > jdk6: 602 1018
> > >
> > > organ pipes
> > > new: 7673 8613
> > > jdk7: 8167 8993
> > > jdk6: 11902 14044
> > >
> > > stagger 1
> > > new: 7648 8591
> > > jdk7: 8161 8979
> > > jdk6: 11908 13810
> > >
> > > stagger 2
> > > new: 8349 9299
> > > jdk7: 10968 11916
> > > jdk6: 12194 14734
> > >
> > > stagger 4
> > > new: 8475 9622
> > > jdk7: 9221 9682
> > > jdk6: 10523 12006
> > >
> > > stagger 8
> > > new: 9321 10689
> > > jdk7: 11125 12387
> > > jdk6: 13829 16214
> > >
> > > period 1..2
> > > new: 758 751
> > > jdk7: 870 754
> > > jdk6: 1038 1227
> > >
> > > period 1..4
> > > new: 1004 963
> > > jdk7: 1365 1209
> > > jdk6: 1511 1847
> > >
> > > period 1..8
> > > new: 1588 1573
> > > jdk7: 1599 1790
> > > jdk6: 2602 3045
> > >
> > > random 1..2
> > > new: 1327 1125
> > > jdk7: 1362 1496
> > > jdk6: 1531 2182
> > >
> > > random 1..4
> > > new: 1830 2118
> > > jdk7: 1851 2236
> > > jdk6: 2292 3025
> > >
> > > where stagger(m) is array like a[i] = i * (m + 1) % length.
> > >
> > > The summary of changes is:
> > >
> > > 1. For sorting small arrays is used insertion sort with sentinel
> > > instead of traditional, which has the structure:
> > >
> > > for (int i = left + 1; i <= right; i++) {
> > > for (j = i; j > left && a[j-1] > a[j]; j--) {
> > > swap(a[i], a[j-1]);
> > > }
> > > }
> > >
> > > Note that range check j > left is performed on each iteration,
> > > but really helps very rare. To avoid this expensive range check,
> > > it was suggested to set minimum value (negative infinity) on the
> > > first position. This type of suggestion is used in new version:
> > >
> > > if left bound > 0, we can put sentinel on a[left - 1], do insertion
> > > sort without expensive check range, and then restore a[left - 1]
> > > value. If left == 0, traditional insertion sort is used. Please,
> > > look at the webrev for details.
> > >
> > > 2. In previous implementation 5 evenly spaced elements
> > >
> > > sixth = length / 6;
> > > a[sixth], a[2 * sixth], a[3 * sixth], a[4 * sixth], a[5 * sixth]
> > >
> > > were used as candidates of pivots elements. This case is very
> > > sensitive for period inputs, especially by 6. The new suggestion
> > > is to take 5 center evenly spaced elements like this:
> > >
> > > int seventh = length / 7;
> > > int midpoint = (left + right) >>> 1;
> > >
> > > a[midpoint - 2 * seventh], a[midpoint - seventh], a[midpoint],
> > > a[midpoint + seventh], a[midpoint + 2 * seventh]
> > >
> > > and moreover, the seventh is calculated inexpensively:
> > > seventh = (length >>> 3) + (length >>> 6) + 1;
> > >
> > > This schema works the same on random, ascendant, descendant, equal
> > > inputs, but much better for period / stagger.
> > >
> > > 3. The whole structure
> > >
> > > <choose pivots>
> > >
> > > if (pivotsDiffer) {
> > > <do partitioning for two pivots>
> > > }
> > > else {
> > > <do partitioning for one pivot>
> > > }
> > >
> > > <sort left and right parts>
> > >
> > > if (!pivotsDiffer) {
> > > return;
> > > }
> > > <swap internal pivot values to ends>
> > >
> > > <sort center part>
> > >
> > > was modified to:
> > > ----------------
> > >
> > > <choose pivots>
> > >
> > > if (pivot1 < pivot2) {
> > > <do partitioning for two pivots>
> > > <swap pivots into their final positions>
> > > <sort left and right parts>
> > > <swap internal pivot values to ends>
> > > <sort center part>
> > > }
> > > else {
> > > <do partitioning for one pivot>
> > > <sort left and right parts>
> > > }
> > >
> > > 4. Partitioning for both cases have not been changed at all.
> > >
> > > 5. Minor javadoc and format changes.
> > >
> > > Please, review new implementation,
> > > any comments / suggestions are welcome!
> > >
> > > Thank you,
> > > Vladimir
_________________________________________________________________
Hotmail: Powerful Free email with security by Microsoft.
https://signup.live.com/signup.aspx?id=60969
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20100506/d4d623a2/attachment.html>
More information about the core-libs-dev
mailing list