As you might imagine, selecting a good pivot value is crucial to the
success and performance of this sort. In the ideal case we want to
pick the statistical median key in a partition as the pivot value and,
thus, split the partition into equal halves.
The simplist way to pick a pivot value is to use the value of
first data item. This method has the advantage of being a very
fast but, unfortunately, operates on a sometimes faulty supposition.
If the data being sorted is in near-sorted order then the first item
in a given partition is very likely to have the lowest value of
all items in said partition. This leads to unbalanced partitioning
of the data set. To avoid infinite recursion usually Quicksort is
implemented to partition into a set containing the pivot element
and a set containing the n-1 others. However, repeated unbalanced
partitioning causes Quicksort to perform very poorly.
In order to eliminate this risk, often the higher of the two first
distinct key values in a partition is selected to be the pivot. Other
pivot selection schemes add the first and last values in the partition
and divide by two. Still others take the middle-of-three approach.
While methods such as these must examine slightly more data than one
which blindly chooses the first element, normally they speed up the
overall algorithm by choosing a value more likely to be near the
median of the partition.
Below is an example pivot selecting routine written in C. It uses the
greater of the first two distinct values in a partition as the pivot
point and returns NONE if the partition does not need to be sorted any
further.
/* NONE must be a value that cannot occur as a key */
#define NONE -1
/* these are arbitrary */
typedef key int;
typedef data struct {
key thekey;
char *therest;
};
/*
* Return the pivot value or NONE if the partition does
* not need to be sorted any further.
*
*/
key selectpivot(data *array, int left, int right) {
key first = value(array[left]); /* the first key */
int lcv; /* loop control */
for (lcv = left + 1; lcv <= right; lcv++) {
if (value(array[lcv]) > first) return (value(lcv));
else if (value(array[lcv]) < first) return first;
}
/* if we get here the partition is homogeneous */
return (NONE);
}
key value(data *item) {
return (item->thekey);
}
Above, the routine selects the value of a pivot point and returns it,
or, if all the elements in the partition are equal in value, returns
NONE to indicate that further sorting of this partition is not
necessary. NONE must be a value that will not appear in the
array. If you cannot predict which values will appear in your data
set, it would be better to write the above routine to return the index
of the pivot value rather than the value itself. Then it could return
an invalid index number to specify a homogeneous partition.
|