Implement quicksort using the STL
In this post I wrote about implementing the quicksort algorithm in C++ compared to Python. In the sence of simplicity it was neither optimized nor using any STL algorithms. This I want to change with this article.
While the biggest limitation there was that only contiguous container could be used, it was also using three loops, 2 of them nested in another. The solution for both of this problems is changing the partitioning algorithm (there is already one available at the STL, but I’ll come to it later).
The pivot in the quicksort algorithm is nothing else than a partition border: all values lower than the pivot are moved left to the pivot, all greater right to it - as result the container is partitioned into two halfes.
The standard algorithm is to iterate left and right simultaneously and swap the values according to their relationship to the pivot:
In this example A is the initial container. Step B swaps 5
(greater the the pivot) and 3
(smaller than the pivot).
In the final Step C 5
and 4
got swapped so that the pivot 4
is at the pivot point which
splits the two parts.
D shows the both partitions.
As told, one of the disadvantage of this algorithm is that the iterators of the supported containers have to be bidirectional (in the concrete example also contiguous due to using the less-than comparison on the iterators).
But there is a simpler way which requires only two loops not nested loops.
The basic principle is to start from left
and left + i
and then swap each element which is less
than the pivot. At each swap left
is increased (loop 1).
The only condition is that left
has to start at the at the first value in the container which is
not less the pivot (this is the second loop).
Here starting point is also A. In Step B the first element which is not less then pivot is searched,
this is 5
.
From this point on the forward iteration is started, left
is at 5
, right
is at left + 1
which is 1
- this happens at step C. Because left
(5
) is greater than the pivot 4
, the
swap is performed and both left
and right
are increased.
In Step D the same again, 5
(left
) is swapped with 3
(right
), both are increased.
In the following steps left
stays at its position (pointing to 3
) and only right
gets
increased. Since no value less than the pivot 4
follows, no further swaps will be made.
The result in step E is the same in terms of partitioning as in the first algorithm: the values less than the pivot are on the left, all others on the right. The only difference: the pivot element itself isn’t placed at the partition border, but that’s not problematic for this algorithm.
One notible advantage of this algorithm is that the iterators must not point to a contiguous container, nearly all types
of writable containers can be sorted this way.
At least forward iterators and less-than comparable elements in it are required.
Now lets see how it could be implemented:
void _swap(auto left, auto right) {
auto tmp = *left;
*left = *right;
*right = tmp;
}
auto _partition(auto start, auto end, auto pivot) {
// find the first element not smaller than pivot
while (start != end && *start < pivot) { ++start; }
// non is equal or greater than the pivot - return end (= start)
if (start != end) {
// now start comparing start with start + x
for(auto pos = std::next(start); pos != end; ++pos) {
if (*pos < pivot) {
// current item at pos is smaller than pivot - swap with the one at `start`
_swap(start, pos);
++start;
}
}
}
// at least return the iterator either at end (only one partition)
// or at the point where the partition with the values equal to or greater than pivot
return start;
}
This looks a lot nicer than the partitioning algorithm in this post!
Of course the quicksort function itself also gets easier:
void quicksort(auto left, auto right) {
if (!std::is_sorted(left, right)) {
// take the pivot from the leftmost element
auto pivot = *left;
// get the partitioning point
auto part_it = _partition(left, right, pivot);
if (part_it != left) {
// and call the quicksort for left only if the partitioning point isn't at leftmost position
quicksort(left, part_it);
}
else {
// the partitioning point is at leftmost position, go behind it to avoid an endless recursion
part_it = std::next(part_it);
}
// and right partition recursivly.
quicksort(std::next(part_it), right);
}
}
The first condition checks if the values in range of the given iterators are
already sorted.
This also returns true
if the range contains no elements, so we don’t have to check for left != right
.
But - this could get much more easier, the STL has a lot more algorithms which could help us.
At first lets replace the _swap
function by using std::iter_swap
.
if (*pos < pivot) {
std::iter_swap(start, pos);
++start;
}
For finding the first element not smaller than pivot we could use one of the find algorithms: std::find_if_not
.
This takes a start and end iterator, and a predicate telling which returns what to find:
// while (start != end && *start < pivot) { ++start; }
start = std::find_if_not(start, end, [&pivot](const auto& item){
return std::less(item, pivot);
});
Take a look at the partitioning algorithm so far:
auto _partition(auto start, auto end, auto pivot) {
// find the first element not smaller than pivot
start = std::find_if_not(start, end, [&pivot](const auto& item){
return std::less(item, pivot);
});
// non is equal or greater than the pivot - return end (= start)
if (start != end) {
// now start comparing start with start + x
for(auto pos = std::next(start);pos != end;++pos) {
if (!std::less(*pos*, pivot)) {
// current item at pos is smaller than pivot - swap
std::iter_swap(start, pos);
++start;
}
}
}
// at least return the iterator either at end (only one partition)
// or at the point where the partition with the values equal to or greater than pivot
return start;
}
But - there is a much more easier way: the stl itself already defines a partitioning algorithm:
std::partition
.
This does exactly the same as the algorithm above (probably much better) except that it takes a predicate to decide what value
goes to left or right partition.
So we can left out not only the _swap
but also the _partition
function.
This is what remains:
# include <algorithm>
void quicksort(auto left, auto right) {
if (left != right && !std::is_sorted(left, right)) {
// take the pivot from the mostleft element
auto pivot = *left;
// get the partitioning point
auto part_it = std::partition(left, right, [&pivot](const auto& item) { return item < pivot; });
// and call the quicksort for left and
quicksort(left, part_it);
// right partition recursivly.
quicksort(part_it, right);
}
}
In case of sorting (really) large sequences containing elements without side effects on performing the less than comparison, this
algorithm can be speed up by executing one of the both partitionings in another thread.
The complex thing in this case is to avoid starting too much threads. In the following example this is solved in a naive approach by
simple passing a max count of threads and alternating parallelizing.
This is not optimal due to the fact the we do not know how big each part is and if a thread may be started with a small or empty part.
But its only an example, there is already a better solution at the end of this post.
# include <algorithm>
# include <thread>
void quicksort(auto left, auto right, unsigned int threads = 1) {
if (!std::is_sorted(left, right)) {
auto pivot = *left;
auto part_it = std::partition(left, right, [&pivot](const auto& item) { return item < pivot; });
auto sort_left = [&left, &part_it](int _threads) { quicksort(left, part_it, _threads); };
auto sort_right = [&part_it, &right](int _threads) { quicksort(part_it, right, _threads); };
if (threads) {
// if there are threads left to start, alternate the parts and start sorting one part in a new thread
--threads;
if (threads % 2) {
std::jthread thr(sort_left, threads);
sort_right(threads);
}
else {
std::jthread thr(sort_right, threads);
sort_left(threads);
}
}
else {
// no threads left, sort both parts in the current thread
sort_left(threads);
sort_right(threads);
}
}
}
The parallel algorithm with max. 1 thread performs better at a very large number, in my case I had to use an integer vector with 20mio elements before it got faster than the sequential one.
As already written, don’t take this too serious, this can be done much better and wasn’t the goal of this post.
For example have a look on std::sort
which also allows a parallel execution since
C++ 17:
std::sort(std::execution::par, std::begin(vec), std::end(vec));