Implement quicksort using the STL // dev/log

In this post I wrote about implementing the quicksort algorithm in C++ compared to Python. In the sence of simplicity it was neither optimized nor using any STL algorithms. This I want to change with this article.

While the biggest limitation there was that only contiguous container could be used, it was also using three loops, 2 of them nested in another. The solution for both of this problems is changing the partitioning algorithm (there is already one available at the STL, but I’ll come to it later).

The pivot in the quicksort algorithm is nothing else than a partition border: all values lower than the pivot are moved left to the pivot, all greater right to it - as result the container is partitioned into two halfes.

The standard algorithm is to iterate left and right simultaneously and swap the values according to their relationship to the pivot:

In this example A is the initial container. Step B swaps 5 (greater the the pivot) and 3 (smaller than the pivot). In the final Step C 5 and 4 got swapped so that the pivot 4 is at the pivot point which splits the two parts. D shows the both partitions.

As told, one of the disadvantage of this algorithm is that the iterators of the supported containers have to be bidirectional (in the concrete example also contiguous due to using the less-than comparison on the iterators).

But there is a simpler way which requires only two loops not nested loops. The basic principle is to start from left and left + i and then swap each element which is less than the pivot. At each swap left is increased (loop 1). The only condition is that left has to start at the at the first value in the container which is not less the pivot (this is the second loop).

Here starting point is also A. In Step B the first element which is not less then pivot is searched, this is 5. From this point on the forward iteration is started, left is at 5, right is at left + 1 which is 1 - this happens at step C. Because left (5) is greater than the pivot 4, the swap is performed and both left and right are increased. In Step D the same again, 5 (left) is swapped with 3 (right), both are increased. In the following steps left stays at its position (pointing to 3) and only right gets increased. Since no value less than the pivot 4 follows, no further swaps will be made.

The result in step E is the same in terms of partitioning as in the first algorithm: the values less than the pivot are on the left, all others on the right. The only difference: the pivot element itself isn’t placed at the partition border, but that’s not problematic for this algorithm.

One notible advantage of this algorithm is that the iterators must not point to a contiguous container, nearly all types of writable containers can be sorted this way.
At least forward iterators and less-than comparable elements in it are required.

Now lets see how it could be implemented:

void _swap(auto left, auto right) {
    auto tmp = *left;
    *left = *right;
    *right = tmp;
}

auto _partition(auto start, auto end, auto pivot) {
    // find the first element not smaller than pivot
    while (start != end && *start < pivot) { ++start; }
    // non is equal or greater than the pivot - return end (= start)
    if (start != end) {
        // now start comparing start with start + x
        for(auto pos = std::next(start); pos != end; ++pos) {
            if (*pos < pivot) {
                // current item at pos is smaller than pivot - swap with the one at `start`
                _swap(start, pos);
                ++start;
            }
        }
    }
    // at least return the iterator either at end (only one partition)
    // or at the point where the partition with the values equal to or greater than pivot 
    return start;
}

This looks a lot nicer than the partitioning algorithm in this post!
Of course the quicksort function itself also gets easier:

void quicksort(auto left, auto right) {
    if (!std::is_sorted(left, right)) {
        // take the pivot from the leftmost element
        auto pivot = *left; 
        // get the partitioning point
        auto part_it = _partition(left, right, pivot);
        if (part_it != left) {
          // and call the quicksort for left only if the partitioning point isn't at leftmost position
          quicksort(left, part_it);
        }
        else {
          // the partitioning point is at leftmost position, go behind it to avoid an endless recursion
          part_it = std::next(part_it);
        }
        // and right partition recursivly.
        quicksort(std::next(part_it), right);
    }
}

The first condition checks if the values in range of the given iterators are already sorted.
This also returns true if the range contains no elements, so we don’t have to check for left != right.

But - this could get much more easier, the STL has a lot more algorithms which could help us.
At first lets replace the _swap function by using std::iter_swap.

            if (*pos < pivot) {
                std::iter_swap(start, pos);
                ++start;
            }

For finding the first element not smaller than pivot we could use one of the find algorithms: std::find_if_not. This takes a start and end iterator, and a predicate telling which returns what to find:

// while (start != end && *start < pivot) { ++start; }
start = std::find_if_not(start, end, [&pivot](const auto& item){
    return std::less(item, pivot);
});

Take a look at the partitioning algorithm so far:

auto _partition(auto start, auto end, auto pivot) {
    // find the first element not smaller than pivot
    start = std::find_if_not(start, end, [&pivot](const auto& item){
        return std::less(item, pivot);
    });

    // non is equal or greater than the pivot - return end (= start)
    if (start != end) {
        // now start comparing start with start + x
        for(auto pos = std::next(start);pos != end;++pos) {
            if (!std::less(*pos*, pivot)) {
                // current item at pos is smaller than pivot - swap
                std::iter_swap(start, pos);
                ++start;
            }
        }
    }
    // at least return the iterator either at end (only one partition)
    // or at the point where the partition with the values equal to or greater than pivot 
    return start;
}

But - there is a much more easier way: the stl itself already defines a partitioning algorithm: std::partition.

This does exactly the same as the algorithm above (probably much better) except that it takes a predicate to decide what value goes to left or right partition.
So we can left out not only the _swap but also the _partition function.
This is what remains:

# include <algorithm>
void quicksort(auto left, auto right) {
  if (left != right && !std::is_sorted(left, right)) {
    // take the pivot from the mostleft element
    auto pivot = *left;
    // get the partitioning point
    auto part_it = std::partition(left, right, [&pivot](const auto& item) { return item < pivot; });
    // and call the quicksort for left and
    quicksort(left, part_it);
    // right partition recursivly.
    quicksort(part_it, right);
  }
}

In case of sorting (really) large sequences containing elements without side effects on performing the less than comparison, this algorithm can be speed up by executing one of the both partitionings in another thread.
The complex thing in this case is to avoid starting too much threads. In the following example this is solved in a naive approach by simple passing a max count of threads and alternating parallelizing.
This is not optimal due to the fact the we do not know how big each part is and if a thread may be started with a small or empty part. But its only an example, there is already a better solution at the end of this post.

# include <algorithm>
# include <thread>
void quicksort(auto left, auto right, unsigned int threads = 1) {
  if (!std::is_sorted(left, right)) {
    auto pivot = *left;
    auto part_it = std::partition(left, right, [&pivot](const auto& item) { return item < pivot; });

    auto sort_left = [&left, &part_it](int _threads) { quicksort(left, part_it, _threads); };
    auto sort_right = [&part_it, &right](int _threads) { quicksort(part_it, right, _threads); };

    if (threads) {
      // if there are threads left to start, alternate the parts and start sorting one part in a new thread
      --threads;
      if (threads % 2) {
        std::jthread thr(sort_left, threads);
        sort_right(threads);
      }
      else {
        std::jthread thr(sort_right, threads);
        sort_left(threads);
      }
    }
    else {
      // no threads left, sort both parts in the current thread
      sort_left(threads);
      sort_right(threads);
    }
  }
}

The parallel algorithm with max. 1 thread performs better at a very large number, in my case I had to use an integer vector with 20mio elements before it got faster than the sequential one.

As already written, don’t take this too serious, this can be done much better and wasn’t the goal of this post.
For example have a look on std::sort which also allows a parallel execution since C++ 17:

std::sort(std::execution::par, std::begin(vec), std::end(vec));