Quicksort: A Python Implementation

QuicksortIn the previous articles, we introduced sorting algorithms and went through the process of coding two efficient sorting algorithms: merge sort and heapsort. In this article, we’ll take a look at one more efficient sorting algorithm: quicksort.

The quicksort algorithm was developed in 1960 by Tony Hoare, and has since gained widespread adoption. A quicksort function has been part of C since 1975, and was standardized as part of ANSI C in 1989. Python does not have a build-in quicksort function, but quicksort is easy enough to code.

Implementing the Quicksort Algorithm

Like all the other efficient sorting algorithms covered in this series of articles, the quicksort algorithm is a recursive, divide and conquer algorithm. It is based on the following steps:

  1. Pick an element (called a pivot) from the list.
  2. Reorder the list so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it. Equal values can go either way.
  3. Apply the above steps recursively to each half of the list.

The pseudocode for this algorithm is as follows:

quicksort(L, low, high):
	if low < high
		p = partition(L, low, high)
		quicksort(L, low, p-1)
		quicksort(L, p+1, high)

The manner in which the pivot is chosen can be key. In some implementations, the leftmost element of the partition would often be chosen as the pivot. This would lead to worst case behavior in already-sorted lists. This illustrates the basic problem with quicksort: if we keep choosing the lowest element (or highest element) as the pivot, then quicksort becomes like a selection sort. One possible alternative is to choose the midpoint, which generally leads to good average case behavior. One possibility, suggested by computer scientist Robert Sedgewick, is to choose the median of the first element, the last element, and the midpoint. This guarantees a better estimate of the optimal pivot in an unsorted list. Here is my Python function for choosing a pivot:

def choosePivot(L,low=0,high=-1):
    if high == -1:
        high = len(L)-1
    mid = low+int((high-low)/2)
    # Return the median of low, middle, and high
    if L[low] < L[mid] and L[low] < L[high] and L[mid] < L[high]:
        return mid
    elif L[mid] < L[low] and L[mid] < L[high] and L[low] < L[high]:
        return low
    else:
        return high

Most of the work will be done by the partitioning function, which performs the following steps:

  1. Choose a pivot.
  2. Move the pivot to the end of the list, putting it out of the way (temporarily).
  3. Initialize two indices, i and j, to the index of the first list element.
  4. If list[i] is less than the pivot, swap it with list[j] and increment j.
  5. Increment i, and repeat step [4] as long as i is less than the index of the last list element.
  6. Swap the pivot (the last list element) with list[j]. Now the list is partitioned.

The Python representation of this function is as follows:

def partition(L,low=0,high=-1):
    if high == -1:
        high = len(L)-1
    # Select a pivot
    pivot = choosePivot(L,low,high)
    # Put the pivot at the end where it
    # will be out of the way
    swap(L,pivot,high)
    # Separate the list into elements 
    # less than and greater than pivot
    j = low
    for i in range(low,high+1):
        if L[i] < L[high]:
            swap(L,i,j)
            j += 1
    # Put the pivot between the two halves
    swap(L,high,j)
    return j

The swap function from heapsort is re-used:

def swap(L,x,y):
    temp = L[x]
    L[x] = L[y]
    L[y] = temp

Now that we have functions to choose the pivot and partition the list, writing the code for the quicksort function is relatively easy:

def quickSort(L,low=0,high=-1):
   '''Sort a list by selecting a "pivot", dividing
   the list into elements less than and greater than
   the pivot, then apply the sort recursively on the
   two list halves'''
   if high == -1:
      high = len(L)-1
   if high > low:
      # Partition into two halves, then sort
      # each half
      pivot = partition(L,low,high)
      # Check to see if pivot > 0; necessary
      # because if pivot == 0, then high = -1
      # and whole list is sorted
      if pivot > 0:
         quickSort(L,low,pivot-1)
      quickSort(L,pivot+1,high)

Recursion Errors (and a Solution)

This implementation of the function will work most of the time, but during testing, I encountered a potential issue. Assume that you have an unsorted list with a lot of duplicates – for example, the following list:

[3, 4, 2, 3, 1, 4, 2, 2, 3, 1, 3, 1, 2, 4, 1, 3, 2, 1, 3, 2]

After a few partitionings, you will likely end up with smaller sublists of all the same number. In such a case, the pivot value returned will be zero. If we allow the recursion to continue, quicksort will be called on a list of n-1 items, then n-2 items, and so on. If the lists are big enough, Python will reach the maximum allowed number of levels of recursion and the interpreter will return an error. Since we don’t want this to happen, I made a further modification to the partition function. After we are done partitioning the list, we will iterate through the list. If there are any two items that are not equal, we will stop iterating through the list and return the pivot value. If not, we will return -1, which will end any further recursive calls to quicksort. The new partition function looks like this:

def partition(L,low=0,high=-1):
    '''partition function for quicksort
    implementation'''
    if high == -1:
        high = len(L)-1
    # Select a pivot
    pivot = choosePivot(L,low,high)
    # Put the pivot at the end where it
    # will be out of the way
    swap(L,pivot,high)
    # Separate the list into elements 
    # less than and greater than pivot
    j = low
    for i in range(low,high+1):
        if L[i] < L[high]:
            swap(L,i,j)
            j += 1
    # Put the pivot between the two halves
    swap(L,high,j)
    for i in range(low,high):
        if L[i] != L[i+1]:
            return j
    return -1

In the worst-case scenario, we have to look through the entire list. Thus, instead of having a maximum of N comparisons for a list of size N, we will have 2N comparisons, but at least we avoid exponential growth in the number of operations, and this will prevent our program from crashing when there are a lot of duplicate items.

I also had to make a slight modification to the quicksort function. Here is the newly-modified function:

def quickSort(L,low=0,high=-1):
    '''quicksort: Sort a list by selecting a "pivot", 
    dividing the list into elements less than and greater 
    than the pivot, then apply the sort recursively on the 
    two list halves'''
    if high == -1:
        high = len(L)-1
    if high > low: 
        # Partition into two halves, then sort
        # each half
        pivot = partition(L,low,high)
        # Check to see if pivot > 0; necessary
        # because if pivot == 0, then high = -1
        # and whole list is sorted
        if pivot > 0:
            quickSort(L,low,pivot-1)
        if pivot > -1:
            quickSort(L,pivot+1,high)

This should work fairly well, but obviously, I may have overlooked something, so if anyone has any further suggestions for improvements to this quicksort implementation, I would love to hear them. The source code for this implementation can be downloaded here.

Built in Sort Functions: Python and C

Finally, it should be mentioned that Python lists have their own sort function. In our example, if we had a list called myList, we could make the following call to sort the list:

myList.sort()

sort takes up to three optional parameters in Python 2.7: cmp, key, and reverse. cmp specifies a function of two arguments (list items) which should return a negative, zero, or positive number depending on whether the first argument is considered smaller than, equal to, or greater than the second argument. key specifies a function of one argument that is used to extract a comparison key from each list element (e.g. str.lower). Reverse is a boolean value. If set to True, then the list elements are sorted as if each comparison were reversed. Newer versions of Python have eliminated the first argument (cmp), which should not be a major issue, since you can still sort the list in order (or reverse order using the “reverse” parameter). The key paramter is potentially useful in a number of cases. Assume we have a list of lists, where each of the nested lists represents an employee record. We want to sort the records by employee ID, but the employee ID is the third item in each record. To sort by employee ID, all we need to do is define a key function:

def getKey(a):
    return a[2]

and call the sort function with getKey for the key parameter.

Quicksort is probably the easiest of the efficient sorting algorithms covered so far to understand and code. And in C/C++, we don’t even have to code it, as we are provided with the qsort function, which takes 4 arguments: a pointer to the array to be sorted; the number of items to be compared; the size of each item in the array, and a pointer to a function to compare two items. The comparison function should have as parameters pointers to the two items being compared, and should return -1 if the first item is less than the second, 0 if they are equal, and 1 if the first item is greater than the second (in order to sort the array in ascending order). Reverse the return values to sort the array in descending order.

External Links:

Quicksort on Wikipedia

Qsort page on cplusplus.com