13  Non-comparison sorting algorithms

14 Key concepts

14.1 The limits of comparison sorts

While comparison-based sorting algorithms are widely used and have many advantages, there are some limits to their performance and scalability. Here are some of the limitations of comparison-based sorting algorithms:

  1. Lower Bound: The worst-case time complexity of comparison-based sorting algorithms is O(nlogn), which is the lower bound for any comparison-based sorting algorithm. This means that there is a limit to how fast a comparison-based algorithm can sort a list of elements.

  2. Performance on Certain Data: Some comparison-based sorting algorithms, such as Quicksort and Mergesort, are particularly well-suited for sorting randomly ordered input data. However, they may perform poorly on already sorted or nearly sorted data, resulting in unnecessary comparisons and swaps.

  3. Comparison Overhead: Comparison-based sorting algorithms require comparisons between elements to determine their relative order. In some cases, the cost of these comparisons may be higher than other operations, such as swaps or moves. This can impact the performance of the algorithm.

  4. Non-Adaptive: Comparison-based sorting algorithms are not adaptive, meaning that they do not take advantage of any existing order in the input data. This can result in unnecessary comparisons and swaps when sorting partially sorted data.

Non-comparison-based sorting algorithms, on the other hand, can overcome some of these limitations. For example, Counting Sort, Radix Sort, and Bucket Sort all have linear time complexity, making them faster than comparison-based sorting algorithms for certain types of input data. These algorithms also do not require comparisons between individual elements, making them more efficient in some cases. However, non-comparison-based sorting algorithms also have their own limitations and may not be suitable for all types of input data.

The choice of sorting algorithm depends on the specific requirements of the application and the properties of the input data. While comparison-based sorting algorithms are widely used and have many advantages, there are limits to their performance and scalability, which may make non-comparison-based sorting algorithms a better choice in certain situations.

14.2 Counting, radix and bucket sort

Counting Sort, Radix Sort, and Bucket Sort are all examples of non-comparison-based sorting algorithms.

14.2.1 Counting Sort

Counting Sort is a linear time sorting algorithm that is used to sort a list of integers in a specific range. The algorithm works by counting the number of occurrences of each integer value in the input list, and using this count to determine the sorted order of the values. Counting Sort assumes that the input data consists of integers in a known range, making it a useful algorithm for certain types of input data. Counting Sort has a time complexity of O(n+k), where n is the number of elements in the input list and k is the range of the input values.

Here is some pseudocode for Counting Sort:

countingSort(array A, int k)
    let C[0..k] be a new array
    let B[0..n-1] be a new array
    for i = 0 to k
        C[i] = 0
    for j = 0 to n-1
        C[A[j]] = C[A[j]] + 1
    for i = 1 to k
        C[i] = C[i] + C[i-1]
    for j = n-1 down to 0
        B[C[A[j]]-1] = A[j]
        C[A[j]] = C[A[j]] - 1
    return B

The countingSort function takes as input an array A of integers and an integer k, which represents the maximum value in A. The function creates two new arrays C and B, with sizes k+1 and n respectively, where n is the length of A.

The algorithm then iterates through A to count the number of occurrences of each value in C. It then updates C to hold the prefix sum of the counts. Finally, it iterates through A in reverse order and uses C to place each element into its correct sorted position in B.

The sorted array B is then returned as the output of the function.

The time complexity of Counting Sort is O(n+k), where n is the length of the input array and k is the maximum value in the input array.

14.2.2 Radix Sort

Radix Sort is a linear time sorting algorithm that is used to sort a list of integers by their digits. The algorithm works by sorting the input list based on the values of their digits, starting from the least significant digit to the most significant digit. Radix Sort can be implemented using either a least significant digit (LSD) or most significant digit (MSD) approach. Radix Sort has a time complexity of O(nk), where n is the number of elements in the input list and k is the maximum number of digits in the input values.

Here is the pseudocode for Radix Sort (LSD):

radixSortLSD(array A)
    let B[0..n-1] be a new array
    let maxDigit be the maximum number of digits in the numbers in A
    for d = 0 to maxDigit-1
        let C[0..9] be a new array
        for i = 0 to 9
            C[i] = 0
        for j = 0 to n-1
            digit = (A[j] / (10^d)) mod 10
            C[digit] = C[digit] + 1
        for i = 1 to 9
            C[i] = C[i] + C[i-1]
        for j = n-1 down to 0
            digit = (A[j] / (10^d)) mod 10
            B[C[digit]-1] = A[j]
            C[digit] = C[digit] - 1
        A = B
    return A

The radixSortLSD function takes as input an array A of integers. The function first determines the maximum number of digits in the numbers in A. It then iterates through each digit position from the least significant digit to the most significant digit.

For each digit position, the algorithm creates a new array C to hold the counts of each digit value (0-9) in A. It then iterates through A to count the number of occurrences of each digit value, updates C to hold the prefix sum of the counts, and uses C to place each element into its correct sorted position in a temporary array B.

Finally, the sorted array B is assigned back to A and the algorithm moves on to the next digit position.

The time complexity of Radix Sort (LSD) is O(d(n+k)), where n is the length of the input array, k is the maximum value in the input array, and d is the maximum number of digits in the numbers in A.

14.2.3 Bucket Sort

Bucket Sort is a linear time sorting algorithm that is used to sort a list of elements by dividing them into buckets and then sorting the individual buckets. The algorithm works by dividing the input values into a fixed number of equally sized buckets, and then sorting each bucket using another sorting algorithm, such as Insertion Sort or Quicksort. Bucket Sort assumes that the input data is uniformly distributed across a known range, making it a useful algorithm for certain types of input data. Bucket Sort has a time complexity of O(n+k), where n is the number of elements in the input list and k is the number of buckets.

These non-comparison-based sorting algorithms offer advantages over comparison-based sorting algorithms in certain situations. For example, Counting Sort and Radix Sort have linear time complexity, which can be faster than the O(nlogn) time complexity of comparison-based sorting algorithms. However, these algorithms also have specific requirements and limitations, such as the need for input data in a known range or the need for uniformly distributed input data. It is important to choose the appropriate sorting algorithm based on the specific requirements of the application and the properties of the input data.

Here is the pseudocode for Bucket Sort:

bucketSort(array A)
    let B[0..n-1] be a new array
    let buckets[0..n-1] be a new array of empty lists
    for i = 0 to n-1
        bucketIndex = floor(n*A[i])
        append A[i] to buckets[bucketIndex]
    for i = 0 to n-1
        sort buckets[i] using any sorting algorithm
        concatenate buckets[i] onto B
    return B

The bucketSort function takes as input an array A of floating-point numbers between 0 and 1. The function creates a new array B of the same length as A, as well as an array of n empty lists called buckets. The n value here refers to the number of buckets, which is usually chosen to be the same as the length of A.

The algorithm then iterates through A, calculates the index of the corresponding bucket for each element using the formula floor(n*A[i]), and appends each element to the appropriate bucket.

Next, the algorithm sorts each non-empty bucket individually using any sorting algorithm, and concatenates the sorted buckets together into B.

Finally, the sorted array B is returned as the output of the function.

The time complexity of Bucket Sort depends on the sorting algorithm used to sort the individual buckets, but it can achieve linear time complexity in the average case if the elements are uniformly distributed among the buckets. In the worst case, where all elements fall into the same bucket, the time complexity is the same as the sorting algorithm used for sorting the buckets.

15 Learning outcomes

15.1 Identify the different approaches of different non-comparison sorting algorithms

There are different non-comparison sorting algorithms, each with its own approach. Here are the main approaches of some popular non-comparison sorting algorithms:

  1. Counting Sort: This algorithm works by counting the number of occurrences of each element in the input array, and then using this count to determine the position of each element in the output array. Counting Sort is efficient when the range of values in the input array is small.

  2. Radix Sort: This algorithm sorts the input elements by their digits, from the least significant to the most significant. Radix Sort can be implemented using LSD (Least Significant Digit) or MSD (Most Significant Digit) approach, depending on the sorting order.

  3. Bucket Sort: This algorithm divides the input elements into a number of buckets and then sorts the elements within each bucket. Bucket Sort is efficient when the input elements are uniformly distributed over a range.

  4. Pigeonhole Sort: This algorithm is a variation of Bucket Sort, where each bucket represents a pigeonhole, and the input elements are distributed among these pigeonholes according to their values. Pigeonhole Sort is efficient when the range of values in the input array is small and the values are integers.

Each non-comparison sorting algorithm has its own strengths and weaknesses, and the choice of algorithm depends on the specific characteristics of the input data.

15.2 Implement different non-comparison sorting algorithms

Here are implementations of some popular non-comparison sorting algorithms in Python:

15.2.1 Counting Sort

def counting_sort(arr):
    # Find the maximum value in the array
    k = max(arr)
    
    # Initialize an array to hold the counts of each element
    counts = [0] * (k+1)
    
    # Count the number of occurrences of each element in the array
    for elem in arr:
        counts[elem] += 1
    
    # Calculate the cumulative counts of the elements
    for i in range(1, k+1):
        counts[i] += counts[i-1]
    
    # Initialize the output array
    output = [0] * len(arr)
    
    # Place each element in its sorted position
    for elem in arr:
        index = counts[elem] - 1
        output[index] = elem
        counts[elem] -= 1
    
    return output

15.2.2 Radix Sort (LSD)

def radix_sort_lsd(arr):
    # Find the maximum number of digits in the array
    max_digit = len(str(max(arr)))
    
    # Sort the array by each digit, from LSD to MSD
    for digit in range(max_digit):
        # Initialize the bucket array
        buckets = [[] for _ in range(10)]
        
        # Distribute the elements into the buckets based on the current digit
        for elem in arr:
            bucket_index = (elem // (10**digit)) % 10
            buckets[bucket_index].append(elem)
        
        # Concatenate the elements in the buckets to form the sorted array
        arr = [elem for bucket in buckets for elem in bucket]
    
    return arr

15.2.3 Bucket Sort

def bucket_sort(arr):
    # Determine the number of buckets
    num_buckets = len(arr)
    
    # Initialize the buckets
    buckets = [[] for _ in range(num_buckets)]
    
    # Distribute the elements into the buckets
    for elem in arr:
        bucket_index = int(elem * num_buckets)
        buckets[bucket_index].append(elem)
    
    # Sort the elements within each bucket
    for i in range(num_buckets):
        buckets[i].sort()
    
    # Concatenate the elements in the buckets to form the sorted array
    sorted_arr = [elem for bucket in buckets for elem in bucket]
    
    return sorted_arr

These implementations assume that the input is a list of integers or floating-point numbers. The time complexity of these non-comparison sorting algorithms depends on the input size and the range of values in the input. Counting Sort and Bucket Sort have linear time complexity in the average case, while Radix Sort has linear time complexity in the worst case.

15.3 Calculate the time complexity of different non-comparison sorting algorithms

Here are the time complexities of some popular non-comparison sorting algorithms:

15.3.1 Counting Sort

Counting Sort has a time complexity of O(n+k), where n is the number of elements in the input array, and k is the range of values in the input array. Counting Sort is efficient when the range of values is small, such as when sorting integers or characters.

15.3.2 Radix Sort

Radix Sort has a time complexity of O(d(n+k)), where n is the number of elements in the input array, k is the range of values in the input array, and d is the number of digits in the maximum value in the input array. Radix Sort is efficient when the input values have a fixed number of digits, such as when sorting integers.

15.3.3 Bucket Sort

Bucket Sort has a time complexity of O(n+k), where n is the number of elements in the input array, and k is the number of buckets used for sorting. Bucket Sort is efficient when the input elements are uniformly distributed over a range.

These non-comparison sorting algorithms have better time complexity than comparison-based sorting algorithms, which have a lower bound of O(n log n) in the average case. However, the performance of non-comparison sorting algorithms depends on the specific characteristics of the input data, and they may not be efficient in all cases.