What is a non-comparison based sorting algorithm ?
A comparison based sorting algorithm that depends on comparing individual elements of the array to determine the order they should be in. However in a non-comparison based sort the algorithm does not rely on comparing individual elements rather it orders them by simply categorizing them.
What is sorting and how does Counting Sort approach the problem ?
Sorting is any process of arranging items systematically, and has two common, yet distinct meanings, One being ordering according to a sequence and the other being categorizing items with similar properties. We look into how Counting Sort achieves ordering by categorization.
In the above picture we see that the kids are separating the candies by categorizing them into different colors. This process does not require them to compare the candies with each other.
Counting sort approaches the problem of ordering in a similar way, by categorizing numbers as the keys and their frequencies as the values.
Sorting [1, 4, 1, 2, 7, 5, 2] using count sort:
Step 1: Traversing the array of size [N] to find the greatest number [k]. O(N)
Step 2: Creating a temporary array with a size k + 1 initialized with 0(s)
Step 3: While iterating over the original array add 1 to the value found at the index of temporary array that is equal to the value of the element from the original array. This basically gives us a frequency array for the values present at the corresponding indexes. [0, 2, 2, 0, 1, 1, 0, 1]. This takes one traversal over the original array. O(N)
Step 4: Convert the temporary array of frequencies to a temporary array of cumulative frequencies. [0, 2, 4, 4, 5, 6, 6, 7]. O(K)
Step 5: Create an array of size N to store the output.
Step 6: Iterate over the original array one last time, go to the index location in temporary array that is specified by the value of the element taken in the iteration. Subtract one from the value in the temporary array to obtain the index in the output array where the item selected in the iteration needs to be placed at.
Why was the temporary array sized to be K+ 1 ?
The size of the temporary array was chosen as K + 1 to simplify the addressing of the indexes in temporary array by the direct value of the elements in the original array.
A crucial question here is, why did we simply not iterate over the temporary frequency array and print the non 0 indexes to the screen, the number of times specified by the value at that index, directly as output ?
After obtaining the frequency array [0, 2, 2, 0, 1, 1, 0, 1] for the input array of [1, 4, 1, 2, 7, 5, 2] we could have simply iterated over the frequency array and printed the indexes of the non 0 values the number of times that are specified by the value at that index. Here we would print 0 X (0 times), 1 X (2 times), 2 X (2 times), 3 X (0 times), 4 X (1 times), 5 X (1 times), 6 X (0 times), 7 X (1 times)
This would result in a sorted array [1, 1, 2, 2, 4, 5, 7].
If it was that simple why did we bother to accumulate the frequencies and go through the hassle of looking up the index, subtracting 1 from the cumulative frequency, and adding the value to the output array or even create an output array ?
We did not simply iterate over the frequency array to the sorted list because, it would have resulted in the creation of a nested loop. This would be detrimental to our algorithm and increase the complexity. Our algorithm would not be linear anymore. To solve this issue and to get the result in just one iteration over the original array we need to use Cumulative frequencies.
The obvious question now is, how do cumulative frequencies help in achieving the sorted list in just one iteration over the original list ?
Cumulative frequencies is nothing but a measure of the sum of all the frequencies that have occurred before the referenced value. In simple terms it depicts how many values have occurred before the first occurrence of the referenced value, When that value is subtracted from the cumulative frequency of the referenced value it self; we again get the frequency of the referenced valued.
We have coupled the indexes with an idea of how many elements they are expecting before them that is depicted by the value at that index. By subtracting one from the Cumulative frequency for the matching index.
We basically just map the index in the output array for the last occurrence of the the referenced value to the cumulative frequency at that value. Since we are iterating over the original array we can only subtract one from the cumulative array per occurrence. As we keep subtracting we realize we can only subtract the frequency of the referenced value from the cumulative value. This means we can only go from the last index of the occurrence for the referenced value to the first index of the occurrence for the referenced value. This happens for each element in the original array and we keep saving the resultant values in an output array.
We need an output array because the values are going to be random and we might have to place certain values before the others as they are not being discovered in an orderly fashion. The numbers know where they belong no matter which one gets picked first.
The Achilles Heel of this sorting method:
This algorithm fails miserably when the value of K is vastly greater than the number of elements N in the list.
Example: Consider the array [0, 99, 100] here the elements are only 3 but the range is 0 to 100. Sorting this would require us to create an array of size 101 which is a murder of space as a resource. [K = 100] [N = 3]
The count sort performs a number of iterations over the original and temporary array. These iterations vary in size as the value of K and N are not same unless in rare cases. This leaves us with a time complexity of O(N + K).
The count sort is not an in place sorting algorithm, which means it will require extra space for storage. We create an array of size K + 1 and an output array of size N. This leaves us with a space complexity of O(N + K).
A stable algorithm maintains the original order of occurrence amongst elements with the same value. Count sort can be stable if the iteration on the original array are carried out from the last index to the first. If we start from the first index to the last then the elements just start filling up from their last possible index according to cumulative frequency first while saving space for all the elements before them. If we start from the last index, the elements coming last assume the last possible indexes according to the cumulative frequencies.
Author: Jaideep Singh Chahal
Institution Name: Bennett University
Enrollment Number: E19CSE449
Batch Number: EB03