Today's Question:  What are you most afraid of as a programmer?

# Find the kth smallest number in an array

sonic0002      2013-01-09 06:20:54      4,266    0    0

This is an classical question, the general solution to this question is first using sorting algorithm to sort the array, then get the kth number in the array, the complexity is O(nlogn), we can also use selective sort or head sort to, the complexity of selective sort is O(kn) and heap sort is O(nlogk). The better solution is to use quick sort to find the kth smallest number, the complexity is O(n), the worst case cost is O(n^2). But today we introduce one more solution which has the worst case cost O(n).

First let's check the quick sort solution, quick sort is to find one number, then put all numbers in the array which is less than the number chosen on one side and then put the other numbers on the other side. Then recursively  sort them again. Then to find the kth smallest number in the array, we only need to find it in one side of the array.

1. import random
2.
3. def partition(arr, left, right, pivot):
4.     v = arr[pivot]
5.     arr[pivot], arr[right-1] = arr[right-1], arr[pivot]
6.     index = left
7.     for i in xrange(left, right):
8.         if arr[i] <= v:
9.             arr[i], arr[index] = arr[index], arr[i]
10.             index += 1
11.     return index-1
12.
13. def select(arr, left, right, k):
14.     while right - left > 1:
15.         index = partition(arr, left, right, random.randint(left, right-1))
16.         dist = index - left + 1
17.         if dist == k:
18.             return arr[index]
19.         if dist < k:
20.             k -= dist
21.             left = index + 1
22.         else:
23.             right = index
24.     return arr[left]

Here arr is the array to search, we can call select to find the kth smallest number, if we cannot find the pivot element correctly, the worst case cost will be O(n^2).

Now we discuss the solution which has worst case cost O(n), we can divide all these numbers into some subarrays which have 5 elements in each subarray, then there will be n/5 subarrays. For each subarray, we can find the middle number very fast, then we can find the middle number of these n/5 middle numbers. This algorithm is called Median of Medians algorithm.

If we use the median of the medians as the pivot, then there will be 3/5*1/2 which is 3/10 numbers which are less or equal to pivot, also there will be 3/10 numbers which are greater than pivot, so the worst case cost is the array will be divided into 30%,70% or 70%,30% two parts.

The cost is :

T(n)<=T(n/5)+T(7/10*n)+O(n)<=c*n*(1+9/10+(9/10)^2....)
So T(n)=O(n)

The worst case cost is O(n).

1. import heapq
2.
3. def partition(arr, left, right, pivot):
4.     v = arr[pivot]
5.     arr[pivot], arr[right-1] = arr[right-1], arr[pivot]
6.     index = left
7.     for i in xrange(left, right):
8.         if arr[i] <= v:
9.             arr[i], arr[index] = arr[index], arr[i]
10.             index += 1
11.     return index-1
12.
13. def select_heap(arr, left, right, k):
14.     tmp = arr[left:right]
15.     heapq.heapify(tmp)
16.     [heapq.heappop(tmp) for i in xrange(k-1)]
17.     return heapq.heappop(tmp)
18.
19. def median(arr, left, right):
20.     num = (right - left - 1) / 5
21.     for i in xrange(num+1):
22.         sub_left = left + i*5
23.         sub_right = sub_left + 5
24.         if sub_right > right:
25.             sub_right = right
26.         m_index = select_heap(arr, sub_left, sub_right, (sub_right-sub_left)/2)
27.         arr[left+i], arr[m_index] = arr[m_index], arr[left+i]
28.     return select(arr, left, left+num+1, (num+1)/2)
29.
30. def select(arr, left, right, k):
31.     while right - left > 1:
32.         pivot = median(arr, left, right)
33.         index = partition(arr, left, right, pivot)
34.         dist = index - left + 1
35.         if dist == k:
36.             return arr[index]
37.         if dist < k:
38.             k -= dist
39.             left = index + 1
40.         else:
41.             right = index
42.     return arr[left]

By sonic0002