From EPI:
Given two sorted arrays, find the kth smallest element in the array that would be produced if we combined and sorted them. Array elements may be duplicated within and across the input arrays.
Input: A = [-7, -5, 0, 2, 3, 3, 5, 7] B = [1, 2, 3] k = 3 Output: 0 # 0 is smaller than all except for -7 and -5, # making it the 3rd smallest element.
This is a more abstract version of the Median of Two Sorted Arrays problem. Suppose that the arrays we’ve been given have a combined length of 5. Finding the 3rd smallest element in both arrays would be the same as finding their median.
The obvious, straightforward way to solve this problem would be to run a sorting algorithm on both arrays:
def find_kth_smallest(A, B, k):
C = sorted(A + B)
return C[k-1]
However, assuming that the arrays are of size n
and m
, that gives us a theoretical complexity of O(nm log nm)
. We can definitely do better.
Viewing Two Arrays as Four Partitions
Even though we want to avoid sorting both arrays, we also want be able to say something about the relative ordering of elements across arrays.
We can reframe our understanding of the two arrays by partitioning them into two halves each:
A = [2, 4, 6, 8] # [2, 4] + [6, 8]
B = [1, 3, 5, 7] # [1, 3] + [5, 7]
The two left halves from both arrays are the same size as the two right halves, but we still don’t know anything about their relative ordering.
How can we know if the elements on the left are less than (or equal to) the elements on the right? We can confirm this by checking if the rightmost element of each array’s left half is less than or equal to the leftmost element of the other arrays’ right half:
A = [2, 4, 6, 8] # [2, 4] + [6, 8]
# AL AR
B = [1, 3, 5, 7] # [1, 3] + [5, 7]
# BL BR
# AL <= BR and BL <= AR
# AL: Rightmost element of A's left half
# AR: Leftmost element of A's right half
# BL: Rightmost element of B's left half
# BR: Leftmost element of B's right half
Suppose we’re interested in the 4th smallest element of these two arrays, A
and B
. We know that:
- The combined size of the left two halves is 4
- The elements on the left are less than or equal to the elements on the right
Then, it must be the case that the 4th smallest element is on the left! Specifically, it’s the maximum of the rightmost elements in the left halves. In this example, the 4th smallest element would be the larger of A[1]
and B[1]
, which is 4
.
Calibrating Partition Sizes
Several things worked well in our favour. We were looking at the 4th smallest element amongst a total of 8 elements, and combining the two left halves gave us the 4 smallest elements.
We’ve yet to address other possibilities. What if our halves didn’t cohere nicely with each other? It could be that the rightmost element of B
’s left half is larger than leftmost element of A
’s right half:
A = [1, 2, 3, 4] # [1, 2] + [3, 4]
# AL AR
B = [5, 6, 7, 8] # [5, 6] + [7, 8]
# BL BR
# AL: Rightmost element of A's left half
# AR: Leftmost element of A's right half
# BL: Rightmost element of B's left half
# BR: Leftmost element of B's right half
We can’t always assume that the elements on the left will be less than or equal to the elements on the right. What we need is a way to move beyond working with strictly-defined halves. We want the ability to calibrate the lengths of our subarrays in order to uphold AL <= BR
and BL <= AR
.
To support this, an invariant is required: The combined size of the left partitions should always remain the same (i.e. 4, in this example). This is because:
- We’re specifically concerned with the 4th smallest element.
- We identify the 4th smallest element from the rightmost elements in the left partitions. If we expand or shrink the total size of our left partitions, the rightmost elements will no longer capture the 4th smallest element.
At this moment, BL > AR
. To achieve BL <= AR
, we’ll form our left partitions using fewer elements from B
and more from A
. We can determine how much to add from A
by adopting a binary search type approach.
Since we currently have 2 elements from A
and the total size of A
is 4, we can gradually increment A
’s left partition from 2 to 3, and then from 3 to 4:
# Using 3 elements from A
A = [1, 2, 3, 4] # [1, 2, 3] + [4]
# AL AR
B = [5, 6, 7, 8] # [5] + [6, 7, 8]
# BL BR
# Using 4 elements from A
A = [1, 2, 3, 4] # [1, 2, 3, 4] + []
# AL AR
B = [5, 6, 7, 8] # [] + [5, 6, 7, 8]
# BL BR
With 4 elements from A, we’ve attained the two conditions necessary for a cross-array ordering: AL <= BR
and BL <= AR
. Since A
’s right partition is empty, AR
can be represented by an inf
value. Conversely, BL
can be represented by -inf
.
The 4th smallest element would be, as mentioned, the maximum value of the rightmost elements in the left partitions –– i.e. A[3] = 4
.
What if we were considering other values of k
? Instead of always storing half the total number of elements in our left partitions, we’d just split the arrays such that the left side contains k
elements.
From Intuition To Code
Now that we’ve gotten an idea for how to tackle this problem, let’s write some code.
I mentioned using a binary search type approach for determining the number of elements contained in A
’s left partition. Let’s apply a lower and upper bound on this value:
def find_kth_smallest(A, B, k):
# If *all* elements in B are in B's left ptn,
# we'll need at least k - len(B) elements
# in A's left ptn
lo = max(0, k - len(B))
# If k is less than the size of A, then we
# only need at most k elements in A's left ptn
hi = min(len(A), k)
AL
, AR
, BL
and BR
can fall out beyond the index range of their respective arrays. For instance, selecting all the elements from A
means that AR
would lie at A[len(A)]
and BL
would lie at B[-1]
:
A = [1, 2, 3, 4] # [1, 2, 3, 4] + []
# AL AR
B = [5, 6, 7, 8] # [] + [5, 6, 7, 8]
# BL BR
To make our lives easier when we reference these out-of-bound indices, we’ll define a helper function:
def get_val(arr, i):
if 0 <= i <= len(arr) - 1:
return arr[i]
return math.inf * (-1 if i < 0 else 1)
Now let’s proceed to the core algorithm. Our binary search helps us calibrate the size of A’s left partition which, by extension, affects the size of B’s left partition (since A_size
+ B_size
must be k
):
while lo <= hi:
# A_size: Size of left ptn in A
A_size = (lo + hi) // 2
B_size = k - A_size
A_left, A_right = get_val(A, A_size-1), get_val(A, A_size)
B_left, B_right = get_val(B, B_size-1), get_val(B, B_size)
if A_left <= B_right and B_left <= A_right:
# kth smallest is derived from the maximum of
# the rightmost elements in the left ptns
return max(A_left, B_left)
elif A_left > B_right:
lo, hi = lo, A_size - 1
else: # B_left > A_right
lo, hi = A_size + 1, hi
A_left
, A_right
, B_left
, and B_right
simply refer to AL
, AR
, BL
, and BR
respectively. Once we find that AL <= BR
and BL <= AR
, we can easily derive the kth smallest element.
Given n
number of elements in array A
, our algorithm runs in O(1)
space and O(log n)
time.
With that, we’ve solved the Finding the Kth Smallest Element in Two Sorted Arrays problem 🎢🎢.
Full Solution
def find_kth_smallest(A, B, k):
lo = max(0, k - len(B))
hi = min(len(A), k)
def get_val(arr, i):
if 0 <= i <= len(arr) - 1:
return arr[i]
return math.inf * (-1 if i < 0 else 1)
while lo <= hi:
A_size = (lo + hi) // 2
B_size = k - A_size
A_left, A_right = get_val(A, A_size-1), get_val(A, A_size)
B_left, B_right = get_val(B, B_size-1), get_val(B, B_size)
if A_left <= B_right and B_left <= A_right:
return max(A_left, B_left)
elif A_left > B_right:
lo, hi = lo, A_size - 1
else: # B_left > A_right
lo, hi = A_size + 1, hi