Subsequence / subarray problems can come in a variety of forms. You might to tasked to look for:
- The longest common subsequence (LCS) between two or three strings
- The longest palindromic substring in a string
- The largest sum subarray in an array of numbers
- The largest sum subarray in a circular array of numbers
- The existence of a
k
-sum subarray
And the list goes on. Today, we’ll pick a relatively interesting problem from this domain:
From EPI:
Find the longest non-contiguous, non-decreasing subsequence in an array of numbers.
Input: [4, 0, 5, 5, 7, 6, 7] Output: 5 # Example: 4 5 5 7 7
Generally speaking, you can assume that subsequences consist of non-contiguous elements (just in like in traditional LCS problems), but I’ve worded this particular question to make it more obvious.
Learning by Example
From the outset, it’s not entirely clear what type of data structure we should apply to this problem, but we can always learn a few things by playing around and using examples.
numbers = [1, 7, 3, 5]
# current: ^
Consider the above input. We start with 1
, which by itself is a subsequence of length 1. To track seen subsequences, you might attempt to use a hashmap.
# { 0: [1] }
When we navigate from 1
to 7
, we can say that [1, 7]
gives us a non-decreasing subsequence of length 2 .
numbers = [1, 7, 3, 5]
# current: ^
# {
# 0: [1],
# 1: [1, 7]
# }
When we move on from 7
to 3
, we can also say that [1, 3]
gives us another non-decreasing subsequence of length 2.
numbers = [1, 7, 3, 5]
# current: ^
# {
# 0: [1],
# 1: [1, 7],
# 2: [1, 3]
# }
So far, we’ve seen two subsequences ([1, 7]
and [1, 3]
).
When we move from 3
to 5
, we realise that we can extend [1, 3]
to make [1, 3, 5]
. On the other hand, we can’t do the same with [1, 7]
.
numbers = [1, 7, 3, 5]
# current: ^
# {
# 0: [1],
# 1: [1, 7],
# 2: [1, 3] <-- number 5 should be added here
# }
At this point, we can draw some key insights.
First, we need some way to quickly match the number 5
with the number 3
. The items in the hashmap aren’t ordered by their final elements, so how can we introduce some form of ordering? The answer is to use a sorted collection (like a binary search tree).
Notice that we don’t need to persist [1, 7]
once we’ve included [1, 3]
. Both are of length 2, but [1, 3]
offers more opportunities for extension, since it ends with a smaller number. Thus, we shouldn’t maintain two seen subsequences of the same length –– we should only keep the one that has a smaller last value.
By the same token, we also shouldn’t maintain two subsequences with the same last element –– we should only keep the one with the longer length.
Tracking Candidate Subsequences
We now know that, to track seen subsequences, we need to implement a sorted collection with two invariants.
Recall that we ultimately want the length of the longest subsequence. We’ll represent any relevant candidate subsequence we’ve seen with an ActiveSeq
:
from collections import namedtuple
ActiveSeq = namedtuple('ActiveSeq', ('last_val', 'length'))
For every active sequence, we only need to know the length of the sequence and the last value in it. Other kinds of information, such as the contents of the sequence, are irrelevant.
We use a SortedList
from the sortedcontainers
library to store these active sequences and order them by their last value.
from sortedcontainers import SortedList
def longest_nondecreasing_subsequence_length(A):
SL = SortedList(key=lambda e: e.last_val)
# ...
Python doesn’t have a built-in solution for sorted collections, so I recommend using sortedcontainers
when you need these types of data structures. Check out the offical API documentation as well as the following talk by the library’s author, Grant Jenks:
Extending Candidate Subsequences
Each time we encounter an element, our concern is to find the active sequence that is best suited for extension. We’ll use bisect
to help us determine this:
for x in A:
new_entry = ActiveSeq(x, 1)
i = SL.bisect_right(new_entry)
We duplicate and extend the most suitable list we find. You might be wondering, why do we duplicate active sequences?
Consider the example below. Even though 7
extends [1, 3]
to give [1, 3, 7]
(i.e. ActiveSeq(7, 3)
), there’s no telling if we’ll encounter a smaller element down the line which also extends it. Thus, we’ll retain [1, 3]
(i.e. ActiveSeq(3, 2)
) in our sorted list.
numbers = [1, 3, 7, 3]
# current: ^
for x in A:
i = SL.bisect_right(ActiveSeq(x, 1))
SL.add(ActiveSeq(x, SL[i-1].length+1))
# [ActiveSeq(1, 1), ActiveSeq(3, 2), ActiveSeq(7, 3)]
What if the number we’ve encountered is the smallest we’ve seen so far? Such a number cannot extend any active sequence, so we’ll just insert a fresh active sequence of length 1:
for x in A:
new_entry = ActiveSeq(x, 1)
i = SL.bisect_right(new_entry)
if i == 0:
SL.add(new_entry) # Add sequence of length 1
else:
SL.add(ActiveSeq(x, SL[i-1].length+1))
What if the number we’ve encountered is the same as the last value on an active sequence? We can’t simply increment the .length
property on that particular active sequence, so we’ll replace it instead:
for x in A:
new_entry = ActiveSeq(x, 1)
i = SL.bisect_right(new_entry)
if i == 0:
SL.add(new_entry)
else:
prev_entry = SL[i-1]
if prev_entry.last_val == x:
SL.remove(prev_entry) # Remove before adding
SL.add(ActiveSeq(x, prev_entry.length+1))
This replacement operation guarantees our second invariant, which is to not maintain two active sequences with the same last value.
Remember the other invariant which required us to not maintain two active sequences of the same length? We can enforce it by deleting active sequences that have larger last values but shorter lengths.
numbers = [1, 3, 7, 3]
# current: ^
def longest_nondecreasing_subsequence_length(A):
n = len(A)
SL = sortedcontainers.SortedList(key=lambda e: e.last_val)
for x in A:
# Incorporate x into the sorted list
# ...
# Delete shorter active sequences on the right
new_entry_idx = SL.index(new_entry)
while (
len(SL) >= new_entry_idx + 2 and
SL[new_entry_idx+1].length <= new_entry.length
):
SL.remove(SL[new_entry_idx+1])
# Before: [ActiveSeq(1, 1), ActiveSeq(3, 2), ActiveSeq(7, 3)]
# During: [ActiveSeq(1, 1), ActiveSeq(3, 3), ActiveSeq(7, 3)]
# After: [ActiveSeq(1, 1), ActiveSeq(3, 3)]
Once we’ve traversed all the numbers, we’ll just return the length of the active sequence with the largest last value. This is the last item in the sorted list.
Thanks to the two invariants we established earlier, it must be the case that the last item in our sorted list represents the longest non-decreasing subsequence.
def longest_nondecreasing_subsequence_length(A):
# Maintain active sequences and clone + extend them
# ...
return SL[-1].length
Given n
number of numbers, our algorithm runs in O(n)
space and O(n log n)
time.
With that, we’ve solved the Longest Non-Contiguous, Non-Decreasing Subsequence problem 🚃🚃🚃.
Full Solution
def longest_nondecreasing_subsequence_length(A):
n = len(A)
SL = sortedcontainers.SortedList(key=lambda e: e.last_val)
for x in A:
new_entry = ActiveSeq(x, 1)
i = SL.bisect_right(new_entry)
if i == 0:
SL.add(new_entry)
else:
prev_entry = SL[i-1]
new_entry = ActiveSeq(x, prev_entry.length + 1)
if prev_entry.last_val == x:
SL.remove(prev_entry)
SL.add(new_entry)
new_entry_idx = SL.index(new_entry)
while (
len(SL) >= new_entry_idx + 2 and
SL[new_entry_idx+1].length <= new_entry.length
):
SL.remove(SL[new_entry_idx+1])
return SL[-1].length