Longest Common Subsequence
Define the LCS recurrence for two strings, fill the 2D table, and reconstruct the actual subsequence by back-tracing through the table.
What is a Subsequence?
A subsequence of a string is formed by deleting some (or no) characters without changing the order of the remaining characters. For example, 'ACE' is a subsequence of 'ABCDE' but 'AEC' is not (order violated). The Longest Common Subsequence (LCS) of two strings is the longest subsequence that appears in both. 'ABCBDAB' and 'BDCABA' share LCS 'BCBA' or 'BDAB' of length 4.
# Subsequence vs Substring
# 'ACE' is a subsequence of 'ABCDE' (skip B, D)
# 'ACE' is NOT a substring of 'ABCDE' (must be contiguous)
# LCS examples:
# LCS('ABCBDAB', 'BDCABA') = 4 ('BCBA' or 'BDAB')
# LCS('AGGTAB', 'GXTXAYB') = 4 ('GTAB')
# LCS('ABC', 'AC') = 2 ('AC')
print('Subsequence check: ACE in ABCDE')
text = 'ABCDE'
pattern = 'ACE'
i = 0
for ch in text:
if i < len(pattern) and ch == pattern[i]: i += 1
print('Found:', i == len(pattern)) # TrueLCS Recurrence Derivation
Define dp[i][j] = length of LCS of text1[:i] and text2[:j]. If the characters match (text1[i-1] == text2[j-1]), we extend the LCS by 1: dp[i][j] = dp[i-1][j-1] + 1. If they don't match, we take the better of skipping a character from either string: dp[i][j] = max(dp[i-1][j], dp[i][j-1]). Base case: dp[0][j] = dp[i][0] = 0 (LCS with empty string is 0).
def lcs_length(text1, text2):
m, n = len(text1), len(text2)
dp = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(1, m + 1):
for j in range(1, n + 1):
if text1[i-1] == text2[j-1]:
dp[i][j] = dp[i-1][j-1] + 1 # extend match
else:
dp[i][j] = max(dp[i-1][j], dp[i][j-1]) # skip one
return dp[m][n]
print(lcs_length('ABCBDAB', 'BDCABA')) # 4
print(lcs_length('AGGTAB', 'GXTXAYB')) # 4
print(lcs_length('ABC', 'AC')) # 2All lessons in this course
- Unique Paths and Minimum Path Sum on Grids
- Longest Common Subsequence
- Edit Distance (Levenshtein)
- Space Optimisation for 2D DP