| Algorithm | Preprocessing | Search Time | Space | Multi-pattern | Best For |
|---|---|---|---|---|---|
| Naive (brute force) | None | O(nm) | O(1) | No | Short patterns, small texts |
| KMP | O(m) | O(n) | O(m) | No | Single pattern, guaranteed linear |
| Rabin-Karp | O(m) | O(n) avg, O(nm) worst | O(1) | Yes | Multi-pattern, plagiarism detection |
| Boyer-Moore | O(m + σ) | O(n/m) best, O(nm) worst | O(m + σ) | No | Long patterns, large alphabets |
| Aho-Corasick | O(Σmi) | O(n + matches) | O(Σmi · σ) | Yes | Dictionary matching, many patterns |
| Z-Algorithm | O(n+m) | O(n+m) | O(n+m) | No | Simpler alternative to KMP |
Where: n = text length, m = pattern length, σ = alphabet size. [src1] [src6]
```text
START
├── Single pattern or multiple patterns?
│   ├── MULTIPLE PATTERNS ↓
│   │   ├── Fixed dictionary of patterns?
│   │   │   ├── YES → Aho-Corasick (O(n + total_matches))
│   │   │   └── NO (patterns change) → Rabin-Karp with multiple hashes
│   │   └── END
│   └── SINGLE PATTERN ↓
├── Need guaranteed worst-case O(n+m)?
│   ├── YES → KMP or Z-Algorithm
│   └── NO (average case OK) ↓
├── Pattern length > 10 and large alphabet (ASCII/Unicode)?
│   ├── YES → Boyer-Moore (sublinear average case)
│   └── NO ↓
├── Production code (not competitive programming)?
│   ├── YES → Use built-in: str.find() / indexOf() / strings.Contains()
│   └── NO ↓
└── DEFAULT → KMP (most versatile, easiest to implement correctly)
```
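Aho-Corasick appears in the comparison table and the multi-pattern branch above but has no snippet in this section. A minimal sketch, with function names of my choosing rather than any library API: build a trie over the patterns, wire up failure links with a BFS, then scan the text once, reporting every pattern that ends at each position.

```python
from collections import deque

def aho_corasick_build(patterns):
    # Trie stored as parallel arrays: children (dict per node),
    # failure link, and output (indices of patterns ending at this node).
    children, fail, out = [{}], [0], [[]]
    for idx, pat in enumerate(patterns):
        node = 0
        for ch in pat:
            if ch not in children[node]:
                children.append({})
                fail.append(0)
                out.append([])
                children[node][ch] = len(children) - 1
            node = children[node][ch]
        out[node].append(idx)
    # BFS: depth-1 nodes keep fail = 0; deeper nodes follow parent's fail.
    q = deque(children[0].values())
    while q:
        u = q.popleft()
        for ch, v in children[u].items():
            q.append(v)
            f = fail[u]
            while f and ch not in children[f]:
                f = fail[f]
            nxt = children[f].get(ch, 0)
            fail[v] = nxt if nxt != v else 0
            out[v] += out[fail[v]]  # inherit matches reachable via fail
    return children, fail, out

def aho_corasick_search(text, patterns):
    """Return (start_index, pattern_index) pairs for all occurrences."""
    children, fail, out = aho_corasick_build(patterns)
    matches, node = [], 0
    for i, ch in enumerate(text):
        while node and ch not in children[node]:
            node = fail[node]
        node = children[node].get(ch, 0)
        for idx in out[node]:
            matches.append((i - len(patterns[idx]) + 1, idx))
    return matches
```

On the classic example, `aho_corasick_search("ushers", ["he", "she", "his", "hers"])` finds "she" at 1, "he" at 2, and "hers" at 2 in a single pass over the text.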
The Longest Proper Prefix which is also Suffix (LPS) array is the core of KMP. For each position i in the pattern, lps[i] stores the length of the longest proper prefix of pattern[0..i] that is also a suffix. This tells KMP how far to shift after a mismatch. [src1]
```python
def build_lps(pattern):
    m = len(pattern)
    lps = [0] * m
    length = 0  # length of previous longest prefix suffix
    i = 1
    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                length = lps[length - 1]  # don't increment i
            else:
                lps[i] = 0
                i += 1
    return lps
```
Verify: build_lps("ABABAC") → expected: [0, 0, 1, 2, 3, 0]
With the LPS array built, scan the text. On a mismatch at position j in the pattern, jump to lps[j-1] instead of restarting from 0. Each text character is examined at most twice, guaranteeing O(n+m) total. [src1] [src4]
```python
def kmp_search(text, pattern):
    n, m = len(text), len(pattern)
    lps = build_lps(pattern)
    matches = []
    i = j = 0
    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
        if j == m:
            matches.append(i - j)
            j = lps[j - 1]
        elif i < n and text[i] != pattern[j]:
            if j != 0:
                j = lps[j - 1]
            else:
                i += 1
    return matches
```
Verify: kmp_search("ABABDABABCABABABABCABAB", "ABABC") → expected: [5, 14]
Choose a base (e.g., 256 for ASCII) and a large prime modulus. Compute the hash of the pattern and the first window. For each subsequent position, update the hash in O(1) by removing the leading character and adding the trailing character. [src2] [src3]
```python
def rabin_karp(text, pattern, base=256, mod=10**9 + 7):
    n, m = len(text), len(pattern)
    if m > n:
        return []
    h = pow(base, m - 1, mod)
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        if p_hash == t_hash:
            if text[i:i + m] == pattern:
                matches.append(i)
        if i < n - m:
            t_hash = (t_hash - ord(text[i]) * h) % mod
            t_hash = (t_hash * base + ord(text[i + m])) % mod
    return matches
```
Verify: rabin_karp("ABABDABABCABABABABCABAB", "ABABC") → expected: [5, 14]
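The O(1) rolling update is easy to sanity-check by hand on a two-character window; sliding "AB" → "BC" over the text "ABC" with the same base and modulus:

```python
# Sanity check of the rolling update: drop the leading char, shift, add
# the trailing char, and compare against the hash computed from scratch.
base, mod = 256, 10**9 + 7
m = 2
h = pow(base, m - 1, mod)                    # base^(m-1) = 256
ab = (ord("A") * base + ord("B")) % mod      # hash("AB") = 65*256 + 66 = 16706
bc = ((ab - ord("A") * h) * base + ord("C")) % mod
assert bc == (ord("B") * base + ord("C")) % mod == 16963
```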
The Z-array stores at each position the length of the longest substring starting there that matches a prefix of the string. Concatenate pattern + "$" + text and find positions where z[i] == m. [src6]
```python
def z_function(s):
    n = len(s)
    z = [0] * n
    z[0] = n
    l = r = 0
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z

def z_search(text, pattern):
    concat = pattern + "$" + text
    z = z_function(concat)
    m = len(pattern)
    return [i - m - 1 for i in range(m + 1, len(concat)) if z[i] == m]
```
Verify: z_search("ABABDABABCABABABABCABAB", "ABABC") → expected: [5, 14]
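Because z_function packs two optimizations together (the [l, r) window and the copy from z[i - l]), a direct-from-definition quadratic version is handy for validating it on small inputs; z_brute is my name, not a standard one:

```python
def z_brute(s):
    # O(n^2) Z-array straight from the definition: at each i, count how
    # far the suffix starting at i matches the prefix of s.
    n = len(s)
    z = [0] * n
    if n:
        z[0] = n  # same whole-string convention as the linear version
    for i in range(1, n):
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
    return z
```

For example, z_brute("AABAA") gives [5, 1, 0, 2, 1]: the suffix "AA" at position 3 matches a 2-character prefix, and so on.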
```python
# Input: text string and pattern string
# Output: list of all starting indices where pattern occurs
def kmp_find_all(text, pattern):
    """KMP string matching - O(n+m) guaranteed."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    lps = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[k] != pattern[i]:
            k = lps[k - 1]
        if pattern[k] == pattern[i]:
            k += 1
        lps[i] = k
    matches, j = [], 0
    for i in range(n):
        while j > 0 and pattern[j] != text[i]:
            j = lps[j - 1]
        if pattern[j] == text[i]:
            j += 1
        if j == m:
            matches.append(i - m + 1)
            j = lps[j - 1]
    return matches
```
```javascript
// Input: text (string), pattern (string)
// Output: array of starting indices of all matches
function rabinKarp(text, pattern) {
  const n = text.length, m = pattern.length;
  if (m > n) return [];
  const BASE = 256n, MOD = 1000000007n;
  let h = 1n;
  for (let i = 0; i < m - 1; i++) h = (h * BASE) % MOD;
  let pHash = 0n, tHash = 0n;
  for (let i = 0; i < m; i++) {
    pHash = (pHash * BASE + BigInt(pattern.charCodeAt(i))) % MOD;
    tHash = (tHash * BASE + BigInt(text.charCodeAt(i))) % MOD;
  }
  const matches = [];
  for (let i = 0; i <= n - m; i++) {
    if (pHash === tHash && text.slice(i, i + m) === pattern) {
      matches.push(i);
    }
    if (i < n - m) {
      tHash = ((tHash - BigInt(text.charCodeAt(i)) * h) * BASE
               + BigInt(text.charCodeAt(i + m))) % MOD;
      if (tHash < 0n) tHash += MOD;
    }
  }
  return matches;
}
```
```java
// Input: text (String), pattern (String)
// Output: List<Integer> of all starting indices
import java.util.ArrayList;
import java.util.List;

public static List<Integer> kmpSearch(String text, String pattern) {
    int n = text.length(), m = pattern.length();
    List<Integer> matches = new ArrayList<>();
    if (m == 0) return matches;  // guard: charAt(0) on "" would throw below
    int[] lps = new int[m];
    for (int i = 1, len = 0; i < m; ) {
        if (pattern.charAt(i) == pattern.charAt(len)) {
            lps[i++] = ++len;
        } else if (len > 0) {
            len = lps[len - 1];
        } else {
            lps[i++] = 0;
        }
    }
    for (int i = 0, j = 0; i < n; ) {
        if (text.charAt(i) == pattern.charAt(j)) { i++; j++; }
        if (j == m) { matches.add(i - j); j = lps[j - 1]; }
        else if (i < n && text.charAt(i) != pattern.charAt(j)) {
            if (j > 0) j = lps[j - 1];
            else i++;
        }
    }
    return matches;
}
```
```python
# BAD -- O(nm) naive approach re-examines characters unnecessarily
def naive_search(text, pattern):
    matches = []
    for i in range(len(text) - len(pattern) + 1):
        if text[i:i+len(pattern)] == pattern:
            matches.append(i)
    return matches
```
```python
# GOOD -- O(n+m) with failure function, no backtracking in text
def kmp_search(text, pattern):
    lps = build_lps(pattern)
    i = j = 0
    matches = []
    while i < len(text):
        if text[i] == pattern[j]:
            i += 1; j += 1
        if j == len(pattern):
            matches.append(i - j)
            j = lps[j - 1]
        elif i < len(text) and text[i] != pattern[j]:
            if j != 0:
                j = lps[j - 1]  # reuse prefix info; i stays put
            else:
                i += 1
    return matches
```
```python
# BAD -- trusts hash equality without character comparison
if p_hash == t_hash:
    matches.append(i)  # WRONG: hash collision = wrong match
```

```python
# GOOD -- hash is a filter, not proof; always confirm
if p_hash == t_hash:
    if text[i:i + m] == pattern:  # O(m) verification on hash match
        matches.append(i)
```
```python
# BAD -- mod=101 causes frequent collisions on real text
mod = 101
t_hash = (t_hash * base + ord(text[i])) % mod
```

```python
# GOOD -- 10^9+7 is standard; fits in 64-bit integers
mod = 10**9 + 7
t_hash = (t_hash * base + ord(text[i])) % mod
```
Common pitfalls and their fixes:

- Off-by-one in the LPS build: 1-indexed implementations cause off-by-one errors. Fix: set lps[0] = 0 and start the build loop at i = 1. [src1]
- Overflow: base^(m-1) overflows 64-bit integers for long patterns. Fix: apply % mod at every multiplication step, or use BigInt. [src3]
- Negative hashes: in languages where % can return negative values, the rolling-hash subtraction produces negative hashes. Fix: t_hash = ((t_hash - c * h) % mod + mod) % mod. [src3]
- Empty pattern: guard with if len(pattern) == 0: return [] at the start. [src4]

| Use When | Don't Use When | Use Instead |
|---|---|---|
| Need guaranteed O(n+m) worst case for single pattern | Production code where stdlib is available | str.find(), indexOf(), strings.Contains() |
| Searching multiple patterns simultaneously | Pattern includes wildcards or regex features | Regex engine or NFA/DFA-based matcher |
| Streaming text (can't backtrack) | Text fits in memory and pattern is short (<5 chars) | Naive search or stdlib |
| Competitive programming / interviews | Need approximate/fuzzy matching | Edit distance, BK-trees, fuzzy matching |
| Text editor find/replace on long documents | Searching in sorted data | Binary search |
| Plagiarism detection across many documents | Need only first occurrence in small text | str.find() / indexOf() |
Built-in library routines (str.find(), String.indexOf(), strings.Contains()) typically use optimized Boyer-Moore or two-way algorithm variants and are faster than hand-rolled KMP for most practical inputs.
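In that spirit, collecting every occurrence in production Python usually means looping str.find rather than hand-rolling KMP; a short sketch (find_all is my name, not stdlib):

```python
def find_all(text, pattern):
    # Collect all (overlapping) occurrences via the C-optimized str.find.
    matches = []
    i = text.find(pattern)
    while i != -1:
        matches.append(i)
        i = text.find(pattern, i + 1)  # resume just past the last hit
    return matches
```

Advancing the start by 1 (not by len(pattern)) keeps overlapping matches, matching the behavior of the KMP implementations above.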