String Matching Algorithms: KMP, Rabin-Karp & Beyond

Type: Software Reference Confidence: 0.95 Sources: 7 Verified: 2026-02-24 Freshness: 2026-02-24

TL;DR

Constraints

Quick Reference

AlgorithmPreprocessingSearch TimeSpaceMulti-patternBest For
Naive (brute force)NoneO(nm)O(1)NoShort patterns, small texts
KMPO(m)O(n)O(m)NoSingle pattern, guaranteed linear
Rabin-KarpO(m)O(n) avg, O(nm) worstO(1)YesMulti-pattern, plagiarism detection
Boyer-MooreO(m + σ)O(n/m) best, O(nm) worstO(m + σ)NoLong patterns, large alphabets
Aho-CorasickO(Σmi)O(n + matches)O(Σmi · σ)YesDictionary matching, many patterns
Z-AlgorithmO(n+m)O(n+m)O(n+m)NoSimpler alternative to KMP

Where: n = text length, m = pattern length, σ = alphabet size. [src1] [src6]

Decision Tree

START
├── Single pattern or multiple patterns?
│   ├── MULTIPLE PATTERNS ↓
│   │   ├── Fixed dictionary of patterns?
│   │   │   ├── YES → Aho-Corasick (O(n + total_matches))
│   │   │   └── NO (patterns change) → Rabin-Karp with multiple hashes
│   │   └── END
│   └── SINGLE PATTERN ↓
├── Need guaranteed worst-case O(n+m)?
│   ├── YES → KMP or Z-Algorithm
│   └── NO (average case OK) ↓
├── Pattern length > 10 and large alphabet (ASCII/Unicode)?
│   ├── YES → Boyer-Moore (sublinear average case)
│   └── NO ↓
├── Production code (not competitive programming)?
│   ├── YES → Use built-in: str.find() / indexOf() / strings.Contains()
│   └── NO ↓
└── DEFAULT → KMP (most versatile, easiest to implement correctly)

Step-by-Step Guide

1. Build the KMP failure function (LPS array)

The Longest Proper Prefix which is also Suffix (LPS) array is the core of KMP. For each position i in the pattern, lps[i] stores the length of the longest proper prefix of pattern[0..i] that is also a suffix. This tells KMP how far to shift after a mismatch. [src1]

def build_lps(pattern):
    m = len(pattern)
    lps = [0] * m
    length = 0  # length of previous longest prefix suffix
    i = 1
    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                length = lps[length - 1]  # don't increment i
            else:
                lps[i] = 0
                i += 1
    return lps

Verify: build_lps("ABABAC") → expected: [0, 0, 1, 2, 3, 0]

2. Run KMP search using the LPS array

With the LPS array built, scan the text. On a mismatch at position j in the pattern, jump to lps[j-1] instead of restarting from 0. Each text character is examined at most twice, guaranteeing O(n+m) total. [src1] [src4]

def kmp_search(text, pattern):
    n, m = len(text), len(pattern)
    lps = build_lps(pattern)
    matches = []
    i = j = 0
    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
        if j == m:
            matches.append(i - j)
            j = lps[j - 1]
        elif i < n and text[i] != pattern[j]:
            if j != 0:
                j = lps[j - 1]
            else:
                i += 1
    return matches

Verify: kmp_search("ABABDABABCABABABABCABAB", "ABABC") → expected: [5]

3. Implement Rabin-Karp rolling hash

Choose a base (e.g., 256 for ASCII) and a large prime modulus. Compute the hash of the pattern and the first window. For each subsequent position, update the hash in O(1) by removing the leading character and adding the trailing character. [src2] [src3]

def rabin_karp(text, pattern, base=256, mod=10**9 + 7):
    n, m = len(text), len(pattern)
    if m > n:
        return []
    h = pow(base, m - 1, mod)
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        if p_hash == t_hash:
            if text[i:i + m] == pattern:
                matches.append(i)
        if i < n - m:
            t_hash = (t_hash - ord(text[i]) * h) % mod
            t_hash = (t_hash * base + ord(text[i + m])) % mod
    return matches

Verify: rabin_karp("ABABDABABCABABABABCABAB", "ABABC") → expected: [5]

4. Implement Z-Algorithm as a simpler KMP alternative

The Z-array stores at each position the length of the longest substring starting there that matches a prefix of the string. Concatenate pattern + "$" + text and find positions where z[i] == m. [src6]

def z_function(s):
    n = len(s)
    z = [0] * n
    z[0] = n
    l = r = 0
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z

def z_search(text, pattern):
    concat = pattern + "$" + text
    z = z_function(concat)
    m = len(pattern)
    return [i - m - 1 for i in range(m + 1, len(concat)) if z[i] == m]

Verify: z_search("ABABDABABCABABABABCABAB", "ABABC") → expected: [5]

Code Examples

Python: KMP with all match positions

# Input:  text string and pattern string
# Output: list of all starting indices where pattern occurs

def kmp_find_all(text, pattern):
    """KMP string matching - O(n+m) guaranteed."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    lps = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[k] != pattern[i]:
            k = lps[k - 1]
        if pattern[k] == pattern[i]:
            k += 1
        lps[i] = k
    matches, j = [], 0
    for i in range(n):
        while j > 0 and pattern[j] != text[i]:
            j = lps[j - 1]
        if pattern[j] == text[i]:
            j += 1
        if j == m:
            matches.append(i - m + 1)
            j = lps[j - 1]
    return matches

JavaScript: Rabin-Karp with BigInt for safe modular arithmetic

// Input:  text (string), pattern (string)
// Output: array of starting indices of all matches

function rabinKarp(text, pattern) {
  const n = text.length, m = pattern.length;
  if (m > n) return [];
  const BASE = 256n, MOD = 1000000007n;
  let h = 1n;
  for (let i = 0; i < m - 1; i++) h = (h * BASE) % MOD;

  let pHash = 0n, tHash = 0n;
  for (let i = 0; i < m; i++) {
    pHash = (pHash * BASE + BigInt(pattern.charCodeAt(i))) % MOD;
    tHash = (tHash * BASE + BigInt(text.charCodeAt(i))) % MOD;
  }
  const matches = [];
  for (let i = 0; i <= n - m; i++) {
    if (pHash === tHash && text.slice(i, i + m) === pattern) {
      matches.push(i);
    }
    if (i < n - m) {
      tHash = ((tHash - BigInt(text.charCodeAt(i)) * h) * BASE
        + BigInt(text.charCodeAt(i + m))) % MOD;
      if (tHash < 0n) tHash += MOD;
    }
  }
  return matches;
}

Java: KMP implementation

// Input:  text (String), pattern (String)
// Output: List<Integer> of all starting indices

import java.util.ArrayList;
import java.util.List;

public static List<Integer> kmpSearch(String text, String pattern) {
    int n = text.length(), m = pattern.length();
    int[] lps = new int[m];
    for (int i = 1, len = 0; i < m; ) {
        if (pattern.charAt(i) == pattern.charAt(len)) {
            lps[i++] = ++len;
        } else if (len > 0) {
            len = lps[len - 1];
        } else {
            lps[i++] = 0;
        }
    }
    List<Integer> matches = new ArrayList<>();
    for (int i = 0, j = 0; i < n; ) {
        if (text.charAt(i) == pattern.charAt(j)) { i++; j++; }
        if (j == m) { matches.add(i - j); j = lps[j - 1]; }
        else if (i < n && text.charAt(i) != pattern.charAt(j)) {
            if (j > 0) j = lps[j - 1];
            else i++;
        }
    }
    return matches;
}

Anti-Patterns

Wrong: Restarting pattern comparison from scratch on mismatch

# BAD -- O(nm) naive approach re-examines characters unnecessarily
def naive_search(text, pattern):
    matches = []
    for i in range(len(text) - len(pattern) + 1):
        if text[i:i+len(pattern)] == pattern:
            matches.append(i)
    return matches

Correct: Use KMP to avoid redundant comparisons

# GOOD -- O(n+m) with failure function, no backtracking in text
def kmp_search(text, pattern):
    lps = build_lps(pattern)
    i = j = 0
    matches = []
    while i < len(text):
        if text[i] == pattern[j]:
            i += 1; j += 1
        if j == len(pattern):
            matches.append(i - j)
            j = lps[j - 1]
        elif i < len(text) and text[i] != pattern[j]:
            j = lps[j - 1] if j else 0
            if j == 0 and text[i] != pattern[0]:
                i += 1
    return matches

Wrong: Rabin-Karp without verifying hash matches

# BAD -- trusts hash equality without character comparison
if p_hash == t_hash:
    matches.append(i)  # WRONG: hash collision = wrong match

Correct: Always verify hash matches with full comparison

# GOOD -- hash is a filter, not proof; always confirm
if p_hash == t_hash:
    if text[i:i+m] == pattern:  # O(m) verification on hash match
        matches.append(i)

Wrong: Using small modulus in Rabin-Karp

# BAD -- mod=101 causes frequent collisions on real text
mod = 101
t_hash = (t_hash * base + ord(text[i])) % mod

Correct: Use a large prime modulus

# GOOD -- 10^9+7 is standard; fits in 64-bit integers
mod = 10**9 + 7
t_hash = (t_hash * base + ord(text[i])) % mod

Common Pitfalls

When to Use / When Not to Use

Use WhenDon't Use WhenUse Instead
Need guaranteed O(n+m) worst case for single patternProduction code where stdlib is availablestr.find(), indexOf(), strings.Contains()
Searching multiple patterns simultaneouslyPattern includes wildcards or regex featuresRegex engine or NFA/DFA-based matcher
Streaming text (can't backtrack)Text fits in memory and pattern is short (<5 chars)Naive search or stdlib
Competitive programming / interviewsNeed approximate/fuzzy matchingEdit distance, BK-trees, fuzzy matching
Text editor find/replace on long documentsSearching in sorted dataBinary search
Plagiarism detection across many documentsNeed only first occurrence in small textstr.find() / indexOf()

Important Caveats

Related Units