Union-Find data structure implementation

- Bottom line: Union-Find (Disjoint Set Union) maintains a partition of elements into disjoint sets, supporting near-O(1) amortized find and union operations when using both path compression and union by rank.

Disjoint Set Union DSU algorithm

- Bottom line: Union-Find (Disjoint Set Union) maintains a partition of elements into disjoint sets, supporting near-O(1) amortized find and union operations when using both path compression and union by rank.

How to detect cycles with Union-Find

- Bottom line: Union-Find (Disjoint Set Union) maintains a partition of elements into disjoint sets, supporting near-O(1) amortized find and union operations when using both path compression and union by rank.

Kruskal MST with Union-Find

- Bottom line: Union-Find (Disjoint Set Union) maintains a partition of elements into disjoint sets, supporting near-O(1) amortized find and union operations when using both path compression and union by rank.

Connected components using Union-Find

- Bottom line: Union-Find (Disjoint Set Union) maintains a partition of elements into disjoint sets, supporting near-O(1) amortized find and union operations when using both path compression and union by rank.

Union-Find (Disjoint Set): Complete Reference

How do I implement and use Union-Find (Disjoint Set)?

TL;DR

Bottom line: Union-Find (Disjoint Set Union) maintains a partition of elements into disjoint sets, supporting near-O(1) amortized find and union operations when using both path compression and union by rank.
Key tool/command: parent[] array with find(x) (path compression) + union(a, b) (by rank/size)
Watch out for: Forgetting path compression -- without it, find degrades to O(n) on skewed trees.
Works with: Any language; no dependencies. Pure algorithmic pattern with O(n) space.

Constraints

Union-Find only works on equivalence relations (reflexive, symmetric, transitive) -- cannot model directed relationships
Cannot efficiently undo union operations -- once two sets are merged, splitting requires a rollback variant
Path compression mutates the tree during find, making the structure non-persistent
Does not support shortest-path, weighted-edge queries, or neighbor enumeration -- only answers "are X and Y connected?"
Union by rank and union by size are mutually exclusive -- mixing them corrupts rank invariants

Quick Reference

Operation	Naive	Path Compression Only	Union by Rank Only	Both Optimizations	Amortized (Both)
`make_set(x)`	O(1)	O(1)	O(1)	O(1)	O(1)
`find(x)`	O(n)	O(log n)*	O(log n)	O(α(n))	~O(1)
`union(a, b)`	O(n)	O(log n)*	O(log n)	O(α(n))	~O(1)
`connected(a, b)`	O(n)	O(log n)*	O(log n)	O(α(n))	~O(1)
Space	O(n)	O(n)	O(n)	O(n)	O(n)
Component count	O(1) w/ counter	O(1) w/ counter	O(1) w/ counter	O(1) w/ counter	O(1)
Component size	O(n)	O(n)	O(1) w/ size[]	O(1) w/ size[]	O(1)
Undo last union	N/A	N/A	N/A	N/A	Use rollback DSU

*Amortized over m operations: O(m · α(n)) total, where α is the inverse Ackermann function (effectively ≤ 4 for any practical n < 10⁶⁰⁰). [src1] [src3]

Decision Tree

START: Do you need to track connectivity between elements?
|
+-- YES: Are edges added online (one at a time)?
|   |
|   +-- YES: Do you need to undo/remove edges?
|   |   |
|   |   +-- YES --> Use Link-Cut Trees or Offline DSU with rollback
|   |   +-- NO --> Union-Find (this unit) -- optimal choice
|   |
|   +-- NO (all edges known upfront): Do you need more than connectivity?
|       |
|       +-- YES (shortest path, all neighbors, etc.) --> BFS/DFS on adjacency list
|       +-- NO (just connected components) --> Union-Find OR BFS/DFS (both work)
|
+-- NO: Do you need to find minimum spanning tree?
    |
    +-- YES: Dense graph (E >> V)?
    |   |
    |   +-- YES --> Prim's algorithm with priority queue
    |   +-- NO --> Kruskal's algorithm + Union-Find (this unit)
    |
    +-- NO --> This unit is not applicable

Step-by-Step Guide

1. Initialize the data structure

Create a parent array where each element is its own root (self-loop). Optionally track rank and component count. [src1]

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))  # Each element is its own root
        self.rank = [0] * n            # Rank for union by rank
        self.count = n                 # Number of connected components

Verify: uf = UnionFind(5) → expected: uf.parent == [0, 1, 2, 3, 4]

2. Implement find with path compression

Path compression flattens the tree by making every node on the find path point directly to the root. [src1] [src4]

def find(self, x):
    if self.parent[x] != x:
        self.parent[x] = self.find(self.parent[x])  # Path compression
    return self.parent[x]

Verify: After find(x), every node on the path has root as direct parent.

3. Implement union by rank

Attach the shorter tree under the taller tree to keep height logarithmic. [src1] [src4]

def union(self, x, y):
    root_x = self.find(x)
    root_y = self.find(y)
    if root_x == root_y:
        return False
    if self.rank[root_x] < self.rank[root_y]:
        root_x, root_y = root_y, root_x
    self.parent[root_y] = root_x
    if self.rank[root_x] == self.rank[root_y]:
        self.rank[root_x] += 1
    self.count -= 1
    return True

Verify: uf.union(0, 1) → expected: uf.count == 4, uf.find(0) == uf.find(1)

4. Add connected query and component tracking

def connected(self, x, y):
    return self.find(x) == self.find(y)

def component_count(self):
    return self.count

Verify: uf.union(0, 1); uf.union(1, 2) → expected: uf.connected(0, 2) == True

5. (Optional) Track component sizes with union by size

Replace rank with size for applications needing component sizes. [src2]

class UnionFindBySize:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.count = n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return False
        if self.size[root_x] < self.size[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x
        self.size[root_x] += self.size[root_y]
        self.count -= 1
        return True

    def get_size(self, x):
        return self.size[self.find(x)]

Code Examples

Python: Kruskal's MST with Union-Find

# Input:  n (vertices), edges as [(weight, u, v), ...]
# Output: MST edges and total weight

def kruskal_mst(n, edges):
    uf = UnionFind(n)
    edges.sort()  # Sort by weight
    mst = []
    total_weight = 0
    for weight, u, v in edges:
        if uf.union(u, v):       # Only add if no cycle
            mst.append((u, v, weight))
            total_weight += weight
            if len(mst) == n - 1:  # MST complete
                break
    return mst, total_weight

JavaScript: Union-Find with Path Compression and Rank

// Input:  n (number of elements)
// Output: UnionFind object with find, union, connected

class UnionFind {
  constructor(n) {
    this.parent = Array.from({length: n}, (_, i) => i);
    this.rank = new Array(n).fill(0);
    this.count = n;
  }
  find(x) {
    if (this.parent[x] !== x)
      this.parent[x] = this.find(this.parent[x]);
    return this.parent[x];
  }
  union(x, y) {
    let rx = this.find(x), ry = this.find(y);
    if (rx === ry) return false;
    if (this.rank[rx] < this.rank[ry]) [rx, ry] = [ry, rx];
    this.parent[ry] = rx;
    if (this.rank[rx] === this.rank[ry]) this.rank[rx]++;
    this.count--;
    return true;
  }
  connected(x, y) { return this.find(x) === this.find(y); }
}

Java: Union-Find for Cycle Detection

// Input:  n vertices, int[][] edges (each [u, v])
// Output: true if graph contains a cycle

public class UnionFind {
    private int[] parent, rank;
    public UnionFind(int n) {
        parent = new int[n]; rank = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
    }
    public int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]);
        return parent[x];
    }
    public boolean union(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return false;
        if (rank[rx] < rank[ry]) { int t = rx; rx = ry; ry = t; }
        parent[ry] = rx;
        if (rank[rx] == rank[ry]) rank[rx]++;
        return true;
    }
    public static boolean hasCycle(int n, int[][] edges) {
        UnionFind uf = new UnionFind(n);
        for (int[] e : edges)
            if (!uf.union(e[0], e[1])) return true;
        return false;
    }
}

C++: DSU with Size Tracking

// Input:  n (number of elements)
// Output: DSU struct with find, unite, connected, get_size

struct DSU {
    vector<int> parent, sz;
    int components;
    DSU(int n) : parent(n), sz(n, 1), components(n) {
        iota(parent.begin(), parent.end(), 0);
    }
    int find(int x) {
        return parent[x] == x ? x : parent[x] = find(parent[x]);
    }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        if (sz[a] < sz[b]) swap(a, b);
        parent[b] = a;
        sz[a] += sz[b];
        components--;
        return true;
    }
    bool connected(int a, int b) { return find(a) == find(b); }
    int get_size(int a) { return sz[find(a)]; }
};

Anti-Patterns

Wrong: Find without path compression

# BAD -- O(n) worst case, tree becomes a linked list
def find(self, x):
    while self.parent[x] != x:
        x = self.parent[x]
    return x

Correct: Find with path compression

# GOOD -- amortized O(alpha(n)), tree stays flat
def find(self, x):
    if self.parent[x] != x:
        self.parent[x] = self.find(self.parent[x])
    return self.parent[x]

Wrong: Union without rank/size

# BAD -- always attaches under same root, creates O(n) chains
def union(self, x, y):
    root_x, root_y = self.find(x), self.find(y)
    if root_x != root_y:
        self.parent[root_y] = root_x  # No rank/size check

Correct: Union by rank

# GOOD -- keeps tree balanced, O(log n) height guarantee
def union(self, x, y):
    root_x, root_y = self.find(x), self.find(y)
    if root_x == root_y:
        return False
    if self.rank[root_x] < self.rank[root_y]:
        root_x, root_y = root_y, root_x
    self.parent[root_y] = root_x
    if self.rank[root_x] == self.rank[root_y]:
        self.rank[root_x] += 1
    return True

Wrong: Mixing rank and size

# BAD -- updating rank as if it were size corrupts the invariant
self.rank[root_x] += self.rank[root_y]  # Wrong! Rank != size

Correct: Choose one -- rank (increment on tie) or size (always add)

# GOOD -- rank: only increment when ranks are equal
if self.rank[root_x] == self.rank[root_y]:
    self.rank[root_x] += 1

# GOOD -- size: always add sizes together
self.size[root_x] += self.size[root_y]

Common Pitfalls

Comparing parent[x] instead of find(x): parent[x] may not be the root after path compression. Fix: always compare find(x) == find(y). [src1]
Stack overflow with recursive find: Recursive path compression exceeds stack depth for n > 10⁵. Fix: use iterative two-pass path compression. [src1]
Not returning merge status from union: Kruskal and cycle detection need to know if union created a new connection. Fix: return True/False from union(). [src5]
Using Union-Find for directed cycle detection: Union-Find only works for undirected graphs. Fix: use DFS with coloring for directed graphs. [src2]
Initializing parent array to zeros: parent = [0] * n makes every element's root 0. Fix: parent = list(range(n)). [src6]
Forgetting to decrement component count: Every successful union must decrement the counter. Fix: add self.count -= 1 after merge. [src7]

When to Use / When Not to Use

Use When	Don't Use When	Use Instead
Checking if two nodes share a connected component	You need the actual path between nodes	BFS/DFS traversal
Building MST with Kruskal's algorithm	Graph is dense (E close to V²)	Prim's algorithm
Detecting cycles in undirected graphs	Detecting cycles in directed graphs	DFS with back-edge detection
Grouping equivalent items (accounts merge, synonyms)	You need to split groups or undo unions	Link-Cut Trees / rollback DSU
Online connectivity (edges added incrementally)	All edges known upfront + need full traversal	BFS/DFS (simpler)
Percolation simulation / grid connectivity	You need weighted shortest path	Dijkstra / Bellman-Ford
Very large graphs (10⁶+ nodes)	Graph fits in memory and BFS/DFS suffices	BFS/DFS (simpler to code)

Important Caveats

The inverse Ackermann function α(n) is effectively constant (≤ 4) for any practical input, but this is an amortized bound -- individual operations can still take O(log n) worst case.
Path compression is not thread-safe. For concurrent applications, use atomic CAS or lock the find/union operations.
Union-Find with path compression is not suitable for persistent/functional data structures. For immutable settings, use weighted union without path compression (O(log n) per op).
Python's default recursion limit (1000) causes RecursionError for large inputs with recursive find. Use sys.setrecursionlimit() or iterative find.