Topological Sort: Linear Ordering of DAG Vertices
How do I implement topological sort and when to use it?
TL;DR
- Bottom line: Topological sort produces a linear ordering of vertices in a Directed Acyclic Graph (DAG) such that for every edge (u, v), vertex u appears before v -- essential for dependency resolution, build systems, and task scheduling.
- Key tool/command: Kahn's algorithm (BFS with in-degree tracking) or DFS with post-order reversal.
- Watch out for: Applying topological sort to a graph with cycles -- always detect cycles first or use Kahn's algorithm which detects them automatically (sorted count != vertex count).
- Works with: Any directed acyclic graph; language-agnostic algorithm (O(V+E) time, O(V) space).
Constraints
- Input graph MUST be a Directed Acyclic Graph (DAG); topological sort is undefined for graphs with cycles
- A DAG may have multiple valid topological orderings; algorithms return one valid ordering unless explicitly enumerating all
- Always run cycle detection before or during topological sort; silently ignoring cycles produces incorrect orderings
- All-orderings enumeration is O(n! * V) worst case; never enumerate on graphs with more than ~20 vertices without pruning
Quick Reference
| Algorithm | Time | Space | Cycle Detection | All Orderings | Approach |
|---|---|---|---|---|---|
| Kahn's (BFS) | O(V+E) | O(V) | Yes (count != V) | No (one ordering) | In-degree queue; remove zero-indegree nodes iteratively |
| DFS-based | O(V+E) | O(V) | Yes (back-edge / color) | No (one ordering) | Post-order traversal, reverse the result |
| DFS + backtracking | O(V! * V) | O(V) | Yes | Yes (all orderings) | Enumerate all valid orderings via backtracking |
| Parallel Kahn's | O(V+E) work, O(D) depth | O(V) | Yes | No | Process all zero-indegree nodes in parallel per level |
| Coffman-Graham | O(V+E) | O(V) | Yes | No | Kahn's variant with lexicographic tie-breaking for scheduling |
Decision Tree
START
├── Is the graph guaranteed to be a DAG (no cycles)?
│ ├── NO → Use Kahn's algorithm (detects cycles when sorted_count != V)
│ └── YES ↓
├── Do you need to identify parallelizable groups (levels)?
│ ├── YES → Use Kahn's algorithm with level tracking
│ └── NO ↓
├── Do you need ALL valid topological orderings?
│ ├── YES → Use DFS + backtracking (only for small graphs, V <= 20)
│ └── NO ↓
├── Is the implementation language naturally recursive?
│ ├── YES → Use DFS-based topological sort (simpler code)
│ └── NO → Use Kahn's algorithm (iterative, no stack overflow risk)
└── DEFAULT → Kahn's algorithm (most versatile, built-in cycle detection)
Step-by-Step Guide
1. Build the adjacency list and compute in-degrees
Create an adjacency list representation and count incoming edges for each vertex. [src3]
from collections import deque, defaultdict
def build_graph(num_vertices, edges):
adj = defaultdict(list)
in_degree = [0] * num_vertices
for u, v in edges:
adj[u].append(v)
in_degree[v] += 1
return adj, in_degree
Verify: assert sum(in_degree) == len(edges) -- in-degree sum equals total edge count.
2. Initialize queue with zero-indegree vertices
All vertices with in-degree 0 have no dependencies and can appear first. [src3]
queue = deque(v for v in range(num_vertices) if in_degree[v] == 0)
Verify: If queue is empty but graph has vertices, the graph contains a cycle.
3. Process vertices in BFS order (Kahn's core loop)
Dequeue a vertex, add it to the result, and decrement in-degrees of neighbors. Enqueue neighbors that reach in-degree 0. [src1]
result = []
while queue:
node = queue.popleft()
result.append(node)
for neighbor in adj[node]:
in_degree[neighbor] -= 1
if in_degree[neighbor] == 0:
queue.append(neighbor)
Verify: len(result) == num_vertices -- if not, cycle exists.
4. DFS-based alternative
Visit each unvisited vertex, recurse into neighbors, push vertex after all neighbors processed, then reverse. Uses 3-color marking for cycle detection. [src4]
WHITE, GRAY, BLACK = 0, 1, 2
color = [WHITE] * num_vertices
order = []
def dfs(node):
color[node] = GRAY
for neighbor in adj[node]:
if color[neighbor] == GRAY:
raise ValueError("Cycle detected")
if color[neighbor] == WHITE:
dfs(neighbor)
color[node] = BLACK
order.append(node)
for v in range(num_vertices):
if color[v] == WHITE:
dfs(v)
order.reverse()
Verify: No ValueError raised and len(order) == num_vertices.
Code Examples
Python: Course Schedule (LeetCode 210 pattern)
# Input: numCourses=4, prerequisites=[[1,0],[2,0],[3,1],[3,2]]
# Output: [0, 1, 2, 3] or [0, 2, 1, 3] (any valid ordering)
from collections import deque, defaultdict
def find_course_order(num_courses, prerequisites):
adj = defaultdict(list)
in_degree = [0] * num_courses
for course, prereq in prerequisites:
adj[prereq].append(course)
in_degree[course] += 1
queue = deque(c for c in range(num_courses) if in_degree[c] == 0)
order = []
while queue:
course = queue.popleft()
order.append(course)
for next_course in adj[course]:
in_degree[next_course] -= 1
if in_degree[next_course] == 0:
queue.append(next_course)
return order if len(order) == num_courses else [] # empty = cycle
JavaScript: Build Order / Task Scheduling
// Input: tasks = ["a","b","c","d"], deps = [["a","b"],["a","c"],["b","d"]]
// Output: ["a", "b", "c", "d"] or ["a", "c", "b", "d"]
function buildOrder(tasks, deps) {
const adj = new Map();
const inDeg = new Map();
for (const t of tasks) { adj.set(t, []); inDeg.set(t, 0); }
for (const [pre, dep] of deps) {
adj.get(pre).push(dep);
inDeg.set(dep, inDeg.get(dep) + 1);
}
const queue = tasks.filter(t => inDeg.get(t) === 0);
const order = [];
let i = 0;
while (i < queue.length) {
const node = queue[i++];
order.push(node);
for (const neighbor of adj.get(node)) {
inDeg.set(neighbor, inDeg.get(neighbor) - 1);
if (inDeg.get(neighbor) === 0) queue.push(neighbor);
}
}
if (order.length !== tasks.length) throw new Error("Cycle detected");
return order;
}
Java: Dependency Resolution with Parallel Level Detection
// Input: numNodes=6, edges={{5,2},{5,0},{4,0},{4,1},{2,3},{3,1}}
// Output: levels [[4,5],[0,2],[3],[1]] -- nodes at each level can run in parallel
import java.util.*;
public class ParallelTopSort {
public static List<List<Integer>> topSortLevels(int n, int[][] edges) {
List<List<Integer>> adj = new ArrayList<>();
int[] inDeg = new int[n];
for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
for (int[] e : edges) {
adj.get(e[0]).add(e[1]);
inDeg[e[1]]++;
}
Queue<Integer> queue = new LinkedList<>();
for (int i = 0; i < n; i++)
if (inDeg[i] == 0) queue.add(i);
List<List<Integer>> levels = new ArrayList<>();
int processed = 0;
while (!queue.isEmpty()) {
List<Integer> level = new ArrayList<>();
int size = queue.size();
for (int i = 0; i < size; i++) {
int node = queue.poll();
level.add(node);
processed++;
for (int nb : adj.get(node)) {
if (--inDeg[nb] == 0) queue.add(nb);
}
}
levels.add(level);
}
if (processed != n) throw new RuntimeException("Cycle detected");
return levels;
}
}
Anti-Patterns
Wrong: No cycle detection in Kahn's algorithm
# BAD -- silently returns partial ordering on cyclic graphs
def naive_topo_sort(n, edges):
adj, in_degree = build_graph(n, edges)
queue = deque(v for v in range(n) if in_degree[v] == 0)
result = []
while queue:
node = queue.popleft()
result.append(node)
for nb in adj[node]:
in_degree[nb] -= 1
if in_degree[nb] == 0:
queue.append(nb)
return result # May return partial list without warning!
Correct: Verify sorted count equals vertex count
# GOOD -- detects cycles by checking result length
def safe_topo_sort(n, edges):
adj, in_degree = build_graph(n, edges)
queue = deque(v for v in range(n) if in_degree[v] == 0)
result = []
while queue:
node = queue.popleft()
result.append(node)
for nb in adj[node]:
in_degree[nb] -= 1
if in_degree[nb] == 0:
queue.append(nb)
if len(result) != n:
raise ValueError("Graph contains a cycle")
return result
Wrong: Incorrect in-degree computation
# BAD -- counting both directions inflates in-degrees
def broken_indegree(n, edges):
in_degree = [0] * n
for u, v in edges:
in_degree[v] += 1
in_degree[u] += 1 # Wrong! u is source, not destination
return in_degree
Correct: Only increment destination vertex in-degree
# GOOD -- only the destination vertex gains an incoming edge
def correct_indegree(n, edges):
in_degree = [0] * n
for u, v in edges:
in_degree[v] += 1 # Edge u -> v means v has one more incoming edge
return in_degree
Wrong: DFS without 3-color marking (misses cycles)
# BAD -- simple visited set cannot distinguish back-edges from cross-edges
def broken_dfs_topo(n, adj):
visited = set()
order = []
def dfs(node):
visited.add(node)
for nb in adj[node]:
if nb not in visited:
dfs(nb)
order.append(node)
for v in range(n):
if v not in visited:
dfs(v)
order.reverse()
return order # No cycle detection at all!
Correct: Use WHITE/GRAY/BLACK coloring for back-edge detection
# GOOD -- GRAY node in DFS stack means back-edge (cycle)
def correct_dfs_topo(n, adj):
WHITE, GRAY, BLACK = 0, 1, 2
color = [WHITE] * n
order = []
def dfs(node):
color[node] = GRAY
for nb in adj[node]:
if color[nb] == GRAY:
raise ValueError("Cycle: back edge found")
if color[nb] == WHITE:
dfs(nb)
color[node] = BLACK
order.append(node)
for v in range(n):
if color[v] == WHITE:
dfs(v)
order.reverse()
return order
Common Pitfalls
- Not handling disconnected components: Both Kahn's and DFS must iterate over ALL vertices, not just start from vertex 0. Fix: iterate
for v in range(V)in DFS; Kahn's handles this naturally. [src2] - Confusing edge direction in prerequisites: In course schedule problems,
[course, prereq]means prereq -> course. Reversing direction produces wrong orderings. Fix: read edge semantics carefully. [src5] - Stack overflow on large graphs with DFS: Recursive DFS hits Python's default 1000-recursion limit. Fix: use iterative DFS with explicit stack, or use Kahn's algorithm. [src4]
- Assuming unique topological ordering: A DAG may have many valid orderings. Fix: use
heapqor priority queue for deterministic (lexicographically smallest) output. [src1] - Mutating the input graph: Some implementations modify in-degree arrays in place. Fix: work on copies; never remove edges from the original adjacency list. [src3]
- Applying to undirected graphs: Every undirected edge creates a "cycle" between its endpoints. Fix: verify graph is directed before attempting topological sort. [src1]
When to Use / When Not to Use
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Resolving build/compilation dependencies (Makefile, Gradle, Bazel) | Graph has cycles | Strongly connected components (Tarjan/Kosaraju) |
| Course prerequisite scheduling | Need shortest path in weighted graph | Dijkstra or Bellman-Ford |
| Task scheduling with dependency constraints | Need to find connected components | Union-Find / BFS/DFS connected components |
| Package manager install ordering (npm, pip, apt) | Graph is undirected | BFS/DFS traversal |
| Spreadsheet formula evaluation order | Need maximum flow or matching | Ford-Fulkerson or Hopcroft-Karp |
| Determining parallelizable task levels | Need all-pairs shortest paths | Floyd-Warshall |
| Data pipeline / ETL stage ordering | Ordering doesn't need to respect edge direction | Simple BFS/DFS traversal |
Important Caveats
- Topological sort is a fundamental algorithm dating to Kahn (1962); the algorithm itself is evergreen and will not change, but language-specific implementations may vary in edge cases (e.g., Python recursion limits, Java stream vs. imperative style)
- For extremely large graphs (millions of vertices), consider parallel variants of Kahn's algorithm that process entire levels concurrently
- The DFS-based approach and Kahn's algorithm produce different (but both valid) orderings for the same graph; do not assume they will match
- In competitive programming, topological sort is a prerequisite for DAG dynamic programming (e.g., longest path in DAG, number of paths)