PostgreSQL Index Bloat: Diagnose and Fix 2026
How do I diagnose and fix PostgreSQL index bloat?
TL;DR
- Bottom line: PostgreSQL index bloat is diagnosed with
pgstattupleor bloat-estimator SQL and fixed withREINDEX INDEX CONCURRENTLY(PG 12+) for zero-downtime rebuilds, orpg_repackwhen you need to reorganize both table and indexes. Prevent recurrence with fillfactor tuning, HOT updates, and aggressive autovacuum settings. - Key tool/command:
REINDEX INDEX CONCURRENTLY idx_name;- rebuilds the index without blocking writes. Fall back topg_repack --index idx_namefor very large indexes or when xmin horizon concerns block reindex. - Watch out for: A failed
REINDEX CONCURRENTLYleaves behindINVALIDindexes with suffixes_ccnewor_ccoldthat still consume write overhead but are never used for reads. You mustDROP INDEX CONCURRENTLYthem before retrying. - Works with: PostgreSQL 12+.
REINDEX CONCURRENTLYadded in PG 12; corruption bug fixed in PG 14.0 (do not use on PG 12-13 for unique indexes without latest minor-version patches).pg_repackworks on PG 9.4+.
Constraints
- REINDEX CONCURRENTLY requires PostgreSQL 12 or later. On PG 11 or older, use the manual pattern:
CREATE INDEX CONCURRENTLY new_idx ...; DROP INDEX CONCURRENTLY old_idx; ALTER INDEX new_idx RENAME TO old_idx;. - REINDEX CONCURRENTLY cannot rebuild exclusion-constraint indexes or system catalog indexes, and cannot run inside a transaction block for partitioned indexes/tables. [src1]
- pg_repack requires a primary key or unique not-null index on every table it processes. Tables without one must use
REINDEXor be restructured first. [src5] - pg_repack needs 2x disk space - the table and all indexes are fully duplicated during the operation. Check
pg_database_size()before starting on a full disk. [src5] - Never run
VACUUM FULLon production during business hours - it takesACCESS EXCLUSIVEfor the entire operation, blocking both reads and writes. Usepg_repackorREINDEX CONCURRENTLYinstead. REINDEX CONCURRENTLYholds the xmin horizon for its duration - autovacuum cannot clean dead tuples in ANY table cluster-wide until it finishes. Batch one index at a time on high-churn systems. [src4]
Quick Reference
| # | Symptom / Goal | Command / Solution | Notes |
|---|---|---|---|
| 1 | Check index sizes vs table sizes | SELECT indexrelname, pg_size_pretty(pg_relation_size(indexrelid)) FROM pg_stat_user_indexes ORDER BY pg_relation_size(indexrelid) DESC; | Index larger than table = likely bloat |
| 2 | Measure bloat precisely (slow but exact) | CREATE EXTENSION pgstattuple; SELECT * FROM pgstatindex('idx_name'); | Full index scan; avoid on huge indexes |
| 3 | Measure bloat approximately (fast) | SELECT * FROM pgstattuple_approx('idx_name'); | Sample-based; use for large indexes |
| 4 | Rebuild one index online (PG 12+) | REINDEX INDEX CONCURRENTLY idx_name; | No write lock; ~2x slower than REINDEX |
| 5 | Rebuild all indexes on a table online | REINDEX TABLE CONCURRENTLY my_table; | Sequential per index; takes hours on large tables |
| 6 | Rebuild with pg_repack (alternative) | pg_repack -d mydb --index idx_name | Works on older PG; no xmin horizon impact |
| 7 | Find invalid indexes from failed rebuild | SELECT indexrelid::regclass FROM pg_index WHERE NOT indisvalid; | Look for _ccnew / _ccold suffixes |
| 8 | Drop invalid index | DROP INDEX CONCURRENTLY idx_name_ccnew; | Must do before retrying REINDEX |
| 9 | Unused indexes (candidates to drop) | SELECT indexrelname FROM pg_stat_user_indexes WHERE idx_scan = 0; | Stats reset on restart - check uptime |
| 10 | Set fillfactor for write-heavy table | ALTER TABLE t SET (FILLFACTOR = 80); | Enables HOT updates; reduces index bloat 60%+ |
| 11 | Check HOT update ratio | SELECT relname, n_tup_hot_upd::float / NULLIF(n_tup_upd,0) FROM pg_stat_user_tables; | Goal: > 0.8 on frequently-updated tables |
| 12 | Aggressive autovacuum on hot table | ALTER TABLE t SET (autovacuum_vacuum_scale_factor = 0.01); | Vacuums at 1% dead instead of 20% default |
| 13 | Rebuild with pg_repack (table + indexes) | pg_repack -d mydb --table my_table -k | Requires PK; 2x disk space |
| 14 | Pre-check disk space before pg_repack | SELECT pg_size_pretty(pg_total_relation_size('t')); | Need 2x this free |
| 15 | BRIN alternative for time-series | CREATE INDEX CONCURRENTLY idx_brin ON t USING BRIN (created_at); | BRIN barely bloats; 1000x smaller than btree |
Decision Tree
START - Suspected index bloat
|-- Step 1: CONFIRM it is index bloat
| |-- Index size > table size (pg_relation_size) -> likely bloat
| |-- pgstattuple: leaf_fragmentation > 20% or avg_leaf_density < 50% -> confirmed
| |-- Disk growing but row count flat -> bloat suspected
| `-- If no -> use postgresql-slow-queries/2026 or VACUUM for table bloat
|
|-- Step 2: CHOOSE remediation
| |-- Running PostgreSQL 12+?
| | |-- Single bloated index < 50GB -> REINDEX INDEX CONCURRENTLY
| | |-- Multiple indexes on one table -> REINDEX TABLE CONCURRENTLY
| | |-- Table + all indexes bloated; table has PK -> pg_repack
| | `-- Cannot tolerate xmin horizon hold -> pg_repack
| |-- Running PostgreSQL 11 or older?
| | `-- CREATE INDEX CONCURRENTLY new; DROP INDEX CONCURRENTLY old; RENAME
| `-- Maintenance window + small index?
| `-- REINDEX INDEX idx_name (blocks writes but ~2x faster)
|
|-- Step 3: VERIFY fix
| |-- Re-run pgstatindex -> leaf_fragmentation should be < 10%
| |-- Check pg_relation_size -> should drop significantly
| `-- Check for invalid indexes -> indisvalid must be true for all
|
`-- Step 4: PREVENT recurrence
|-- Write-heavy table? -> ALTER TABLE SET (FILLFACTOR = 80) + re-REINDEX
|-- HOT ratio < 80%? -> Reduce indexes on updated columns
|-- Append-only table with range queries? -> Replace btree with BRIN
|-- Time-series data? -> Partition by date + drop old partitions
`-- Tune autovacuum_vacuum_scale_factor = 0.01 on high-churn tables
Step-by-Step Guide
1. Identify bloated indexes
Start with size comparison - indexes larger than their tables are a strong signal. [src3, src8]
-- Indexes ordered by size, with table-relative ratio
SELECT
schemaname,
relname AS table_name,
indexrelname AS index_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
pg_size_pretty(pg_relation_size(relid)) AS table_size,
round(pg_relation_size(indexrelid)::numeric
/ NULLIF(pg_relation_size(relid), 0), 2) AS idx_to_table_ratio,
idx_scan
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;
-- Indexes with idx_to_table_ratio > 0.5 warrant a closer look.
Verify: Any index with ratio > 0.5 AND > 1 GB is a strong bloat candidate.
2. Measure bloat precisely with pgstattuple
For confirmed bloat analysis, pgstattuple is the authoritative source. [src2]
-- One-time: install the extension (superuser)
CREATE EXTENSION IF NOT EXISTS pgstattuple;
-- Exact measurement (slow on large indexes - full scan)
SELECT * FROM pgstatindex('public.orders_customer_id_idx');
-- Key columns:
-- avg_leaf_density : lower = more bloat (< 50% is bad, < 30% is critical)
-- leaf_fragmentation: higher = more bloat (> 20% warrants reindex)
-- Approximate measurement (fast, sample-based - PG 9.5+)
SELECT * FROM pgstattuple_approx('public.orders_customer_id_idx');
-- Use on multi-GB indexes to avoid table scans in production.
Verify: avg_leaf_density < 50 or leaf_fragmentation > 20 confirms bloat requiring remediation.
3. Drop any invalid indexes from prior failed rebuilds
Before rebuilding, clear debris from prior concurrent attempts. [src1]
-- Find any invalid indexes (failed REINDEX CONCURRENTLY artifacts)
SELECT
c.relname AS index_name,
pg_size_pretty(pg_relation_size(c.oid)) AS size
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE NOT i.indisvalid;
-- Drop them (use CONCURRENTLY to avoid locks)
DROP INDEX CONCURRENTLY IF EXISTS public.orders_customer_id_idx_ccnew;
DROP INDEX CONCURRENTLY IF EXISTS public.orders_customer_id_idx_ccold;
Verify: Re-run the invalid-index query - should return zero rows.
4. Rebuild with REINDEX CONCURRENTLY (primary fix)
This is the zero-downtime remediation for PG 12+. [src1, src6]
-- Single bloated index (no write lock; reads and writes continue)
REINDEX INDEX CONCURRENTLY public.orders_customer_id_idx;
-- All indexes on a table (processed sequentially)
REINDEX TABLE CONCURRENTLY public.orders;
-- Parallel workers (PG 14+): control via max_parallel_maintenance_workers
SET max_parallel_maintenance_workers = 4;
REINDEX INDEX CONCURRENTLY public.large_idx;
Verify: SELECT pg_size_pretty(pg_relation_size('public.orders_customer_id_idx')); - size should drop significantly (often 50-90% reduction on heavily bloated indexes).
5. Use pg_repack when REINDEX is unsuitable
When the index is very large, the table is also bloated, or you cannot hold the xmin horizon. [src5]
# Install extension once (requires superuser)
psql -d mydb -c "CREATE EXTENSION pg_repack;"
# Rebuild all indexes on a table (table stays writable)
pg_repack -h db.example.com -d mydb --table public.orders -k
# Rebuild only one index
pg_repack -h db.example.com -d mydb --index public.orders_customer_id_idx
# Dry run to verify before touching production
pg_repack -h db.example.com -d mydb --table public.orders --dry-run
Verify: After completion, pg_repack reports "ok" and removes all intermediate objects. Check pg_stat_activity for stale sessions.
6. Tune fillfactor to prevent recurrence
For write-heavy tables, lower fillfactor enables HOT updates, which avoid updating indexes entirely. [src7]
-- Set fillfactor to 80% - leaves room for HOT updates on same page
ALTER TABLE public.orders SET (FILLFACTOR = 80);
-- Rebuild table + indexes to apply fillfactor to existing pages
VACUUM FULL public.orders; -- OR: pg_repack -t public.orders
-- Verify HOT update ratio is climbing
SELECT
relname,
n_tup_upd,
n_tup_hot_upd,
round(n_tup_hot_upd::numeric / NULLIF(n_tup_upd, 0) * 100, 1) AS hot_pct
FROM pg_stat_user_tables
WHERE relname = 'orders';
-- Goal: hot_pct > 80 on heavily-updated tables.
Verify: hot_pct improves within hours of normal workload. Index bloat accumulation rate drops proportionally.
7. Tune autovacuum for high-churn tables
Default autovacuum triggers at 20% dead tuples - too late for high-churn tables. [src4]
-- Per-table autovacuum settings (more aggressive)
ALTER TABLE public.orders SET (
autovacuum_vacuum_scale_factor = 0.01, -- vacuum at 1% dead vs 20%
autovacuum_vacuum_threshold = 1000, -- minimum dead tuples
autovacuum_analyze_scale_factor = 0.005, -- analyze at 0.5%
autovacuum_vacuum_cost_limit = 2000 -- allow more work per cycle
);
-- Kill idle-in-transaction sessions that block autovacuum
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
AND now() - state_change > INTERVAL '10 minutes';
Verify: pg_stat_user_tables.last_autovacuum should update frequently; n_dead_tup should stay low.
Code Examples
SQL: comprehensive bloat-detection report
-- Input: pgstattuple extension installed
-- Output: Ranked list of bloated indexes with precise metrics
CREATE EXTENSION IF NOT EXISTS pgstattuple;
WITH idx AS (
SELECT schemaname, indexrelname, indexrelid,
pg_relation_size(indexrelid) AS idx_bytes
FROM pg_stat_user_indexes
WHERE pg_relation_size(indexrelid) > 100 * 1024 * 1024 -- >100MB only
)
SELECT schemaname, indexrelname,
pg_size_pretty(idx_bytes) AS size,
(pgstatindex(indexrelid::regclass::text)).avg_leaf_density AS density,
(pgstatindex(indexrelid::regclass::text)).leaf_fragmentation AS frag
FROM idx
ORDER BY idx_bytes DESC;
Python: automated reindex worker with safety gates
# Input: connection string + bloat threshold (default: leaf_density < 50)
# Output: reindexes each qualifying index; reports status
import psycopg2, time
conn = psycopg2.connect(CONN_STR)
conn.autocommit = True # REINDEX CONCURRENTLY cannot be in a transaction
with conn.cursor() as cur:
cur.execute("""
SELECT schemaname||'.'||indexrelname AS idx
FROM pg_stat_user_indexes i
WHERE pg_relation_size(indexrelid) > 100*1024*1024
AND (pgstatindex(indexrelid::regclass::text)).avg_leaf_density < 50
""")
for (idx,) in cur.fetchall():
cur.execute(f"REINDEX INDEX CONCURRENTLY {idx}")
print(f"reindexed {idx}")
time.sleep(5) # brief pause to let autovacuum catch up
Bash: orchestrated pg_repack run with disk-space guard
#!/bin/bash
# Input: DB name + table name
# Output: pg_repack runs only if 2x free disk space available
DB="$1"; TABLE="$2"
NEEDED=$(psql -tA -d "$DB" -c "SELECT pg_total_relation_size('$TABLE') * 2")
FREE=$(df -B1 --output=avail /var/lib/postgresql | tail -1)
if [ "$FREE" -lt "$NEEDED" ]; then
echo "ERROR: need $NEEDED bytes, have $FREE" >&2
exit 1
fi
pg_repack -d "$DB" --table "$TABLE" -k --jobs=4
Anti-Patterns
Wrong: VACUUM FULL on production to fix index bloat
-- BAD - takes ACCESS EXCLUSIVE lock on the entire table
-- Blocks ALL reads and writes for minutes to hours
VACUUM FULL public.orders;
-- Queries pile up; connection pool fills; cascading outage
Correct: REINDEX CONCURRENTLY + pg_repack (zero downtime)
-- GOOD - no write lock; production continues normally
REINDEX TABLE CONCURRENTLY public.orders;
-- Or for table + index bloat together:
-- pg_repack -d mydb --table public.orders -k
Wrong: REINDEX without CONCURRENTLY on a live table
-- BAD - takes ACCESS EXCLUSIVE on the index
-- All queries using this index are blocked for the duration
REINDEX INDEX public.orders_customer_id_idx;
Correct: always use CONCURRENTLY on live systems
-- GOOD - reads + writes continue; only ~2x slower
REINDEX INDEX CONCURRENTLY public.orders_customer_id_idx;
-- Monitor with: SELECT * FROM pg_stat_progress_create_index;
Wrong: ignoring invalid indexes from failed rebuilds
-- BAD - orphan _ccnew / _ccold indexes persist
-- They consume write overhead but are never used for reads
-- Cluster slowly accumulates dead weight every failed reindex
-- (the anti-pattern is NOT running the cleanup query)
Correct: detect and drop invalid indexes as a routine check
-- GOOD - clean up failed rebuild artifacts
SELECT 'DROP INDEX CONCURRENTLY ' || indexrelid::regclass::text || ';' AS cleanup_sql
FROM pg_index WHERE NOT indisvalid;
-- Review, then execute the generated statements.
Wrong: DROP INDEX + CREATE INDEX (non-concurrent) as a "fast rebuild"
-- BAD - between DROP and CREATE, queries do seq scans
-- On a large table, this can bring an application to its knees
DROP INDEX public.orders_customer_id_idx;
CREATE INDEX public.orders_customer_id_idx ON public.orders (customer_id);
Correct: REINDEX CONCURRENTLY keeps the index available throughout
-- GOOD - old index serves queries while new one builds
REINDEX INDEX CONCURRENTLY public.orders_customer_id_idx;
-- Atomic swap at the end; no seq-scan gap.
Common Pitfalls
- Ignoring the xmin horizon impact:
REINDEX CONCURRENTLYholds the xmin horizon cluster-wide for its entire runtime. On a 4-hour reindex of a 500GB index, autovacuum cannot clean dead tuples on ANY table for 4 hours, causing cascading bloat. Fix:pg_repackfor very large indexes; it does not hold xmin. [src4] - Running pg_repack on a table without a PK:
pg_repacksilently fails or errors with "relation does not have a primary key". Fix: add a primary key, use a unique not-null index, or fall back toREINDEX CONCURRENTLY. [src5] - Disk space exhaustion mid-rebuild: Both
REINDEX CONCURRENTLYandpg_repackneed old and new index simultaneously (effectively 2x space). Fix: pre-check withSELECT pg_size_pretty(sum(pg_relation_size(indexrelid))) FROM pg_stat_user_indexes WHERE relid = 'mytable'::regclass;and compare todf. [src5] - Fillfactor change doesn't apply retroactively:
ALTER TABLE SET (FILLFACTOR = 80)only affects new pages. Existing pages remain at 100% full untilVACUUM FULLorpg_repackrewrites them. Fix: change fillfactor THEN rebuild. [src7] - Autovacuum blocked by long-running transactions: An
idle in transactionsession from 3 hours ago blocks autovacuum on every table in the cluster. Bloat accumulates silently. Fix:SET idle_in_transaction_session_timeout = '10min'globally. [src4] - Missing
CONCURRENTLYon DROP: Dropping an invalid index with plainDROP INDEXtakesACCESS EXCLUSIVEbriefly. UseDROP INDEX CONCURRENTLYin production. [src1] - HOT updates defeated by too many indexes: HOT requires the UPDATE to not touch ANY indexed column. A table with 10 indexes on frequently-updated columns will have HOT ratio near 0%. Fix: drop low-value indexes (check
idx_scan = 0inpg_stat_user_indexes). [src7] - Using btree where BRIN would suffice: On append-only time-series tables, btree indexes on timestamp columns bloat 1000x more than BRIN. Fix:
CREATE INDEX USING BRINfor sequential-access columns. [src7]
Diagnostic Commands
-- === Identify bloat candidates ===
-- Indexes larger than their tables (strong bloat signal)
SELECT schemaname, relname,
pg_size_pretty(pg_relation_size(indexrelid)) AS idx_sz,
pg_size_pretty(pg_relation_size(relid)) AS tbl_sz
FROM pg_stat_user_indexes
WHERE pg_relation_size(indexrelid) > pg_relation_size(relid)
ORDER BY pg_relation_size(indexrelid) DESC;
-- Precise bloat via pgstattuple (requires extension)
SELECT * FROM pgstatindex('schema.index_name');
-- Fast approximate bloat (PG 9.5+)
SELECT * FROM pgstattuple_approx('schema.index_name');
-- === Invalid index cleanup ===
SELECT indexrelid::regclass FROM pg_index WHERE NOT indisvalid;
-- === Monitor ongoing REINDEX progress (PG 12+) ===
SELECT pid, phase, blocks_total, blocks_done,
round(100.0 * blocks_done / NULLIF(blocks_total,0), 1) AS pct
FROM pg_stat_progress_create_index;
-- === Unused indexes (candidates to drop) ===
SELECT schemaname, relname, indexrelname,
pg_size_pretty(pg_relation_size(indexrelid)) AS size,
idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND pg_relation_size(indexrelid) > 10 * 1024 * 1024
ORDER BY pg_relation_size(indexrelid) DESC;
-- === HOT update ratio per table ===
SELECT relname, n_tup_upd, n_tup_hot_upd,
round(n_tup_hot_upd::numeric / NULLIF(n_tup_upd,0) * 100, 1) AS hot_pct
FROM pg_stat_user_tables
WHERE n_tup_upd > 1000
ORDER BY n_tup_upd DESC;
-- === Check autovacuum health ===
SELECT relname, last_autovacuum, n_dead_tup, n_live_tup,
round(n_dead_tup::numeric / NULLIF(n_live_tup,0) * 100, 1) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY dead_pct DESC;
Version History & Compatibility
| Feature | Available Since | Notes |
|---|---|---|
REINDEX (non-concurrent) | PostgreSQL 7.x | Takes ACCESS EXCLUSIVE on index |
pgstattuple extension | PostgreSQL 8.2 | Exact but slow bloat measurement |
pg_repack | PostgreSQL 9.4+ | External extension; no xmin horizon impact |
pgstattuple_approx | PostgreSQL 9.5 | Fast sample-based estimate |
REINDEX CONCURRENTLY | PostgreSQL 12 | Zero-downtime rebuild |
pg_stat_progress_create_index | PostgreSQL 12 | Monitor reindex progress |
| Parallel index builds | PostgreSQL 11 | max_parallel_maintenance_workers |
| REINDEX CONCURRENTLY unique-index bug fix | PostgreSQL 14.0 | Critical - avoid on PG 12-13 for unique indexes without patches |
| BRIN + HOT-update compatibility | PostgreSQL 16 | BRIN indexes no longer block HOT |
REINDEX ... TABLESPACE | PostgreSQL 14 | Move index during rebuild |
When to Use / When Not to Use
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Single index bloated, PG 12+, live system | PG 11 or older | CREATE INDEX CONCURRENTLY + DROP INDEX CONCURRENTLY + rename |
| Table + all indexes bloated, table has PK | Table has no PK or unique index | REINDEX TABLE CONCURRENTLY |
| Very large index, cannot hold xmin horizon | Short-lived small index | REINDEX INDEX CONCURRENTLY (simpler) |
| Slow queries + small index size | Index size is normal | See postgresql-slow-queries/2026 (not bloat) |
| Table bloat (not just index bloat) | Only indexes are bloated | pg_repack --table or VACUUM tuning |
| Append-only time-series data | High-cardinality equality lookups | BRIN index instead of btree |
Important Caveats
- REINDEX CONCURRENTLY in PG 12-13 had a unique-index corruption bug (fixed in 14.0). If stuck on 12 or 13, apply the latest minor-version patches before using it on unique indexes, or fall back to
pg_repack. pg_repackclient and extension versions must match exactly. Version drift causes silent failures. Pin both in deployment automation.- PostgreSQL managed services have limits: AWS RDS, GCP Cloud SQL, and Azure may not allow
pg_repackextension installation.REINDEX CONCURRENTLYis available everywhere PG 12+ is. - Exclusion-constraint indexes skip concurrent reindex silently. Check with
\d+ tablenameforEXCLUDE USING ...constraints before relying onREINDEX TABLE CONCURRENTLY. - Bloat below 20% is usually not worth fixing - the cost of rebuild often exceeds the recovered disk space. AWS suggests 20% as a practical threshold; some teams use 30-40%.
- Partitioned tables:
REINDEX TABLE CONCURRENTLY parentprocesses each partition sequentially, and cannot run inside a transaction. For massive partitioned tables, script per-partition rebuilds with pauses.