pgstattuple or bloat-estimator SQL and fixed with REINDEX INDEX CONCURRENTLY (PG 12+) for zero-downtime rebuilds, or pg_repack when you need to reorganize both table and indexes. Prevent recurrence with fillfactor tuning, HOT updates, and aggressive autovacuum settings.REINDEX INDEX CONCURRENTLY idx_name; - rebuilds the index without blocking writes. Fall back to pg_repack --index idx_name for very large indexes or when xmin horizon concerns block reindex.REINDEX CONCURRENTLY leaves behind INVALID indexes with suffixes _ccnew or _ccold that still consume write overhead but are never used for reads. You must DROP INDEX CONCURRENTLY them before retrying.REINDEX CONCURRENTLY added in PG 12; corruption bug fixed in PG 14.0 (do not use on PG 12-13 for unique indexes without latest minor-version patches). pg_repack works on PG 9.4+.CREATE INDEX CONCURRENTLY new_idx ...; DROP INDEX CONCURRENTLY old_idx; ALTER INDEX new_idx RENAME TO old_idx;.REINDEX or be restructured first. [src5]pg_database_size() before starting on a full disk. [src5]VACUUM FULL on production during business hours - it takes ACCESS EXCLUSIVE for the entire operation, blocking both reads and writes. Use pg_repack or REINDEX CONCURRENTLY instead.REINDEX CONCURRENTLY holds the xmin horizon for its duration - autovacuum cannot clean dead tuples in ANY table cluster-wide until it finishes. Batch one index at a time on high-churn systems. [src4]| # | Symptom / Goal | Command / Solution | Notes |
|---|---|---|---|
| 1 | Check index sizes vs table sizes | SELECT indexrelname, pg_size_pretty(pg_relation_size(indexrelid)) FROM pg_stat_user_indexes ORDER BY pg_relation_size(indexrelid) DESC; | Index larger than table = likely bloat |
| 2 | Measure bloat precisely (slow but exact) | CREATE EXTENSION pgstattuple; SELECT * FROM pgstatindex('idx_name'); | Full index scan; avoid on huge indexes |
| 3 | Measure bloat approximately (fast) | SELECT * FROM pgstattuple_approx('idx_name'); | Sample-based; use for large indexes |
| 4 | Rebuild one index online (PG 12+) | REINDEX INDEX CONCURRENTLY idx_name; | No write lock; ~2x slower than REINDEX |
| 5 | Rebuild all indexes on a table online | REINDEX TABLE CONCURRENTLY my_table; | Sequential per index; takes hours on large tables |
| 6 | Rebuild with pg_repack (alternative) | pg_repack -d mydb --index idx_name | Works on older PG; no xmin horizon impact |
| 7 | Find invalid indexes from failed rebuild | SELECT indexrelid::regclass FROM pg_index WHERE NOT indisvalid; | Look for _ccnew / _ccold suffixes |
| 8 | Drop invalid index | DROP INDEX CONCURRENTLY idx_name_ccnew; | Must do before retrying REINDEX |
| 9 | Unused indexes (candidates to drop) | SELECT indexrelname FROM pg_stat_user_indexes WHERE idx_scan = 0; | Stats reset on restart - check uptime |
| 10 | Set fillfactor for write-heavy table | ALTER TABLE t SET (FILLFACTOR = 80); | Enables HOT updates; reduces index bloat 60%+ |
| 11 | Check HOT update ratio | SELECT relname, n_tup_hot_upd::float / NULLIF(n_tup_upd,0) FROM pg_stat_user_tables; | Goal: > 0.8 on frequently-updated tables |
| 12 | Aggressive autovacuum on hot table | ALTER TABLE t SET (autovacuum_vacuum_scale_factor = 0.01); | Vacuums at 1% dead instead of 20% default |
| 13 | Rebuild with pg_repack (table + indexes) | pg_repack -d mydb --table my_table -k | Requires PK; 2x disk space |
| 14 | Pre-check disk space before pg_repack | SELECT pg_size_pretty(pg_total_relation_size('t')); | Need 2x this free |
| 15 | BRIN alternative for time-series | CREATE INDEX CONCURRENTLY idx_brin ON t USING BRIN (created_at); | BRIN barely bloats; 1000x smaller than btree |
START - Suspected index bloat
|-- Step 1: CONFIRM it is index bloat
| |-- Index size > table size (pg_relation_size) -> likely bloat
| |-- pgstattuple: leaf_fragmentation > 20% or avg_leaf_density < 50% -> confirmed
| |-- Disk growing but row count flat -> bloat suspected
| `-- If no -> use postgresql-slow-queries/2026 or VACUUM for table bloat
|
|-- Step 2: CHOOSE remediation
| |-- Running PostgreSQL 12+?
| | |-- Single bloated index < 50GB -> REINDEX INDEX CONCURRENTLY
| | |-- Multiple indexes on one table -> REINDEX TABLE CONCURRENTLY
| | |-- Table + all indexes bloated; table has PK -> pg_repack
| | `-- Cannot tolerate xmin horizon hold -> pg_repack
| |-- Running PostgreSQL 11 or older?
| | `-- CREATE INDEX CONCURRENTLY new; DROP INDEX CONCURRENTLY old; RENAME
| `-- Maintenance window + small index?
| `-- REINDEX INDEX idx_name (blocks writes but ~2x faster)
|
|-- Step 3: VERIFY fix
| |-- Re-run pgstatindex -> leaf_fragmentation should be < 10%
| |-- Check pg_relation_size -> should drop significantly
| `-- Check for invalid indexes -> indisvalid must be true for all
|
`-- Step 4: PREVENT recurrence
|-- Write-heavy table? -> ALTER TABLE SET (FILLFACTOR = 80) + re-REINDEX
|-- HOT ratio < 80%? -> Reduce indexes on updated columns
|-- Append-only table with range queries? -> Replace btree with BRIN
|-- Time-series data? -> Partition by date + drop old partitions
`-- Tune autovacuum_vacuum_scale_factor = 0.01 on high-churn tables
Start with size comparison - indexes larger than their tables are a strong signal. [src3, src8]
-- Indexes ordered by size, with table-relative ratio
SELECT
schemaname,
relname AS table_name,
indexrelname AS index_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
pg_size_pretty(pg_relation_size(relid)) AS table_size,
round(pg_relation_size(indexrelid)::numeric
/ NULLIF(pg_relation_size(relid), 0), 2) AS idx_to_table_ratio,
idx_scan
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;
-- Indexes with idx_to_table_ratio > 0.5 warrant a closer look.
Verify: Any index with ratio > 0.5 AND > 1 GB is a strong bloat candidate.
For confirmed bloat analysis, pgstattuple is the authoritative source. [src2]
-- One-time: install the extension (superuser)
CREATE EXTENSION IF NOT EXISTS pgstattuple;
-- Exact measurement (slow on large indexes - full scan)
SELECT * FROM pgstatindex('public.orders_customer_id_idx');
-- Key columns:
-- avg_leaf_density : lower = more bloat (< 50% is bad, < 30% is critical)
-- leaf_fragmentation: higher = more bloat (> 20% warrants reindex)
-- Approximate measurement (fast, sample-based - PG 9.5+)
SELECT * FROM pgstattuple_approx('public.orders_customer_id_idx');
-- Use on multi-GB indexes to avoid table scans in production.
Verify: avg_leaf_density < 50 or leaf_fragmentation > 20 confirms bloat requiring remediation.
Before rebuilding, clear debris from prior concurrent attempts. [src1]
-- Find any invalid indexes (failed REINDEX CONCURRENTLY artifacts)
SELECT
c.relname AS index_name,
pg_size_pretty(pg_relation_size(c.oid)) AS size
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE NOT i.indisvalid;
-- Drop them (use CONCURRENTLY to avoid locks)
DROP INDEX CONCURRENTLY IF EXISTS public.orders_customer_id_idx_ccnew;
DROP INDEX CONCURRENTLY IF EXISTS public.orders_customer_id_idx_ccold;
Verify: Re-run the invalid-index query - should return zero rows.
This is the zero-downtime remediation for PG 12+. [src1, src6]
-- Single bloated index (no write lock; reads and writes continue)
REINDEX INDEX CONCURRENTLY public.orders_customer_id_idx;
-- All indexes on a table (processed sequentially)
REINDEX TABLE CONCURRENTLY public.orders;
-- Parallel workers (PG 14+): control via max_parallel_maintenance_workers
SET max_parallel_maintenance_workers = 4;
REINDEX INDEX CONCURRENTLY public.large_idx;
Verify: SELECT pg_size_pretty(pg_relation_size('public.orders_customer_id_idx')); - size should drop significantly (often 50-90% reduction on heavily bloated indexes).
When the index is very large, the table is also bloated, or you cannot hold the xmin horizon. [src5]
# Install extension once (requires superuser)
psql -d mydb -c "CREATE EXTENSION pg_repack;"
# Rebuild all indexes on a table (table stays writable)
pg_repack -h db.example.com -d mydb --table public.orders -k
# Rebuild only one index
pg_repack -h db.example.com -d mydb --index public.orders_customer_id_idx
# Dry run to verify before touching production
pg_repack -h db.example.com -d mydb --table public.orders --dry-run
Verify: After completion, pg_repack reports "ok" and removes all intermediate objects. Check pg_stat_activity for stale sessions.
For write-heavy tables, lower fillfactor enables HOT updates, which avoid updating indexes entirely. [src7]
-- Set fillfactor to 80% - leaves room for HOT updates on same page
ALTER TABLE public.orders SET (FILLFACTOR = 80);
-- Rebuild table + indexes to apply fillfactor to existing pages
VACUUM FULL public.orders; -- OR: pg_repack -t public.orders
-- Verify HOT update ratio is climbing
SELECT
relname,
n_tup_upd,
n_tup_hot_upd,
round(n_tup_hot_upd::numeric / NULLIF(n_tup_upd, 0) * 100, 1) AS hot_pct
FROM pg_stat_user_tables
WHERE relname = 'orders';
-- Goal: hot_pct > 80 on heavily-updated tables.
Verify: hot_pct improves within hours of normal workload. Index bloat accumulation rate drops proportionally.
Default autovacuum triggers at 20% dead tuples - too late for high-churn tables. [src4]
-- Per-table autovacuum settings (more aggressive)
ALTER TABLE public.orders SET (
autovacuum_vacuum_scale_factor = 0.01, -- vacuum at 1% dead vs 20%
autovacuum_vacuum_threshold = 1000, -- minimum dead tuples
autovacuum_analyze_scale_factor = 0.005, -- analyze at 0.5%
autovacuum_vacuum_cost_limit = 2000 -- allow more work per cycle
);
-- Kill idle-in-transaction sessions that block autovacuum
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
AND now() - state_change > INTERVAL '10 minutes';
Verify: pg_stat_user_tables.last_autovacuum should update frequently; n_dead_tup should stay low.
-- Input: pgstattuple extension installed
-- Output: Ranked list of bloated indexes with precise metrics
CREATE EXTENSION IF NOT EXISTS pgstattuple;
WITH idx AS (
SELECT schemaname, indexrelname, indexrelid,
pg_relation_size(indexrelid) AS idx_bytes
FROM pg_stat_user_indexes
WHERE pg_relation_size(indexrelid) > 100 * 1024 * 1024 -- >100MB only
)
SELECT schemaname, indexrelname,
pg_size_pretty(idx_bytes) AS size,
(pgstatindex(indexrelid::regclass::text)).avg_leaf_density AS density,
(pgstatindex(indexrelid::regclass::text)).leaf_fragmentation AS frag
FROM idx
ORDER BY idx_bytes DESC;
# Input: connection string + bloat threshold (default: leaf_density < 50)
# Output: reindexes each qualifying index; reports status
import psycopg2, time
conn = psycopg2.connect(CONN_STR)
conn.autocommit = True # REINDEX CONCURRENTLY cannot be in a transaction
with conn.cursor() as cur:
cur.execute("""
SELECT schemaname||'.'||indexrelname AS idx
FROM pg_stat_user_indexes i
WHERE pg_relation_size(indexrelid) > 100*1024*1024
AND (pgstatindex(indexrelid::regclass::text)).avg_leaf_density < 50
""")
for (idx,) in cur.fetchall():
cur.execute(f"REINDEX INDEX CONCURRENTLY {idx}")
print(f"reindexed {idx}")
time.sleep(5) # brief pause to let autovacuum catch up
#!/bin/bash
# Input: DB name + table name
# Output: pg_repack runs only if 2x free disk space available
DB="$1"; TABLE="$2"
NEEDED=$(psql -tA -d "$DB" -c "SELECT pg_total_relation_size('$TABLE') * 2")
FREE=$(df -B1 --output=avail /var/lib/postgresql | tail -1)
if [ "$FREE" -lt "$NEEDED" ]; then
echo "ERROR: need $NEEDED bytes, have $FREE" >&2
exit 1
fi
pg_repack -d "$DB" --table "$TABLE" -k --jobs=4
-- BAD - takes ACCESS EXCLUSIVE lock on the entire table
-- Blocks ALL reads and writes for minutes to hours
VACUUM FULL public.orders;
-- Queries pile up; connection pool fills; cascading outage
-- GOOD - no write lock; production continues normally
REINDEX TABLE CONCURRENTLY public.orders;
-- Or for table + index bloat together:
-- pg_repack -d mydb --table public.orders -k
-- BAD - takes ACCESS EXCLUSIVE on the index
-- All queries using this index are blocked for the duration
REINDEX INDEX public.orders_customer_id_idx;
-- GOOD - reads + writes continue; only ~2x slower
REINDEX INDEX CONCURRENTLY public.orders_customer_id_idx;
-- Monitor with: SELECT * FROM pg_stat_progress_create_index;
-- BAD - orphan _ccnew / _ccold indexes persist
-- They consume write overhead but are never used for reads
-- Cluster slowly accumulates dead weight every failed reindex
-- (the anti-pattern is NOT running the cleanup query)
-- GOOD - clean up failed rebuild artifacts
SELECT 'DROP INDEX CONCURRENTLY ' || indexrelid::regclass::text || ';' AS cleanup_sql
FROM pg_index WHERE NOT indisvalid;
-- Review, then execute the generated statements.
-- BAD - between DROP and CREATE, queries do seq scans
-- On a large table, this can bring an application to its knees
DROP INDEX public.orders_customer_id_idx;
CREATE INDEX public.orders_customer_id_idx ON public.orders (customer_id);
-- GOOD - old index serves queries while new one builds
REINDEX INDEX CONCURRENTLY public.orders_customer_id_idx;
-- Atomic swap at the end; no seq-scan gap.
REINDEX CONCURRENTLY holds the xmin horizon cluster-wide for its entire runtime. On a 4-hour reindex of a 500GB index, autovacuum cannot clean dead tuples on ANY table for 4 hours, causing cascading bloat. Fix: pg_repack for very large indexes; it does not hold xmin. [src4]pg_repack silently fails or errors with "relation does not have a primary key". Fix: add a primary key, use a unique not-null index, or fall back to REINDEX CONCURRENTLY. [src5]REINDEX CONCURRENTLY and pg_repack need old and new index simultaneously (effectively 2x space). Fix: pre-check with SELECT pg_size_pretty(sum(pg_relation_size(indexrelid))) FROM pg_stat_user_indexes WHERE relid = 'mytable'::regclass; and compare to df. [src5]ALTER TABLE SET (FILLFACTOR = 80) only affects new pages. Existing pages remain at 100% full until VACUUM FULL or pg_repack rewrites them. Fix: change fillfactor THEN rebuild. [src7]idle in transaction session from 3 hours ago blocks autovacuum on every table in the cluster. Bloat accumulates silently. Fix: SET idle_in_transaction_session_timeout = '10min' globally. [src4]CONCURRENTLY on DROP: Dropping an invalid index with plain DROP INDEX takes ACCESS EXCLUSIVE briefly. Use DROP INDEX CONCURRENTLY in production. [src1]idx_scan = 0 in pg_stat_user_indexes). [src7]CREATE INDEX USING BRIN for sequential-access columns. [src7]-- === Identify bloat candidates ===
-- Indexes larger than their tables (strong bloat signal)
SELECT schemaname, relname,
pg_size_pretty(pg_relation_size(indexrelid)) AS idx_sz,
pg_size_pretty(pg_relation_size(relid)) AS tbl_sz
FROM pg_stat_user_indexes
WHERE pg_relation_size(indexrelid) > pg_relation_size(relid)
ORDER BY pg_relation_size(indexrelid) DESC;
-- Precise bloat via pgstattuple (requires extension)
SELECT * FROM pgstatindex('schema.index_name');
-- Fast approximate bloat (PG 9.5+)
SELECT * FROM pgstattuple_approx('schema.index_name');
-- === Invalid index cleanup ===
SELECT indexrelid::regclass FROM pg_index WHERE NOT indisvalid;
-- === Monitor ongoing REINDEX progress (PG 12+) ===
SELECT pid, phase, blocks_total, blocks_done,
round(100.0 * blocks_done / NULLIF(blocks_total,0), 1) AS pct
FROM pg_stat_progress_create_index;
-- === Unused indexes (candidates to drop) ===
SELECT schemaname, relname, indexrelname,
pg_size_pretty(pg_relation_size(indexrelid)) AS size,
idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND pg_relation_size(indexrelid) > 10 * 1024 * 1024
ORDER BY pg_relation_size(indexrelid) DESC;
-- === HOT update ratio per table ===
SELECT relname, n_tup_upd, n_tup_hot_upd,
round(n_tup_hot_upd::numeric / NULLIF(n_tup_upd,0) * 100, 1) AS hot_pct
FROM pg_stat_user_tables
WHERE n_tup_upd > 1000
ORDER BY n_tup_upd DESC;
-- === Check autovacuum health ===
SELECT relname, last_autovacuum, n_dead_tup, n_live_tup,
round(n_dead_tup::numeric / NULLIF(n_live_tup,0) * 100, 1) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY dead_pct DESC;
| Feature | Available Since | Notes |
|---|---|---|
REINDEX (non-concurrent) | PostgreSQL 7.x | Takes ACCESS EXCLUSIVE on index |
pgstattuple extension | PostgreSQL 8.2 | Exact but slow bloat measurement |
pg_repack | PostgreSQL 9.4+ | External extension; no xmin horizon impact |
pgstattuple_approx | PostgreSQL 9.5 | Fast sample-based estimate |
REINDEX CONCURRENTLY | PostgreSQL 12 | Zero-downtime rebuild |
pg_stat_progress_create_index | PostgreSQL 12 | Monitor reindex progress |
| Parallel index builds | PostgreSQL 11 | max_parallel_maintenance_workers |
| REINDEX CONCURRENTLY unique-index bug fix | PostgreSQL 14.0 | Critical - avoid on PG 12-13 for unique indexes without patches |
| BRIN + HOT-update compatibility | PostgreSQL 16 | BRIN indexes no longer block HOT |
REINDEX ... TABLESPACE | PostgreSQL 14 | Move index during rebuild |
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Single index bloated, PG 12+, live system | PG 11 or older | CREATE INDEX CONCURRENTLY + DROP INDEX CONCURRENTLY + rename |
| Table + all indexes bloated, table has PK | Table has no PK or unique index | REINDEX TABLE CONCURRENTLY |
| Very large index, cannot hold xmin horizon | Short-lived small index | REINDEX INDEX CONCURRENTLY (simpler) |
| Slow queries + small index size | Index size is normal | See postgresql-slow-queries/2026 (not bloat) |
| Table bloat (not just index bloat) | Only indexes are bloated | pg_repack --table or VACUUM tuning |
| Append-only time-series data | High-cardinality equality lookups | BRIN index instead of btree |
pg_repack.pg_repack client and extension versions must match exactly. Version drift causes silent failures. Pin both in deployment automation.pg_repack extension installation. REINDEX CONCURRENTLY is available everywhere PG 12+ is.\d+ tablename for EXCLUDE USING ... constraints before relying on REINDEX TABLE CONCURRENTLY.REINDEX TABLE CONCURRENTLY parent processes each partition sequentially, and cannot run inside a transaction. For massive partitioned tables, script per-partition rebuilds with pauses.