Database Indexing Strategies

Type: Software Reference Confidence: 0.90 Sources: 7 Verified: 2026-02-24 Freshness: 2026-02-24

TL;DR

Constraints

Quick Reference

Index TypeBest ForLookup SpeedWrite OverheadStorageEngine Support
B-treeRange queries, sorting, equalityO(log n)ModerateModerateAll engines
HashEquality-only lookupsO(1) amortizedLowLowPG 10+, MySQL (Memory only)
GINJSONB, arrays, full-text, trgmO(1) per elementHigh (batch-friendly)HighPostgreSQL only
GiSTGeospatial, range types, nearest-neighborO(log n)ModerateModeratePostgreSQL only
SP-GiSTNon-balanced structures (quad-trees, k-d trees)O(log n)ModerateModeratePostgreSQL only
BRINTime-series, append-only sorted dataO(n) worst caseVery lowVery lowPostgreSQL 9.5+
Partial IndexFiltered subsets (active rows, non-null)Same as base typeLower (fewer rows)LowerPG, SQL Server (filtered), SQLite
Covering (INCLUDE)Index-only scans avoiding heap lookupsO(log n)Moderate-highHigherPG 11+, SQL Server, MySQL 8.0+
Composite IndexMulti-column filters and sortsO(log n)ModerateModerateAll engines
ColumnstoreAnalytics, aggregation on large tablesScan-optimizedHigh (batch insert)Very high (compressed)SQL Server 2012+
Expression/FunctionalQueries on computed values (LOWER(col))O(log n)ModerateModeratePG, MySQL 8.0+

Decision Tree

START
├── Query uses only equality (=)?
│   ├── YES → Single column?
│   │   ├── YES → Hash index (PG) or B-tree (MySQL/SQL Server)
│   │   └── NO → Composite B-tree index (columns in selectivity order)
│   └── NO ↓
├── Query uses range (>, <, BETWEEN) or ORDER BY?
│   ├── YES → B-tree index (range column last in composite)
│   └── NO ↓
├── Column is JSONB, array, or tsvector?
│   ├── YES → GIN index (PostgreSQL)
│   └── NO ↓
├── Column is geometry/geography or range type?
│   ├── YES → GiST index (PostgreSQL) or spatial index (MySQL/SQL Server)
│   └── NO ↓
├── Data is append-only / time-series with physical sort order?
│   ├── YES → BRIN index (PostgreSQL) — 1000x smaller than B-tree
│   └── NO ↓
├── Only a small fraction of rows match your filter?
│   ├── YES → Partial index (WHERE active = true)
│   └── NO ↓
├── Query reads many columns but only filters on few?
│   ├── YES → Covering index (INCLUDE extra columns)
│   └── NO ↓
├── Workload is analytical (aggregations on millions of rows)?
│   ├── YES → Columnstore index (SQL Server)
│   └── NO ↓
└── DEFAULT → B-tree index on the filtered/sorted column(s)

Step-by-Step Guide

1. Identify slow queries with EXPLAIN

Run EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL) to see if the query uses a sequential scan instead of an index scan. [src1]

-- PostgreSQL
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 42;

-- MySQL
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- SQL Server
SET STATISTICS IO ON;
SELECT * FROM orders WHERE customer_id = 42;

Verify: Look for Seq Scan (PG), ALL in type column (MySQL), or Table Scan (SQL Server) — these indicate missing indexes.

2. Create a B-tree index on filtered columns

Start with a standard B-tree index on the column(s) appearing in WHERE clauses. [src2]

-- PostgreSQL / MySQL
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- SQL Server
CREATE NONCLUSTERED INDEX idx_orders_customer ON orders (customer_id);

Verify: Re-run EXPLAIN — should now show Index Scan (PG), ref (MySQL), or Index Seek (SQL Server).

3. Add composite indexes for multi-column queries

For queries filtering on multiple columns, create a composite index with equality columns first, range columns last. [src3]

-- Query: WHERE status = 'active' AND created_at > '2026-01-01'
-- Equality columns first, range columns last
CREATE INDEX idx_orders_status_created ON orders (status, created_at);

Verify: EXPLAIN should show the composite index being used with both conditions.

4. Use partial indexes to reduce index size

If queries consistently filter on a condition, create a partial index. [src2]

-- Only 5% of orders are 'pending' — index just those
CREATE INDEX idx_orders_pending ON orders (created_at)
  WHERE status = 'pending';

Verify: SELECT pg_size_pretty(pg_relation_size('idx_orders_pending')); — should be much smaller than a full index.

5. Create covering indexes to avoid table lookups

Include extra columns so the query can be answered entirely from the index. [src4]

-- PostgreSQL 11+
CREATE INDEX idx_orders_cover ON orders (customer_id)
  INCLUDE (total, created_at);

-- SQL Server
CREATE NONCLUSTERED INDEX idx_orders_cover ON orders (customer_id)
  INCLUDE (total, created_at);

-- MySQL (covering via composite — no INCLUDE keyword)
CREATE INDEX idx_orders_cover ON orders (customer_id, total, created_at);

Verify: EXPLAIN shows Index Only Scan (PG) or Covering Index Scan (SQL Server).

6. Monitor and remove unused indexes

Indexes that are never used waste storage and slow writes. Query system catalogs regularly. [src1]

-- PostgreSQL: find unused indexes
SELECT indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid))
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

-- Drop unused indexes
DROP INDEX idx_unused_index;

Verify: After dropping, monitor write performance — INSERT throughput should improve.

Code Examples

PostgreSQL: GIN Index for JSONB Queries

-- Input:  Table with JSONB column storing product attributes
-- Output: Fast lookups on any key inside the JSONB document

-- Create a GIN index for containment queries (@>)
CREATE INDEX idx_products_attrs ON products USING gin (attributes);

-- Query: find products where attributes contain {"color": "red"}
SELECT id, name FROM products
WHERE attributes @> '{"color": "red"}'::jsonb;

-- For specific key lookups, use jsonb_path_ops (smaller, faster)
CREATE INDEX idx_products_attrs_path ON products
  USING gin (attributes jsonb_path_ops);

PostgreSQL: BRIN Index for Time-Series Data

-- Input:  Append-only events table with timestamp column
-- Output: Extremely small index for range queries on time

-- BRIN stores min/max per block range (128 pages default)
CREATE INDEX idx_events_ts ON events USING brin (created_at);

-- Query: filter by time range — BRIN eliminates block ranges
SELECT * FROM events
WHERE created_at BETWEEN '2026-01-01' AND '2026-01-31';

-- Check physical correlation (must be > 0.9 for BRIN to help)
SELECT correlation FROM pg_stats
WHERE tablename = 'events' AND attname = 'created_at';

MySQL: Composite Index with Sort Optimization

-- Input:  Orders table queried by status + date range + sort
-- Output: Single index serving filter, range, and ORDER BY

-- Equality columns first, then range, then sort direction
CREATE INDEX idx_orders_status_date ON orders (status, created_at DESC);

-- This query uses the index for all three operations:
SELECT id, total, created_at FROM orders
WHERE status = 'shipped'
  AND created_at > '2026-01-01'
ORDER BY created_at DESC
LIMIT 20;

SQL Server: Filtered Index + Columnstore for Mixed Workloads

-- Input:  Large orders table with OLTP + reporting queries
-- Output: Filtered rowstore index for OLTP, columnstore for analytics

-- Filtered index: only active orders (OLTP fast path)
CREATE NONCLUSTERED INDEX idx_orders_active
ON orders (customer_id, created_at)
WHERE status = 'active';

-- Nonclustered columnstore: analytics on historical data
CREATE NONCLUSTERED COLUMNSTORE INDEX idx_orders_analytics
ON orders (customer_id, total, product_id, created_at)
WHERE status != 'active';

Anti-Patterns

Wrong: Indexing every column individually

-- BAD — creates N separate indexes that mostly go unused
-- The optimizer typically picks only ONE index per table scan
CREATE INDEX idx_col1 ON orders (customer_id);
CREATE INDEX idx_col2 ON orders (status);
CREATE INDEX idx_col3 ON orders (created_at);
CREATE INDEX idx_col4 ON orders (total);
CREATE INDEX idx_col5 ON orders (product_id);

Correct: Create targeted composite indexes matching query patterns

-- GOOD — one composite index serves the most common query
CREATE INDEX idx_orders_main ON orders (customer_id, status, created_at);
-- Serves: WHERE customer_id = ?, WHERE customer_id = ? AND status = ?,
-- and WHERE customer_id = ? AND status = ? AND created_at > ?

Wrong: Range column before equality column in composite index

-- BAD — range column first breaks index efficiency for equality filter
CREATE INDEX idx_orders_bad ON orders (created_at, status);

-- This query cannot efficiently use the index for status filtering:
SELECT * FROM orders WHERE status = 'active' AND created_at > '2026-01-01';

Correct: Equality columns first, range columns last

-- GOOD — equality columns narrow the search before range scan
CREATE INDEX idx_orders_good ON orders (status, created_at);

-- Now the index seeks to status='active' then scans the date range
SELECT * FROM orders WHERE status = 'active' AND created_at > '2026-01-01';

Wrong: Using functions on indexed columns

-- BAD — wrapping the column in a function prevents index usage
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
SELECT * FROM orders WHERE YEAR(created_at) = 2026;

Correct: Use expression indexes or rewrite the query

-- GOOD (PostgreSQL) — expression index on the function result
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
SELECT * FROM users WHERE LOWER(email) = '[email protected]';

-- GOOD (MySQL) — rewrite to avoid function on column
SELECT * FROM orders
WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01';

Wrong: Indexing low-cardinality boolean columns alone

-- BAD — a boolean column has only 2 values, full index has no selectivity
CREATE INDEX idx_users_active ON users (is_active);
-- The optimizer may ignore this index entirely (table scan is cheaper)

Correct: Use a partial index on the rare value

-- GOOD — partial index on the minority value is tiny and targeted
CREATE INDEX idx_users_active ON users (id) WHERE is_active = true;
-- Only indexes the ~5% of active users, not the entire table

Wrong: Never checking if indexes are actually used

-- BAD — creating indexes without monitoring leads to index bloat
-- Common in ORMs that auto-generate indexes for every foreign key

Correct: Regularly audit index usage and drop dead indexes

-- GOOD — query pg_stat_user_indexes to find unused indexes
SELECT indexrelname AS index_name,
       idx_scan AS times_used,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;

Common Pitfalls

Diagnostic Commands

-- PostgreSQL: check index usage statistics
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

-- PostgreSQL: check table index sizes
SELECT indexname, pg_size_pretty(pg_relation_size(indexname::regclass)) AS size
FROM pg_indexes WHERE tablename = 'your_table';

-- PostgreSQL: check physical correlation for BRIN suitability
SELECT attname, correlation
FROM pg_stats WHERE tablename = 'your_table';

-- PostgreSQL: find duplicate indexes
SELECT indrelid::regclass AS table_name,
       array_agg(indexrelid::regclass) AS duplicate_indexes
FROM pg_index
GROUP BY indrelid, indkey
HAVING COUNT(*) > 1;

-- MySQL: check index cardinality and usage
SHOW INDEX FROM your_table;
SELECT * FROM sys.schema_index_statistics WHERE table_name = 'your_table';

-- MySQL: find redundant indexes
SELECT * FROM sys.schema_redundant_indexes;

-- SQL Server: find missing indexes suggested by the optimizer
SELECT mid.statement, mid.equality_columns, mid.inequality_columns,
       mid.included_columns, migs.avg_user_impact
FROM sys.dm_db_missing_index_details mid
JOIN sys.dm_db_missing_index_groups mig ON mid.index_handle = mig.index_handle
JOIN sys.dm_db_missing_index_group_stats migs ON mig.index_group_handle = migs.group_handle
ORDER BY migs.avg_user_impact DESC;

-- SQL Server: find unused indexes
SELECT OBJECT_NAME(i.object_id) AS table_name, i.name AS index_name,
       ius.user_seeks, ius.user_scans, ius.user_updates
FROM sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats ius
  ON i.object_id = ius.object_id AND i.index_id = ius.index_id
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND (ius.user_seeks + ius.user_scans) = 0
ORDER BY ius.user_updates DESC;

Version History & Compatibility

FeaturePostgreSQLMySQLSQL ServerNotes
B-treeAll versionsAll versionsAll versionsUniversal default
Hash index10+ (WAL-logged)Memory engine onlyN/APG hash crash-safe since v10
GIN8.2+N/AN/APostgreSQL-specific
GiST7.0+N/AN/APostgreSQL-specific
BRIN9.5+N/AN/APostgreSQL-specific
Partial/FilteredAll versionsN/A2008+MySQL lacks partial indexes
Covering (INCLUDE)11+8.0+ (implicit)2005+MySQL uses composite for covering
Expression/FunctionalAll versions8.0+Via computed columnsMySQL added in 8.0
ColumnstoreN/AN/A2012+SQL Server-specific

When to Use / When Not to Use

Use WhenDon't Use WhenUse Instead
Queries filter on specific columns repeatedlyTable has < 1000 rowsFull table scan (faster for tiny tables)
Read/query workload dominatesWrite throughput is the bottleneckBatch inserts, then add indexes after bulk load
Query needs sorted output (ORDER BY)Data is write-once, read-never (audit logs)Append-only table, no indexes
Covering index eliminates heap lookupsYou're indexing every column "just in case"Targeted indexes based on actual query patterns
JSONB containment queries are slowJSON values are small and rarely queriedNo index; sequential scan on small data
Time-series data with range filtersData is inserted in random orderB-tree (BRIN requires physical correlation)

Important Caveats

Related Units