How to create database indexes

- Bottom line: Choose index types based on your query patterns — B-tree for range/sort queries, hash for equality-only, GIN for JSONB/arrays, BRIN for time-series, and partial indexes to reduce storage on filtered workloads.

Which index type should I use

- Bottom line: Choose index types based on your query patterns — B-tree for range/sort queries, hash for equality-only, GIN for JSONB/arrays, BRIN for time-series, and partial indexes to reduce storage on filtered workloads.

Database index performance optimization

- Bottom line: Choose index types based on your query patterns — B-tree for range/sort queries, hash for equality-only, GIN for JSONB/arrays, BRIN for time-series, and partial indexes to reduce storage on filtered workloads.

B-tree vs hash vs GIN index comparison

- Bottom line: Choose index types based on your query patterns — B-tree for range/sort queries, hash for equality-only, GIN for JSONB/arrays, BRIN for time-series, and partial indexes to reduce storage on filtered workloads.

Database Indexing Strategies

What are the best database indexing strategies?

TL;DR

Bottom line: Choose index types based on your query patterns — B-tree for range/sort queries, hash for equality-only, GIN for JSONB/arrays, BRIN for time-series, and partial indexes to reduce storage on filtered workloads.
Key tool/command: CREATE INDEX idx_name ON table (col1, col2) WHERE condition;
Watch out for: Over-indexing — every unused index wastes storage and slows writes with zero read benefit.
Works with: PostgreSQL 13+, MySQL 8.0+, SQL Server 2019+, SQLite 3.35+ (syntax varies per engine).

Constraints

Every index adds write overhead — INSERT/UPDATE/DELETE operations slow down per additional index. Benchmark write-heavy workloads before adding indexes.
Composite index column order must follow the leftmost prefix rule: a (a, b, c) index serves queries on (a), (a, b), and (a, b, c) but NOT (b, c) alone.
Hash indexes do not support range queries (>, <, BETWEEN, ORDER BY) — only exact equality (=). Use B-tree for anything else.
Never apply functions to indexed columns in WHERE clauses (WHERE UPPER(name) = 'FOO') unless you create an expression/functional index.
BRIN indexes require physically sorted (correlated) data to be effective. Random inserts destroy correlation and make BRIN useless.
Partial indexes must have a WHERE clause that matches your query's filter exactly — the planner won't use a partial index if the query's condition doesn't match.

Quick Reference

Index Type	Best For	Lookup Speed	Write Overhead	Storage	Engine Support
B-tree	Range queries, sorting, equality	O(log n)	Moderate	Moderate	All engines
Hash	Equality-only lookups	O(1) amortized	Low	Low	PG 10+, MySQL (Memory only)
GIN	JSONB, arrays, full-text, trgm	O(1) per element	High (batch-friendly)	High	PostgreSQL only
GiST	Geospatial, range types, nearest-neighbor	O(log n)	Moderate	Moderate	PostgreSQL only
SP-GiST	Non-balanced structures (quad-trees, k-d trees)	O(log n)	Moderate	Moderate	PostgreSQL only
BRIN	Time-series, append-only sorted data	O(n) worst case	Very low	Very low	PostgreSQL 9.5+
Partial Index	Filtered subsets (active rows, non-null)	Same as base type	Lower (fewer rows)	Lower	PG, SQL Server (filtered), SQLite
Covering (INCLUDE)	Index-only scans avoiding heap lookups	O(log n)	Moderate-high	Higher	PG 11+, SQL Server, MySQL 8.0+
Composite Index	Multi-column filters and sorts	O(log n)	Moderate	Moderate	All engines
Columnstore	Analytics, aggregation on large tables	Scan-optimized	High (batch insert)	Very high (compressed)	SQL Server 2012+
Expression/Functional	Queries on computed values (`LOWER(col)`)	O(log n)	Moderate	Moderate	PG, MySQL 8.0+

Decision Tree

START
├── Query uses only equality (=)?
│   ├── YES → Single column?
│   │   ├── YES → Hash index (PG) or B-tree (MySQL/SQL Server)
│   │   └── NO → Composite B-tree index (columns in selectivity order)
│   └── NO ↓
├── Query uses range (>, <, BETWEEN) or ORDER BY?
│   ├── YES → B-tree index (range column last in composite)
│   └── NO ↓
├── Column is JSONB, array, or tsvector?
│   ├── YES → GIN index (PostgreSQL)
│   └── NO ↓
├── Column is geometry/geography or range type?
│   ├── YES → GiST index (PostgreSQL) or spatial index (MySQL/SQL Server)
│   └── NO ↓
├── Data is append-only / time-series with physical sort order?
│   ├── YES → BRIN index (PostgreSQL) — 1000x smaller than B-tree
│   └── NO ↓
├── Only a small fraction of rows match your filter?
│   ├── YES → Partial index (WHERE active = true)
│   └── NO ↓
├── Query reads many columns but only filters on few?
│   ├── YES → Covering index (INCLUDE extra columns)
│   └── NO ↓
├── Workload is analytical (aggregations on millions of rows)?
│   ├── YES → Columnstore index (SQL Server)
│   └── NO ↓
└── DEFAULT → B-tree index on the filtered/sorted column(s)

Step-by-Step Guide

1. Identify slow queries with EXPLAIN

Run EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL) to see if the query uses a sequential scan instead of an index scan. [src1]

-- PostgreSQL
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 42;

-- MySQL
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- SQL Server
SET STATISTICS IO ON;
SELECT * FROM orders WHERE customer_id = 42;

Verify: Look for Seq Scan (PG), ALL in type column (MySQL), or Table Scan (SQL Server) — these indicate missing indexes.

2. Create a B-tree index on filtered columns

Start with a standard B-tree index on the column(s) appearing in WHERE clauses. [src2]

-- PostgreSQL / MySQL
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- SQL Server
CREATE NONCLUSTERED INDEX idx_orders_customer ON orders (customer_id);

Verify: Re-run EXPLAIN — should now show Index Scan (PG), ref (MySQL), or Index Seek (SQL Server).

3. Add composite indexes for multi-column queries

For queries filtering on multiple columns, create a composite index with equality columns first, range columns last. [src3]

-- Query: WHERE status = 'active' AND created_at > '2026-01-01'
-- Equality columns first, range columns last
CREATE INDEX idx_orders_status_created ON orders (status, created_at);

Verify: EXPLAIN should show the composite index being used with both conditions.

4. Use partial indexes to reduce index size

If queries consistently filter on a condition, create a partial index. [src2]

-- Only 5% of orders are 'pending' — index just those
CREATE INDEX idx_orders_pending ON orders (created_at)
  WHERE status = 'pending';

Verify: SELECT pg_size_pretty(pg_relation_size('idx_orders_pending')); — should be much smaller than a full index.

5. Create covering indexes to avoid table lookups

Include extra columns so the query can be answered entirely from the index. [src4]

-- PostgreSQL 11+
CREATE INDEX idx_orders_cover ON orders (customer_id)
  INCLUDE (total, created_at);

-- SQL Server
CREATE NONCLUSTERED INDEX idx_orders_cover ON orders (customer_id)
  INCLUDE (total, created_at);

-- MySQL (covering via composite — no INCLUDE keyword)
CREATE INDEX idx_orders_cover ON orders (customer_id, total, created_at);

Verify: EXPLAIN shows Index Only Scan (PG) or Covering Index Scan (SQL Server).

6. Monitor and remove unused indexes

Indexes that are never used waste storage and slow writes. Query system catalogs regularly. [src1]

-- PostgreSQL: find unused indexes
SELECT indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid))
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

-- Drop unused indexes
DROP INDEX idx_unused_index;

Verify: After dropping, monitor write performance — INSERT throughput should improve.

Code Examples

PostgreSQL: GIN Index for JSONB Queries

-- Input:  Table with JSONB column storing product attributes
-- Output: Fast lookups on any key inside the JSONB document

-- Create a GIN index for containment queries (@>)
CREATE INDEX idx_products_attrs ON products USING gin (attributes);

-- Query: find products where attributes contain {"color": "red"}
SELECT id, name FROM products
WHERE attributes @> '{"color": "red"}'::jsonb;

-- For specific key lookups, use jsonb_path_ops (smaller, faster)
CREATE INDEX idx_products_attrs_path ON products
  USING gin (attributes jsonb_path_ops);

PostgreSQL: BRIN Index for Time-Series Data

-- Input:  Append-only events table with timestamp column
-- Output: Extremely small index for range queries on time

-- BRIN stores min/max per block range (128 pages default)
CREATE INDEX idx_events_ts ON events USING brin (created_at);

-- Query: filter by time range — BRIN eliminates block ranges
SELECT * FROM events
WHERE created_at BETWEEN '2026-01-01' AND '2026-01-31';

-- Check physical correlation (must be > 0.9 for BRIN to help)
SELECT correlation FROM pg_stats
WHERE tablename = 'events' AND attname = 'created_at';

MySQL: Composite Index with Sort Optimization

-- Input:  Orders table queried by status + date range + sort
-- Output: Single index serving filter, range, and ORDER BY

-- Equality columns first, then range, then sort direction
CREATE INDEX idx_orders_status_date ON orders (status, created_at DESC);

-- This query uses the index for all three operations:
SELECT id, total, created_at FROM orders
WHERE status = 'shipped'
  AND created_at > '2026-01-01'
ORDER BY created_at DESC
LIMIT 20;

SQL Server: Filtered Index + Columnstore for Mixed Workloads

-- Input:  Large orders table with OLTP + reporting queries
-- Output: Filtered rowstore index for OLTP, columnstore for analytics

-- Filtered index: only active orders (OLTP fast path)
CREATE NONCLUSTERED INDEX idx_orders_active
ON orders (customer_id, created_at)
WHERE status = 'active';

-- Nonclustered columnstore: analytics on historical data
CREATE NONCLUSTERED COLUMNSTORE INDEX idx_orders_analytics
ON orders (customer_id, total, product_id, created_at)
WHERE status != 'active';

Anti-Patterns

Wrong: Indexing every column individually

-- BAD — creates N separate indexes that mostly go unused
-- The optimizer typically picks only ONE index per table scan
CREATE INDEX idx_col1 ON orders (customer_id);
CREATE INDEX idx_col2 ON orders (status);
CREATE INDEX idx_col3 ON orders (created_at);
CREATE INDEX idx_col4 ON orders (total);
CREATE INDEX idx_col5 ON orders (product_id);

Correct: Create targeted composite indexes matching query patterns

-- GOOD — one composite index serves the most common query
CREATE INDEX idx_orders_main ON orders (customer_id, status, created_at);
-- Serves: WHERE customer_id = ?, WHERE customer_id = ? AND status = ?,
-- and WHERE customer_id = ? AND status = ? AND created_at > ?

Wrong: Range column before equality column in composite index

-- BAD — range column first breaks index efficiency for equality filter
CREATE INDEX idx_orders_bad ON orders (created_at, status);

-- This query cannot efficiently use the index for status filtering:
SELECT * FROM orders WHERE status = 'active' AND created_at > '2026-01-01';

Correct: Equality columns first, range columns last

-- GOOD — equality columns narrow the search before range scan
CREATE INDEX idx_orders_good ON orders (status, created_at);

-- Now the index seeks to status='active' then scans the date range
SELECT * FROM orders WHERE status = 'active' AND created_at > '2026-01-01';

Wrong: Using functions on indexed columns

-- BAD — wrapping the column in a function prevents index usage
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
SELECT * FROM orders WHERE YEAR(created_at) = 2026;

Correct: Use expression indexes or rewrite the query

-- GOOD (PostgreSQL) — expression index on the function result
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
SELECT * FROM users WHERE LOWER(email) = '[email protected]';

-- GOOD (MySQL) — rewrite to avoid function on column
SELECT * FROM orders
WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01';

Wrong: Indexing low-cardinality boolean columns alone

-- BAD — a boolean column has only 2 values, full index has no selectivity
CREATE INDEX idx_users_active ON users (is_active);
-- The optimizer may ignore this index entirely (table scan is cheaper)

Correct: Use a partial index on the rare value

-- GOOD — partial index on the minority value is tiny and targeted
CREATE INDEX idx_users_active ON users (id) WHERE is_active = true;
-- Only indexes the ~5% of active users, not the entire table

Wrong: Never checking if indexes are actually used

-- BAD — creating indexes without monitoring leads to index bloat
-- Common in ORMs that auto-generate indexes for every foreign key

Correct: Regularly audit index usage and drop dead indexes

-- GOOD — query pg_stat_user_indexes to find unused indexes
SELECT indexrelname AS index_name,
       idx_scan AS times_used,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;

Common Pitfalls

Index bloat in PostgreSQL: Dead tuples from UPDATE/DELETE inflate index size. Fix: REINDEX CONCURRENTLY idx_name; or tune autovacuum_vacuum_scale_factor. [src2]
Implicit type casting defeats indexes: WHERE int_column = '42' forces a cast, bypassing the index. Fix: ensure parameter types match column types. [src1]
MySQL FORCE INDEX causes plan regression: Optimizer hints become stale as data changes. Fix: remove hints, use ANALYZE TABLE to refresh statistics. [src3]
Partial index not used due to mismatched WHERE clause: The planner only uses a partial index when your query's condition logically implies the index's predicate. Fix: make conditions match. [src2]
BRIN on randomly inserted data is useless: BRIN relies on physical row order correlation. Fix: check pg_stats.correlation — value must be close to 1.0 or -1.0. [src6]
Covering index INCLUDE columns are not searchable: Columns in INCLUDE are stored in leaf pages only. Fix: put searchable columns in the key, non-searchable in INCLUDE. [src4]
Too many indexes on write-heavy tables: Each INSERT updates every index. Fix: keep indexes under 5-7 per table on write-heavy workloads. [src1]

Diagnostic Commands

-- PostgreSQL: check index usage statistics
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

-- PostgreSQL: check table index sizes
SELECT indexname, pg_size_pretty(pg_relation_size(indexname::regclass)) AS size
FROM pg_indexes WHERE tablename = 'your_table';

-- PostgreSQL: check physical correlation for BRIN suitability
SELECT attname, correlation
FROM pg_stats WHERE tablename = 'your_table';

-- PostgreSQL: find duplicate indexes
SELECT indrelid::regclass AS table_name,
       array_agg(indexrelid::regclass) AS duplicate_indexes
FROM pg_index
GROUP BY indrelid, indkey
HAVING COUNT(*) > 1;

-- MySQL: check index cardinality and usage
SHOW INDEX FROM your_table;
SELECT * FROM sys.schema_index_statistics WHERE table_name = 'your_table';

-- MySQL: find redundant indexes
SELECT * FROM sys.schema_redundant_indexes;

-- SQL Server: find missing indexes suggested by the optimizer
SELECT mid.statement, mid.equality_columns, mid.inequality_columns,
       mid.included_columns, migs.avg_user_impact
FROM sys.dm_db_missing_index_details mid
JOIN sys.dm_db_missing_index_groups mig ON mid.index_handle = mig.index_handle
JOIN sys.dm_db_missing_index_group_stats migs ON mig.index_group_handle = migs.group_handle
ORDER BY migs.avg_user_impact DESC;

-- SQL Server: find unused indexes
SELECT OBJECT_NAME(i.object_id) AS table_name, i.name AS index_name,
       ius.user_seeks, ius.user_scans, ius.user_updates
FROM sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats ius
  ON i.object_id = ius.object_id AND i.index_id = ius.index_id
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND (ius.user_seeks + ius.user_scans) = 0
ORDER BY ius.user_updates DESC;

Version History & Compatibility

Feature	PostgreSQL	MySQL	SQL Server	Notes
B-tree	All versions	All versions	All versions	Universal default
Hash index	10+ (WAL-logged)	Memory engine only	N/A	PG hash crash-safe since v10
GIN	8.2+	N/A	N/A	PostgreSQL-specific
GiST	7.0+	N/A	N/A	PostgreSQL-specific
BRIN	9.5+	N/A	N/A	PostgreSQL-specific
Partial/Filtered	All versions	N/A	2008+	MySQL lacks partial indexes
Covering (INCLUDE)	11+	8.0+ (implicit)	2005+	MySQL uses composite for covering
Expression/Functional	All versions	8.0+	Via computed columns	MySQL added in 8.0
Columnstore	N/A	N/A	2012+	SQL Server-specific

When to Use / When Not to Use

Use When	Don't Use When	Use Instead
Queries filter on specific columns repeatedly	Table has < 1000 rows	Full table scan (faster for tiny tables)
Read/query workload dominates	Write throughput is the bottleneck	Batch inserts, then add indexes after bulk load
Query needs sorted output (ORDER BY)	Data is write-once, read-never (audit logs)	Append-only table, no indexes
Covering index eliminates heap lookups	You're indexing every column "just in case"	Targeted indexes based on actual query patterns
JSONB containment queries are slow	JSON values are small and rarely queried	No index; sequential scan on small data
Time-series data with range filters	Data is inserted in random order	B-tree (BRIN requires physical correlation)

Important Caveats

PostgreSQL hash indexes were not WAL-logged before version 10 — they were unsafe for production. Always use PG 10+ for hash indexes.
MySQL InnoDB stores the primary key at every secondary index leaf node. Very wide primary keys (e.g., UUID) inflate all secondary indexes. Prefer integer auto-increment PKs on InnoDB.
SQL Server clustered index physically reorders table data — you get exactly one per table. Choose it carefully (usually the primary key or most common range query column).
Index maintenance (REINDEX, OPTIMIZE TABLE, ALTER INDEX REBUILD) can lock the table. Use CONCURRENTLY (PG) or online rebuild (SQL Server Enterprise) in production.
Statistics staleness causes bad plans. PostgreSQL auto-analyzes, but after bulk loads run ANALYZE table_name; manually. MySQL: ANALYZE TABLE. SQL Server: UPDATE STATISTICS.
Multicolumn GIN indexes in PostgreSQL do not support the <@ (contained-by) operator — only @>, ?, ?|, ?&. Check operator class support before creating.