Measure first: use EXPLAIN ANALYZE (PostgreSQL/MySQL) or SET STATISTICS IO ON plus the actual execution plan (SQL Server).

| Technique | When to Use | Impact | Complexity | Trade-off |
|---|---|---|---|---|
| Add covering index | Queries doing index lookups + heap fetches | Very high | Low | Extra write overhead + disk for each INSERT/UPDATE |
| Rewrite SELECT * to specific columns | Any query returning unnecessary data | High | Trivial | None -- strictly better |
| Add WHERE clause indexes | Sequential scans on filtered columns | Very high | Low | Index maintenance cost on writes |
| Replace correlated subquery with JOIN | Subquery executes once per outer row | Very high | Medium | JOIN may use more memory for hash/merge |
| Use EXISTS instead of IN for subqueries | Large subquery result sets | High | Low | Semantics differ with NULLs -- test carefully |
| Partition large tables | Tables > 100M rows with date/range filters | High | High | Query routing complexity, cross-partition JOINs slower |
| Materialize expensive CTEs / views | Repeated expensive aggregations | Very high | Medium | Stale data if not refreshed; storage cost |
| Use LIMIT with ORDER BY + index | Pagination queries | High | Low | Requires index matching ORDER BY columns |
| Batch INSERT/UPDATE operations | Bulk writes causing lock contention | High | Medium | Application-level batching logic needed |
| Add composite indexes | Multi-column WHERE/ORDER BY | Very high | Medium | Column order matters -- most selective first |
| Avoid implicit type conversions | WHERE varchar_col = 12345 (int) | High | Trivial | Requires schema awareness |
| Use query plan caching | Repeated parameterized queries | Medium | Low | Prepared statements may choose suboptimal generic plan |
| Update statistics | Bad plans after large data changes | High | Low | Brief CPU spike during ANALYZE |
| Denormalize hot paths | Read-heavy queries joining 5+ tables | Very high | High | Data duplication, consistency burden |
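The EXISTS-vs-IN row above deserves a concrete sketch, because the NULL caveat bites in practice. A minimal example, assuming the customers/orders schema used throughout this guide:

```sql
-- NOT IN returns zero rows if the subquery yields any NULL:
-- every comparison against NULL is UNKNOWN, so no row qualifies
SELECT c.id, c.name
FROM customers c
WHERE c.id NOT IN (SELECT o.customer_id FROM orders o);

-- NOT EXISTS treats NULLs the way you usually intend,
-- and the optimizer can turn it into an anti-join
SELECT c.id, c.name
FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.id
);
```

The two forms are only equivalent when the subquery column is NOT NULL; when in doubt, prefer NOT EXISTS.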
START: Query is slow
├── Have you run EXPLAIN ANALYZE?
│   ├── NO → Run EXPLAIN ANALYZE first (see Step 1 below)
│   └── YES ↓
├── Is there a Sequential Scan / Full Table Scan on a large table?
│   ├── YES → Is there a WHERE clause filtering rows?
│   │   ├── YES → Add an index on the WHERE columns (see Step 2)
│   │   └── NO → Query legitimately needs all rows -- optimize with LIMIT, pagination, or a materialized view
│   └── NO ↓
├── Is the plan doing an Index Scan but still slow?
│   ├── YES → Check whether it is fetching too many columns (heap fetches) → Create a covering index
│   └── NO ↓
├── Is a Sort or Hash operation consuming >50% of the cost?
│   ├── YES → Add an index matching the ORDER BY / GROUP BY columns, or increase work_mem
│   └── NO ↓
├── Is there a Nested Loop with high row estimates?
│   ├── YES → Replace it with a Hash Join (rewrite the query) or add join-column indexes
│   └── NO ↓
├── Are estimated rows very different from actual rows?
│   ├── YES → Run ANALYZE to update statistics → Check for data skew
│   └── NO ↓
├── Is the query using correlated subqueries?
│   ├── YES → Rewrite as a JOIN or LATERAL join (see Anti-Patterns)
│   └── NO ↓
└── DEFAULT → Profile at the application level -- the problem may be N+1 queries, connection overhead, or network latency
Before changing anything, get the actual execution plan. The plan shows exactly where time is spent. [src1]
-- PostgreSQL: EXPLAIN ANALYZE actually executes the query --
-- wrap writes (DML) in a transaction so they can be rolled back
BEGIN;
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > '2025-01-01'
ORDER BY o.total DESC
LIMIT 100;
ROLLBACK;
-- MySQL (8.0.18+): EXPLAIN ANALYZE also executes the query
EXPLAIN ANALYZE
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > '2025-01-01'
ORDER BY o.total DESC
LIMIT 100;
Verify: Look for Seq Scan (PostgreSQL), type: ALL (MySQL), or Table Scan (SQL Server) on large tables -- these are the primary targets.
The execution plan reveals which table accesses lack indexes. Add targeted indexes for WHERE, JOIN, and ORDER BY columns. [src3]
-- PostgreSQL: Create index without locking writes
CREATE INDEX CONCURRENTLY idx_orders_created_at
ON orders (created_at DESC);
-- MySQL: Online DDL (InnoDB)
ALTER TABLE orders
ADD INDEX idx_orders_created_at (created_at DESC),
ALGORITHM=INPLACE, LOCK=NONE;
-- SQL Server: Online index creation
CREATE NONCLUSTERED INDEX idx_orders_created_at
ON orders (created_at DESC)
WITH (ONLINE = ON);
Verify: Re-run EXPLAIN ANALYZE -- you should see Index Scan or Index Only Scan replacing Seq Scan.
Fetch only what you need. SELECT * forces the engine to read every column from heap pages even when an index covers the query. [src3]
-- Before: fetches all columns, prevents index-only scan
SELECT * FROM orders WHERE customer_id = 42;
-- After: fetches only needed columns, enables covering index
SELECT id, total, status FROM orders WHERE customer_id = 42;
Verify: EXPLAIN output should show Index Only Scan (PostgreSQL) or Using index (MySQL) when all selected columns are in the index.
Correlated subqueries execute once per outer row. JOINs allow the optimizer to choose hash or merge strategies. [src5]
-- Before: correlated subquery (O(n*m) execution)
SELECT c.name, (
SELECT SUM(o.total) FROM orders o WHERE o.customer_id = c.id
) AS total_spent
FROM customers c;
-- After: JOIN with aggregation (single pass)
SELECT c.name, COALESCE(SUM(o.total), 0) AS total_spent
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;
Verify: EXPLAIN should show Hash Aggregate or GroupAggregate instead of SubPlan nodes.
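When the per-row logic is genuinely row-dependent -- for example, top-N orders per customer -- a plain JOIN + GROUP BY cannot express the LIMIT. The decision tree's "lateral join" option keeps the correlation but lets the planner run it as an indexed nested loop. A PostgreSQL sketch, assuming the same orders/customers schema:

```sql
-- Top 3 most recent orders per customer; pairs well with
-- an index on orders (customer_id, created_at DESC)
SELECT c.name, recent.id, recent.total
FROM customers c
CROSS JOIN LATERAL (
    SELECT o.id, o.total
    FROM orders o
    WHERE o.customer_id = c.id
    ORDER BY o.created_at DESC
    LIMIT 3
) AS recent;
```

MySQL 8.0.14+ supports the same LATERAL syntax; SQL Server uses CROSS APPLY for the equivalent plan shape.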
Multi-column indexes must follow the leftmost-prefix rule. A covering index includes all columns the query needs, eliminating heap fetches entirely. [src3]
-- Composite index: most selective column first
CREATE INDEX idx_orders_cust_date
ON orders (customer_id, created_at DESC);
-- Covering index (PostgreSQL INCLUDE syntax)
CREATE INDEX idx_orders_covering
ON orders (customer_id, created_at DESC)
INCLUDE (total, status);
-- MySQL covering index (all columns in the index)
CREATE INDEX idx_orders_covering
ON orders (customer_id, created_at, total, status);
Verify: EXPLAIN shows Index Only Scan (PostgreSQL) or Using index in Extra column (MySQL).
GROUP BY performance depends on whether the optimizer can use an index for grouping or must sort/hash. [src1]
-- Create index matching GROUP BY + aggregate
CREATE INDEX idx_orders_status_total
ON orders (status, total);
-- Query that benefits from the index
SELECT status, SUM(total), COUNT(*)
FROM orders
GROUP BY status;
Verify: EXPLAIN should show a GroupAggregate reading the index in sorted order, with no explicit Sort node (a HashAggregate means the index order was not used).
The query planner relies on table statistics to estimate row counts. After bulk loads or deletes, statistics become stale and cause bad plans. [src1]
-- PostgreSQL
ANALYZE orders;
-- MySQL
ANALYZE TABLE orders;
-- SQL Server
UPDATE STATISTICS orders;
-- Or with full scan for accuracy
UPDATE STATISTICS orders WITH FULLSCAN;
Verify: Re-run EXPLAIN ANALYZE -- estimated rows should be close to actual rows (within 10x).
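For the "materialize expensive CTEs / views" technique from the table at the top, a PostgreSQL sketch (the view and column names are illustrative):

```sql
-- Precompute an expensive aggregation once instead of per query
CREATE MATERIALIZED VIEW customer_totals AS
SELECT customer_id, SUM(total) AS total_spent, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;

-- A unique index is required before CONCURRENTLY refreshes are allowed
CREATE UNIQUE INDEX idx_customer_totals ON customer_totals (customer_id);

-- Refresh without blocking readers; data is stale between refreshes
REFRESH MATERIALIZED VIEW CONCURRENTLY customer_totals;
```

Schedule the refresh (cron, pg_cron, or application-side) at a cadence matching how stale the data may be.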
-- Input: connected to PostgreSQL database
-- Output: tables with high sequential scan ratios (index candidates)
SELECT
schemaname,
relname AS table_name,
seq_scan,
idx_scan,
CASE WHEN seq_scan + idx_scan > 0
THEN ROUND(100.0 * seq_scan / (seq_scan + idx_scan), 1)
ELSE 0
END AS seq_scan_pct,
n_live_tup AS row_count
FROM pg_stat_user_tables
WHERE n_live_tup > 10000
AND seq_scan > idx_scan
ORDER BY seq_scan_pct DESC, n_live_tup DESC
LIMIT 20;
-- Input: MySQL 8.0+ with admin privileges
-- Output: slow queries captured to file for analysis
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
SET GLOBAL log_queries_not_using_indexes = 'ON';
-- Check current setting
SHOW VARIABLES LIKE 'slow_query%';
-- After collecting data, analyze with pt-query-digest:
-- pt-query-digest /var/lib/mysql/hostname-slow.log
-- Input: SQL Server 2019+ with VIEW SERVER STATE permission
-- Output: top 20 queries by total elapsed time
SELECT TOP 20
qs.total_elapsed_time / qs.execution_count / 1000 AS avg_elapsed_ms, -- DMV times are in microseconds
qs.execution_count,
qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
SUBSTRING(qt.text, (qs.statement_start_offset / 2) + 1,
((CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(qt.text)
ELSE qs.statement_end_offset
END - qs.statement_start_offset) / 2) + 1) AS query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
ORDER BY qs.total_elapsed_time / qs.execution_count DESC;
-- BAD -- fetches all columns, prevents index-only scans,
-- breaks when schema changes, transfers unnecessary data
SELECT * FROM orders WHERE customer_id = 42;
-- GOOD -- enables covering indexes, reduces I/O and network transfer
SELECT id, total, status, created_at
FROM orders
WHERE customer_id = 42;
-- BAD -- executes 1 query + N queries (one per customer)
-- Application loop:
-- customers = SELECT * FROM customers;
-- for each customer:
-- orders = SELECT * FROM orders WHERE customer_id = ?;
-- GOOD -- single query, database optimizes the join
SELECT c.id, c.name, o.id AS order_id, o.total
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE c.active = true;
-- BAD -- wrapping indexed column in function prevents index usage
SELECT * FROM orders
WHERE YEAR(created_at) = 2025;
-- BAD -- implicit type conversion disables index
SELECT * FROM users
WHERE phone = 5551234567; -- phone is VARCHAR, literal is INT
-- GOOD -- sargable: index on created_at is used
SELECT * FROM orders
WHERE created_at >= '2025-01-01'
AND created_at < '2026-01-01';
-- GOOD -- matching types
SELECT * FROM users
WHERE phone = '5551234567';
-- BAD -- OR on different columns forces full table scan
SELECT * FROM products
WHERE category_id = 5 OR supplier_id = 12;
-- GOOD -- each branch can use its own index
SELECT * FROM products WHERE category_id = 5
UNION ALL
SELECT * FROM products WHERE supplier_id = 12
AND (category_id <> 5 OR category_id IS NULL); -- exclude rows already returned; keep NULL categories
-- BAD -- returns potentially millions of rows
SELECT id, name, email FROM users
WHERE created_at > '2020-01-01'
ORDER BY created_at;
-- GOOD -- keyset pagination (fastest for large offsets)
SELECT id, name, email FROM users
WHERE created_at > '2020-01-01'
AND id > :last_seen_id
ORDER BY id
LIMIT 50;
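The technique table at the top also lists batching bulk writes; in the same BAD/GOOD style, a sketch using the orders schema from the examples above:

```sql
-- BAD -- one round trip, one parse, and (with autocommit) one
-- commit per row; lock/WAL overhead dominates at volume
INSERT INTO orders (customer_id, total) VALUES (42, 19.99);
INSERT INTO orders (customer_id, total) VALUES (42, 5.00);
-- ... repeated thousands of times

-- GOOD -- multi-row VALUES: one statement, one round trip, one commit
INSERT INTO orders (customer_id, total) VALUES
    (42, 19.99),
    (42, 5.00),
    (43, 12.50);
```

Keep batches to a few hundred or thousand rows each so an individual statement does not hold locks for too long; for very large loads, prefer the engine's bulk path (COPY in PostgreSQL, LOAD DATA in MySQL, BULK INSERT in SQL Server).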
Run ANALYZE (PostgreSQL), ANALYZE TABLE (MySQL), or UPDATE STATISTICS (SQL Server) after large INSERT/DELETE/UPDATE batches. [src1]
Use pg_stat_user_indexes (PostgreSQL) or sys.dm_db_index_usage_stats (SQL Server) to find and drop unused indexes. [src3]
Use EXPLAIN (ANALYZE, BUFFERS) in PostgreSQL to see shared hit vs read ratios. [src5]
OFFSET 100000 LIMIT 50 still scans and discards 100,000 rows. Fix: use keyset/cursor pagination: WHERE id > :last_id ORDER BY id LIMIT 50. [src3]
Tune autovacuum thresholds (autovacuum_vacuum_scale_factor, autovacuum_analyze_scale_factor). [src1]
-- PostgreSQL: Show execution plan with timing and buffer info
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ... ;
-- PostgreSQL: Find queries consuming the most time (requires pg_stat_statements)
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;
-- PostgreSQL: Check for unused indexes
SELECT indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
-- PostgreSQL: Check table bloat and dead tuples
SELECT relname, n_dead_tup, n_live_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10;
-- MySQL: Show execution plan
EXPLAIN ANALYZE SELECT ... ;
-- MySQL: Check slow query log status
SHOW VARIABLES LIKE 'slow_query%';
SHOW VARIABLES LIKE 'long_query_time';
-- MySQL: Show table index usage
SHOW INDEX FROM orders;
-- SQL Server: Show I/O and time statistics
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT ... ;
-- SQL Server: Show estimated execution plan
SET SHOWPLAN_XML ON;
GO
SELECT ... ;
GO
SET SHOWPLAN_XML OFF;
| Engine | Version | Key Optimization Features | Notes |
|---|---|---|---|
| PostgreSQL 18 | Current (2025) | EXPLAIN BUFFERS by default, incremental sort improvements | Parallel query improvements |
| PostgreSQL 16 | Stable | Parallel FULL OUTER JOIN, improved DISTINCT | Recommended minimum for modern features |
| PostgreSQL 14 | LTS-like | Multirange types, query pipeline mode | Extended statistics improvements |
| MySQL 9.x | Current (2025) | Improved hash joins, parallel query | Innovation release track |
| MySQL 8.4 | LTS (2024) | Carries the 8.0 feature set: window functions, CTEs, EXPLAIN ANALYZE | First LTS release |
| MySQL 8.0 | EOL (2026-04) | Query cache removed, hash joins added | Upgrade to 8.4+ recommended |
| SQL Server 2022 | Current | Intelligent Query Processing, Parameter Sensitivity Plan | Adaptive joins, batch mode on rowstore |
| SQL Server 2019 | Mainstream support | Adaptive memory grants, batch mode on rowstore | Table variable deferred compilation |
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Individual queries are slow (> 100ms for OLTP) | All queries degraded equally | Check server resources: CPU, memory, I/O, connections |
| EXPLAIN shows sequential scans on large tables | Table is small (< 10,000 rows) | Sequential scan is often faster than index for small tables |
| Application has identified slow endpoints via APM | Problem is at the ORM/application layer (N+1) | Fix N+1 at application level first, not database level |
| Read-heavy workload with predictable query patterns | Write-heavy workload with many indexes already | Reduce index count, use batch writes, consider async processing |
| Query patterns are stable and well-understood | Ad-hoc analytical queries on large datasets | Use OLAP-optimized engines (ClickHouse, DuckDB, BigQuery) |