SQL Query Optimization: Complete Reference

Type: Software Reference | Confidence: 0.90 | Sources: 7 | Verified: 2026-02-24 | Freshness: 2026-02-24

TL;DR

Profile with EXPLAIN ANALYZE before changing anything; index WHERE/JOIN/ORDER BY columns; select only the columns you need; rewrite correlated subqueries as JOINs; refresh planner statistics after bulk data changes.

Constraints

Quick Reference

| Technique | When to Use | Impact | Complexity | Trade-off |
|---|---|---|---|---|
| Add covering index | Queries doing index lookups + heap fetches | Very high | Low | Extra write overhead + disk for each INSERT/UPDATE |
| Rewrite SELECT * to specific columns | Any query returning unnecessary data | High | Trivial | None -- strictly better |
| Add WHERE clause indexes | Sequential scans on filtered columns | Very high | Low | Index maintenance cost on writes |
| Replace correlated subquery with JOIN | Subquery executes once per outer row | Very high | Medium | JOIN may use more memory for hash/merge |
| Use EXISTS instead of IN for subqueries | Large subquery result sets | High | Low | Semantics differ with NULLs -- test carefully |
| Partition large tables | Tables > 100M rows with date/range filters | High | High | Query routing complexity, cross-partition JOINs slower |
| Materialize expensive CTEs / views | Repeated expensive aggregations | Very high | Medium | Stale data if not refreshed; storage cost |
| Use LIMIT with ORDER BY + index | Pagination queries | High | Low | Requires index matching ORDER BY columns |
| Batch INSERT/UPDATE operations | Bulk writes causing lock contention | High | Medium | Application-level batching logic needed |
| Add composite indexes | Multi-column WHERE/ORDER BY | Very high | Medium | Column order matters -- most selective first |
| Avoid implicit type conversions | WHERE varchar_col = 12345 (int) | High | Trivial | Requires schema awareness |
| Use query plan caching | Repeated parameterized queries | Medium | Low | Prepared statements may choose suboptimal generic plan |
| Update statistics | Bad plans after large data changes | High | Low | Brief CPU spike during ANALYZE |
| Denormalize hot paths | Read-heavy queries joining 5+ tables | Very high | High | Data duplication, consistency burden |
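The EXISTS-vs-IN trade-off above deserves a concrete illustration, because the NULL trap is easy to hit: NOT IN returns zero rows whenever the subquery yields even one NULL, while NOT EXISTS does not. A minimal sketch against the same hypothetical customers/orders schema used in the steps below:

```sql
-- NOT IN: if any orders.customer_id is NULL, every comparison
-- evaluates to UNKNOWN and the query returns zero rows.
SELECT c.id FROM customers c
WHERE c.id NOT IN (SELECT o.customer_id FROM orders o);

-- NOT EXISTS: NULL-safe, and typically planned as an anti-join.
SELECT c.id FROM customers c
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);
```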

Decision Tree

START: Query is slow
├── Have you run EXPLAIN ANALYZE?
│   ├── NO → Run EXPLAIN ANALYZE first (see Step 1 below)
│   └── YES ↓
├── Is there a Sequential Scan / Full Table Scan on a large table?
│   ├── YES → Is there a WHERE clause filtering rows?
│   │   ├── YES → Add index on WHERE columns (see Step 2)
│   │   └── NO → Query legitimately needs all rows -- optimize with LIMIT, pagination, or materialized view
│   └── NO ↓
├── Is the plan doing Index Scan but still slow?
│   ├── YES → Check if it's fetching too many columns (heap fetches) → Create covering index
│   └── NO ↓
├── Is there a Sort or Hash operation consuming >50% cost?
│   ├── YES → Add index matching ORDER BY / GROUP BY columns → Or increase work_mem
│   └── NO ↓
├── Is there a Nested Loop with high row estimates?
│   ├── YES → Replace with Hash Join (rewrite query) or add join column indexes
│   └── NO ↓
├── Are estimated rows very different from actual rows?
│   ├── YES → Run ANALYZE to update statistics → Check for data skew
│   └── NO ↓
├── Is the query using correlated subqueries?
│   ├── YES → Rewrite as JOIN or LATERAL join (see Step 4)
│   └── NO ↓
└── DEFAULT → Profile at application level -- problem may be N+1 queries, connection overhead, or network latency
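For the Sort/Hash branch above, raising work_mem for a single session is a cheap experiment before committing to a new index (PostgreSQL; the value here is illustrative, not a recommendation):

```sql
SET work_mem = '256MB';                  -- session-local; other connections unaffected
EXPLAIN (ANALYZE, BUFFERS) SELECT ... ;  -- re-check: "Sort Method: external merge Disk"
                                         -- should become an in-memory quicksort
RESET work_mem;
```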

Step-by-Step Guide

1. Profile the slow query with EXPLAIN ANALYZE

Before changing anything, get the actual execution plan. The plan shows exactly where time is spent. [src1]

-- PostgreSQL: EXPLAIN ANALYZE executes the query, so wrap it in a
-- transaction you can roll back (essential when profiling DML)
BEGIN;
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > '2025-01-01'
ORDER BY o.total DESC
LIMIT 100;
ROLLBACK;

-- MySQL:
EXPLAIN ANALYZE
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > '2025-01-01'
ORDER BY o.total DESC
LIMIT 100;

Verify: Look for Seq Scan (PostgreSQL), type: ALL (MySQL), or Table Scan (SQL Server) on large tables -- these are the primary targets.

2. Identify and add missing indexes

The execution plan reveals which table accesses lack indexes. Add targeted indexes for WHERE, JOIN, and ORDER BY columns. [src3]

-- PostgreSQL: Create index without locking writes
CREATE INDEX CONCURRENTLY idx_orders_created_at
ON orders (created_at DESC);

-- MySQL: Online DDL (InnoDB)
ALTER TABLE orders
ADD INDEX idx_orders_created_at (created_at DESC),
ALGORITHM=INPLACE, LOCK=NONE;

-- SQL Server: Online index creation
CREATE NONCLUSTERED INDEX idx_orders_created_at
ON orders (created_at DESC)
WITH (ONLINE = ON);

Verify: Re-run EXPLAIN ANALYZE -- you should see Index Scan or Index Only Scan replacing Seq Scan.

3. Eliminate unnecessary columns and rows

Fetch only what you need. SELECT * forces the engine to read every column from heap pages even when an index covers the query. [src3]

-- Before: fetches all columns, prevents index-only scan
SELECT * FROM orders WHERE customer_id = 42;

-- After: fetches only needed columns, enables covering index
SELECT id, total, status FROM orders WHERE customer_id = 42;

Verify: EXPLAIN output should show Index Only Scan (PostgreSQL) or Using index (MySQL) when all selected columns are in the index.

4. Rewrite subqueries as JOINs

Correlated subqueries execute once per outer row. JOINs allow the optimizer to choose hash or merge strategies. [src5]

-- Before: correlated subquery (O(n*m) execution)
SELECT c.name, (
  SELECT SUM(o.total) FROM orders o WHERE o.customer_id = c.id
) AS total_spent
FROM customers c;

-- After: JOIN with aggregation (single pass)
SELECT c.name, COALESCE(SUM(o.total), 0) AS total_spent
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;

Verify: EXPLAIN should show HashAggregate or GroupAggregate instead of SubPlan nodes.
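When the subquery must return several columns or a top-N per group (so a plain GROUP BY rewrite does not fit), the alternative the decision tree mentions is a LATERAL join (CROSS APPLY in SQL Server). A sketch using the same schema:

```sql
-- Most recent order per customer (PostgreSQL 9.3+ LATERAL)
SELECT c.name, recent.total, recent.created_at
FROM customers c
LEFT JOIN LATERAL (
  SELECT o.total, o.created_at
  FROM orders o
  WHERE o.customer_id = c.id
  ORDER BY o.created_at DESC
  LIMIT 1
) AS recent ON true;
```

With an index on (customer_id, created_at DESC), each lateral lookup becomes a cheap single-row index scan.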

5. Create composite and covering indexes

Multi-column indexes must follow the leftmost-prefix rule. A covering index includes all columns the query needs, eliminating heap fetches entirely. [src3]

-- Composite index: most selective column first
CREATE INDEX idx_orders_cust_date
ON orders (customer_id, created_at DESC);

-- Covering index (PostgreSQL INCLUDE syntax)
CREATE INDEX idx_orders_covering
ON orders (customer_id, created_at DESC)
INCLUDE (total, status);

-- MySQL covering index (all columns in the index)
CREATE INDEX idx_orders_covering
ON orders (customer_id, created_at, total, status);

Verify: EXPLAIN shows Index Only Scan (PostgreSQL) or Using index in Extra column (MySQL).

6. Optimize aggregation queries

GROUP BY performance depends on whether the optimizer can use an index for grouping or must sort/hash. [src1]

-- Create index matching GROUP BY + aggregate
CREATE INDEX idx_orders_status_total
ON orders (status, total);

-- Query that benefits from the index
SELECT status, SUM(total), COUNT(*)
FROM orders
GROUP BY status;

Verify: EXPLAIN should show a GroupAggregate reading the index in order, with no separate Sort node (without the index you get HashAggregate, or Sort + GroupAggregate).

7. Update statistics after large data changes

The query planner relies on table statistics to estimate row counts. After bulk loads or deletes, statistics become stale and cause bad plans. [src1]

-- PostgreSQL
ANALYZE orders;

-- MySQL
ANALYZE TABLE orders;

-- SQL Server
UPDATE STATISTICS orders;
-- Or with full scan for accuracy
UPDATE STATISTICS orders WITH FULLSCAN;

Verify: Re-run EXPLAIN ANALYZE -- estimated rows should be close to actual rows (within 10x).
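If estimates stay wrong after ANALYZE, the cause may be cross-column correlation that per-column statistics cannot capture. A PostgreSQL-only sketch (the statistics object name and column pair are illustrative):

```sql
-- PostgreSQL 10+: tell the planner that customer_id and status are correlated
CREATE STATISTICS stat_orders_cust_status (dependencies, ndistinct)
ON customer_id, status FROM orders;
ANALYZE orders;  -- extended statistics are only populated by ANALYZE
```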

Code Examples

PostgreSQL: Find missing indexes with pg_stat_user_tables

-- Input:  connected to PostgreSQL database
-- Output: tables with high sequential scan ratios (index candidates)

SELECT
  schemaname,
  relname AS table_name,
  seq_scan,
  idx_scan,
  CASE WHEN seq_scan + idx_scan > 0
    THEN ROUND(100.0 * seq_scan / (seq_scan + idx_scan), 1)
    ELSE 0
  END AS seq_scan_pct,
  n_live_tup AS row_count
FROM pg_stat_user_tables
WHERE n_live_tup > 10000
  AND seq_scan > idx_scan
ORDER BY seq_scan_pct DESC, n_live_tup DESC
LIMIT 20;

MySQL: Enable and query the slow query log

-- Input:  MySQL 8.0+ with admin privileges
-- Output: slow queries captured to file for analysis

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Check current setting
SHOW VARIABLES LIKE 'slow_query%';

-- After collecting data, analyze with pt-query-digest:
-- pt-query-digest /var/lib/mysql/hostname-slow.log

SQL Server: Find top resource-consuming queries

-- Input:  SQL Server 2019+ with VIEW SERVER STATE permission
-- Output: top 20 queries by total elapsed time

SELECT TOP 20
  qs.total_elapsed_time / qs.execution_count / 1000 AS avg_elapsed_ms,  -- DMV times are in microseconds
  qs.execution_count,
  qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
  SUBSTRING(qt.text, (qs.statement_start_offset / 2) + 1,
    ((CASE qs.statement_end_offset
        WHEN -1 THEN DATALENGTH(qt.text)
        ELSE qs.statement_end_offset
      END - qs.statement_start_offset) / 2) + 1) AS query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
ORDER BY qs.total_elapsed_time / qs.execution_count DESC;

Anti-Patterns

Wrong: Using SELECT * in production queries

-- BAD -- fetches all columns, prevents index-only scans,
-- breaks when schema changes, transfers unnecessary data
SELECT * FROM orders WHERE customer_id = 42;

Correct: Select only needed columns

-- GOOD -- enables covering indexes, reduces I/O and network transfer
SELECT id, total, status, created_at
FROM orders
WHERE customer_id = 42;

Wrong: N+1 query pattern in application code

-- BAD -- executes 1 query + N queries (one per customer)
-- Application loop:
--   customers = SELECT * FROM customers;
--   for each customer:
--     orders = SELECT * FROM orders WHERE customer_id = ?;

Correct: Single JOIN query

-- GOOD -- single query, database optimizes the join
SELECT c.id, c.name, o.id AS order_id, o.total
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE c.active = true;

Wrong: Using functions on indexed columns in WHERE

-- BAD -- wrapping indexed column in function prevents index usage
SELECT * FROM orders
WHERE YEAR(created_at) = 2025;

-- BAD -- implicit type conversion disables index
SELECT * FROM users
WHERE phone = 5551234567;  -- phone is VARCHAR, literal is INT

Correct: Rewrite to keep indexed column bare

-- GOOD -- sargable: index on created_at is used
SELECT * FROM orders
WHERE created_at >= '2025-01-01'
  AND created_at < '2026-01-01';

-- GOOD -- matching types
SELECT * FROM users
WHERE phone = '5551234567';

Wrong: Using OR conditions that prevent index usage

-- BAD -- OR on different columns forces full table scan
SELECT * FROM products
WHERE category_id = 5 OR supplier_id = 12;

Correct: Use UNION ALL for OR on different indexed columns

-- GOOD -- each branch can use its own index
SELECT * FROM products WHERE category_id = 5
UNION ALL
SELECT * FROM products WHERE supplier_id = 12
  AND category_id <> 5;  -- excludes branch-1 duplicates; if category_id is
                         -- nullable, use category_id IS DISTINCT FROM 5

Wrong: Unbounded queries without LIMIT

-- BAD -- returns potentially millions of rows
SELECT id, name, email FROM users
WHERE created_at > '2020-01-01'
ORDER BY created_at;

Correct: Always paginate large result sets

-- GOOD -- keyset pagination (fastest for large offsets)
SELECT id, name, email FROM users
WHERE created_at > '2020-01-01'
  AND id > :last_seen_id
ORDER BY id
LIMIT 50;

Common Pitfalls

Diagnostic Commands

-- PostgreSQL: Show execution plan with timing and buffer info
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ... ;

-- PostgreSQL: Find queries consuming the most time (requires pg_stat_statements)
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;

-- PostgreSQL: Check for unused indexes
SELECT indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

-- PostgreSQL: Check table bloat and dead tuples
SELECT relname, n_dead_tup, n_live_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10;

-- MySQL: Show execution plan
EXPLAIN ANALYZE SELECT ... ;

-- MySQL: Check slow query log status
SHOW VARIABLES LIKE 'slow_query%';
SHOW VARIABLES LIKE 'long_query_time';

-- MySQL: Show table index usage
SHOW INDEX FROM orders;

-- SQL Server: Show I/O and time statistics
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT ... ;

-- SQL Server: Show estimated execution plan
SET SHOWPLAN_XML ON;
GO
SELECT ... ;
GO
SET SHOWPLAN_XML OFF;

Version History & Compatibility

| Engine / Version | Status | Key Optimization Features | Notes |
|---|---|---|---|
| PostgreSQL 18 | Current (2025) | EXPLAIN BUFFERS on by default, incremental sort improvements | Parallel query improvements |
| PostgreSQL 16 | Stable | Parallel FULL OUTER JOIN, improved DISTINCT | Recommended minimum for modern features |
| PostgreSQL 14 | Older stable | Multirange types, libpq pipeline mode | Extended statistics improvements |
| MySQL 9.x | Current (2025) | Improved hash joins | Innovation release track |
| MySQL 8.4 | LTS (2024) | Inherits 8.0 optimizer features (hash joins, EXPLAIN ANALYZE) | First LTS release |
| MySQL 8.0 | EOL (2026-04) | Window functions, CTEs, hash joins added; query cache removed | Upgrade to 8.4+ recommended |
| SQL Server 2022 | Current | Intelligent Query Processing, Parameter Sensitive Plan optimization | Query Store on by default for new databases |
| SQL Server 2019 | Mainstream support | Adaptive memory grants, batch mode on rowstore | Table variable deferred compilation |

When to Use / When Not to Use

| Use When | Don't Use When | Use Instead |
|---|---|---|
| Individual queries are slow (> 100 ms for OLTP) | All queries degraded equally | Check server resources: CPU, memory, I/O, connections |
| EXPLAIN shows sequential scans on large tables | Table is small (< 10,000 rows) | Sequential scan is often faster than an index for small tables |
| Application has identified slow endpoints via APM | Problem is at the ORM/application layer (N+1) | Fix N+1 at the application level first, not the database level |
| Read-heavy workload with predictable query patterns | Write-heavy workload with many indexes already | Reduce index count, use batch writes, consider async processing |
| Query patterns are stable and well-understood | Ad-hoc analytical queries on large datasets | Use OLAP-optimized engines (ClickHouse, DuckDB, BigQuery) |

Important Caveats

Related Units