SQL query performance tuning

- Bottom line: Most slow SQL queries are fixed by proper indexing, avoiding full table scans, and rewriting queries to let the optimizer choose efficient plans -- use EXPLAIN ANALYZE to diagnose before guessing.

How to optimize slow SQL queries

- Bottom line: Most slow SQL queries are fixed by proper indexing, avoiding full table scans, and rewriting queries to let the optimizer choose efficient plans -- use EXPLAIN ANALYZE to diagnose before guessing.

SQL EXPLAIN ANALYZE optimization

- Bottom line: Most slow SQL queries are fixed by proper indexing, avoiding full table scans, and rewriting queries to let the optimizer choose efficient plans -- use EXPLAIN ANALYZE to diagnose before guessing.

Database query optimization best practices

- Bottom line: Most slow SQL queries are fixed by proper indexing, avoiding full table scans, and rewriting queries to let the optimizer choose efficient plans -- use EXPLAIN ANALYZE to diagnose before guessing.

SQL indexing strategies for performance

- Bottom line: Most slow SQL queries are fixed by proper indexing, avoiding full table scans, and rewriting queries to let the optimizer choose efficient plans -- use EXPLAIN ANALYZE to diagnose before guessing.

SQL Query Optimization: Complete Reference

What are the best SQL query optimization techniques?

TL;DR

Bottom line: Most slow SQL queries are fixed by proper indexing, avoiding full table scans, and rewriting queries to let the optimizer choose efficient plans -- use EXPLAIN ANALYZE to diagnose before guessing.
Key tool/command: EXPLAIN ANALYZE (PostgreSQL/MySQL) or SET STATISTICS IO ON + actual execution plan (SQL Server)
Watch out for: Adding indexes without checking if the optimizer actually uses them -- unused indexes waste write performance and disk space.
Works with: PostgreSQL 13+, MySQL 8.0+, SQL Server 2019+, Oracle 19c+, SQLite 3.35+ (core principles are engine-agnostic).

Constraints

EXPLAIN ANALYZE executes the query -- never run it on UPDATE/DELETE/INSERT without wrapping in BEGIN/ROLLBACK
Index creation on production tables can lock writes -- use CREATE INDEX CONCURRENTLY (PostgreSQL) or ALGORITHM=INPLACE (MySQL) for zero-downtime deployments
work_mem (PostgreSQL) and sort_buffer_size (MySQL) are per-operation, not per-session -- setting too high causes OOM under concurrent load
Query hints and optimizer directives are engine-specific -- PostgreSQL has no native hint syntax; MySQL uses optimizer_switch; SQL Server uses query hints
Statistics staleness causes bad plans -- ensure ANALYZE / ANALYZE TABLE / UPDATE STATISTICS runs regularly or via autovacuum/auto-stats

Quick Reference

Technique	When to Use	Impact	Complexity	Trade-off
Add covering index	Queries doing index lookups + heap fetches	Very high	Low	Extra write overhead + disk for each INSERT/UPDATE
Rewrite SELECT * to specific columns	Any query returning unnecessary data	High	Trivial	None -- strictly better
Add WHERE clause indexes	Sequential scans on filtered columns	Very high	Low	Index maintenance cost on writes
Replace correlated subquery with JOIN	Subquery executes once per outer row	Very high	Medium	JOIN may use more memory for hash/merge
Use EXISTS instead of IN for subqueries	Large subquery result sets	High	Low	Semantics differ with NULLs -- test carefully
Partition large tables	Tables > 100M rows with date/range filters	High	High	Query routing complexity, cross-partition JOINs slower
Materialize expensive CTEs / views	Repeated expensive aggregations	Very high	Medium	Stale data if not refreshed; storage cost
Use LIMIT with ORDER BY + index	Pagination queries	High	Low	Requires index matching ORDER BY columns
Batch INSERT/UPDATE operations	Bulk writes causing lock contention	High	Medium	Application-level batching logic needed
Add composite indexes	Multi-column WHERE/ORDER BY	Very high	Medium	Column order matters -- most selective first
Avoid implicit type conversions	WHERE varchar_col = 12345 (int)	High	Trivial	Requires schema awareness
Use query plan caching	Repeated parameterized queries	Medium	Low	Prepared statements may choose suboptimal generic plan
Update statistics	Bad plans after large data changes	High	Low	Brief CPU spike during ANALYZE
Denormalize hot paths	Read-heavy queries joining 5+ tables	Very high	High	Data duplication, consistency burden

Decision Tree

START: Query is slow
├── Have you run EXPLAIN ANALYZE?
│   ├── NO → Run EXPLAIN ANALYZE first (see Step 1 below)
│   └── YES ↓
├── Is there a Sequential Scan / Full Table Scan on a large table?
│   ├── YES → Is there a WHERE clause filtering rows?
│   │   ├── YES → Add index on WHERE columns (see Step 2)
│   │   └── NO → Query legitimately needs all rows -- optimize with LIMIT, pagination, or materialized view
│   └── NO ↓
├── Is the plan doing Index Scan but still slow?
│   ├── YES → Check if it's fetching too many columns (heap fetches) → Create covering index
│   └── NO ↓
├── Is there a Sort or Hash operation consuming >50% cost?
│   ├── YES → Add index matching ORDER BY / GROUP BY columns → Or increase work_mem
│   └── NO ↓
├── Is there a Nested Loop with high row estimates?
│   ├── YES → Replace with Hash Join (rewrite query) or add join column indexes
│   └── NO ↓
├── Are estimated rows very different from actual rows?
│   ├── YES → Run ANALYZE to update statistics → Check for data skew
│   └── NO ↓
├── Is the query using correlated subqueries?
│   ├── YES → Rewrite as JOIN or lateral join (see Anti-Patterns)
│   └── NO ↓
└── DEFAULT → Profile at application level -- problem may be N+1 queries, connection overhead, or network latency

Step-by-Step Guide

1. Profile the slow query with EXPLAIN ANALYZE

Before changing anything, get the actual execution plan. The plan shows exactly where time is spent. [src1]

-- PostgreSQL: EXPLAIN ANALYZE executes the query -- wrap DML in a transaction
BEGIN;
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > '2025-01-01'
ORDER BY o.total DESC
LIMIT 100;
ROLLBACK;

-- MySQL:
EXPLAIN ANALYZE
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > '2025-01-01'
ORDER BY o.total DESC
LIMIT 100;

Verify: Look for Seq Scan (PostgreSQL), type: ALL (MySQL), or Table Scan (SQL Server) on large tables -- these are the primary targets.

2. Identify and add missing indexes

The execution plan reveals which table accesses lack indexes. Add targeted indexes for WHERE, JOIN, and ORDER BY columns. [src3]

-- PostgreSQL: Create index without locking writes
CREATE INDEX CONCURRENTLY idx_orders_created_at
ON orders (created_at DESC);

-- MySQL: Online DDL (InnoDB)
ALTER TABLE orders
ADD INDEX idx_orders_created_at (created_at DESC),
ALGORITHM=INPLACE, LOCK=NONE;

-- SQL Server: Online index creation
CREATE NONCLUSTERED INDEX idx_orders_created_at
ON orders (created_at DESC)
WITH (ONLINE = ON);

Verify: Re-run EXPLAIN ANALYZE -- you should see Index Scan or Index Only Scan replacing Seq Scan.

3. Eliminate unnecessary columns and rows

Fetch only what you need. SELECT * forces the engine to read every column from heap pages even when an index covers the query. [src3]

-- Before: fetches all columns, prevents index-only scan
SELECT * FROM orders WHERE customer_id = 42;

-- After: fetches only needed columns, enables covering index
SELECT id, total, status FROM orders WHERE customer_id = 42;

Verify: EXPLAIN output should show Index Only Scan (PostgreSQL) or Using index (MySQL) when all selected columns are in the index.

4. Rewrite subqueries as JOINs

Correlated subqueries execute once per outer row. JOINs allow the optimizer to choose hash or merge strategies. [src5]

-- Before: correlated subquery (O(n*m) execution)
SELECT c.name, (
  SELECT SUM(o.total) FROM orders o WHERE o.customer_id = c.id
) AS total_spent
FROM customers c;

-- After: JOIN with aggregation (single pass)
SELECT c.name, COALESCE(SUM(o.total), 0) AS total_spent
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;

Verify: EXPLAIN should show Hash Aggregate or GroupAggregate instead of SubPlan nodes.

5. Create composite and covering indexes

Multi-column indexes must follow the leftmost-prefix rule. A covering index includes all columns the query needs, eliminating heap fetches entirely. [src3]

-- Composite index: most selective column first
CREATE INDEX idx_orders_cust_date
ON orders (customer_id, created_at DESC);

-- Covering index (PostgreSQL INCLUDE syntax)
CREATE INDEX idx_orders_covering
ON orders (customer_id, created_at DESC)
INCLUDE (total, status);

-- MySQL covering index (all columns in the index)
CREATE INDEX idx_orders_covering
ON orders (customer_id, created_at, total, status);

Verify: EXPLAIN shows Index Only Scan (PostgreSQL) or Using index in Extra column (MySQL).

6. Optimize aggregation queries

GROUP BY performance depends on whether the optimizer can use an index for grouping or must sort/hash. [src1]

-- Create index matching GROUP BY + aggregate
CREATE INDEX idx_orders_status_total
ON orders (status, total);

-- Query that benefits from the index
SELECT status, SUM(total), COUNT(*)
FROM orders
GROUP BY status;

Verify: EXPLAIN should show GroupAggregate using an index, not HashAggregate with a Sort node.

7. Update statistics after large data changes

The query planner relies on table statistics to estimate row counts. After bulk loads or deletes, statistics become stale and cause bad plans. [src1]

-- PostgreSQL
ANALYZE orders;

-- MySQL
ANALYZE TABLE orders;

-- SQL Server
UPDATE STATISTICS orders;
-- Or with full scan for accuracy
UPDATE STATISTICS orders WITH FULLSCAN;

Verify: Re-run EXPLAIN ANALYZE -- estimated rows should be close to actual rows (within 10x).

Code Examples

PostgreSQL: Find missing indexes with pg_stat_user_tables

-- Input:  connected to PostgreSQL database
-- Output: tables with high sequential scan ratios (index candidates)

SELECT
  schemaname,
  relname AS table_name,
  seq_scan,
  idx_scan,
  CASE WHEN seq_scan + idx_scan > 0
    THEN ROUND(100.0 * seq_scan / (seq_scan + idx_scan), 1)
    ELSE 0
  END AS seq_scan_pct,
  n_live_tup AS row_count
FROM pg_stat_user_tables
WHERE n_live_tup > 10000
  AND seq_scan > idx_scan
ORDER BY seq_scan_pct DESC, n_live_tup DESC
LIMIT 20;

MySQL: Enable and query the slow query log

-- Input:  MySQL 8.0+ with admin privileges
-- Output: slow queries captured to file for analysis

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Check current setting
SHOW VARIABLES LIKE 'slow_query%';

-- After collecting data, analyze with pt-query-digest:
-- pt-query-digest /var/lib/mysql/hostname-slow.log

SQL Server: Find top resource-consuming queries

-- Input:  SQL Server 2019+ with VIEW SERVER STATE permission
-- Output: top 20 queries by total elapsed time

SELECT TOP 20
  qs.total_elapsed_time / qs.execution_count AS avg_elapsed_ms,
  qs.execution_count,
  qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
  SUBSTRING(qt.text, (qs.statement_start_offset / 2) + 1,
    ((CASE qs.statement_end_offset
        WHEN -1 THEN DATALENGTH(qt.text)
        ELSE qs.statement_end_offset
      END - qs.statement_start_offset) / 2) + 1) AS query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
ORDER BY qs.total_elapsed_time / qs.execution_count DESC;

Anti-Patterns

Wrong: Using SELECT * in production queries

-- BAD -- fetches all columns, prevents index-only scans,
-- breaks when schema changes, transfers unnecessary data
SELECT * FROM orders WHERE customer_id = 42;

Correct: Select only needed columns

-- GOOD -- enables covering indexes, reduces I/O and network transfer
SELECT id, total, status, created_at
FROM orders
WHERE customer_id = 42;

Wrong: N+1 query pattern in application code

-- BAD -- executes 1 query + N queries (one per customer)
-- Application loop:
--   customers = SELECT * FROM customers;
--   for each customer:
--     orders = SELECT * FROM orders WHERE customer_id = ?;

Correct: Single JOIN query

-- GOOD -- single query, database optimizes the join
SELECT c.id, c.name, o.id AS order_id, o.total
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE c.active = true;

Wrong: Using functions on indexed columns in WHERE

-- BAD -- wrapping indexed column in function prevents index usage
SELECT * FROM orders
WHERE YEAR(created_at) = 2025;

-- BAD -- implicit type conversion disables index
SELECT * FROM users
WHERE phone = 5551234567;  -- phone is VARCHAR, literal is INT

Correct: Rewrite to keep indexed column bare

-- GOOD -- sargable: index on created_at is used
SELECT * FROM orders
WHERE created_at >= '2025-01-01'
  AND created_at < '2026-01-01';

-- GOOD -- matching types
SELECT * FROM users
WHERE phone = '5551234567';

Wrong: Using OR conditions that prevent index usage

-- BAD -- OR on different columns forces full table scan
SELECT * FROM products
WHERE category_id = 5 OR supplier_id = 12;

Correct: Use UNION ALL for OR on different indexed columns

-- GOOD -- each branch can use its own index
SELECT * FROM products WHERE category_id = 5
UNION ALL
SELECT * FROM products WHERE supplier_id = 12
  AND category_id <> 5;

Wrong: Unbounded queries without LIMIT

-- BAD -- returns potentially millions of rows
SELECT id, name, email FROM users
WHERE created_at > '2020-01-01'
ORDER BY created_at;

Correct: Always paginate large result sets

-- GOOD -- keyset pagination (fastest for large offsets)
SELECT id, name, email FROM users
WHERE created_at > '2020-01-01'
  AND id > :last_seen_id
ORDER BY id
LIMIT 50;

Common Pitfalls

Stale statistics after bulk operations: EXPLAIN shows good plan but actual performance is terrible because row estimates are 100x off. Fix: Run ANALYZE (PostgreSQL), ANALYZE TABLE (MySQL), or UPDATE STATISTICS (SQL Server) after large INSERT/DELETE/UPDATE batches. [src1]
Over-indexing write-heavy tables: Every index slows down INSERT/UPDATE/DELETE. Tables with 20+ indexes often have slower write performance than query gains. Fix: Use pg_stat_user_indexes (PostgreSQL) or sys.dm_db_index_usage_stats (SQL Server) to find and drop unused indexes. [src3]
Ignoring the buffer cache / I/O distinction: A query may be fast when data is cached but slow on cold start. Fix: Use EXPLAIN (ANALYZE, BUFFERS) in PostgreSQL to see shared hit vs read ratios. [src5]
OFFSET pagination on large tables: OFFSET 100000 LIMIT 50 still scans and discards 100,000 rows. Fix: Use keyset/cursor pagination: WHERE id > :last_id ORDER BY id LIMIT 50. [src3]
Premature denormalization: Duplicating data across tables before profiling actual bottlenecks creates consistency bugs. Fix: Profile first with EXPLAIN ANALYZE, add indexes, then denormalize only proven hot paths. [src7]
Relying on query cache (MySQL < 8.0): MySQL removed the query cache in 8.0 because it caused contention. Fix: Use application-level caching (Redis, Memcached) instead. [src2]
Not testing with production-like data volume: A query that is fast on 1,000 rows may be catastrophically slow on 10 million. Fix: Test optimization changes against a staging environment with production data volume and distribution. [src6]
Missing VACUUM on PostgreSQL: Dead tuples cause table bloat and slow sequential scans. Fix: Ensure autovacuum is running and tuned (autovacuum_vacuum_scale_factor, autovacuum_analyze_scale_factor). [src1]

Diagnostic Commands

-- PostgreSQL: Show execution plan with timing and buffer info
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ... ;

-- PostgreSQL: Find queries consuming the most time (requires pg_stat_statements)
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;

-- PostgreSQL: Check for unused indexes
SELECT indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

-- PostgreSQL: Check table bloat and dead tuples
SELECT relname, n_dead_tup, n_live_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10;

-- MySQL: Show execution plan
EXPLAIN ANALYZE SELECT ... ;

-- MySQL: Check slow query log status
SHOW VARIABLES LIKE 'slow_query%';
SHOW VARIABLES LIKE 'long_query_time';

-- MySQL: Show table index usage
SHOW INDEX FROM orders;

-- SQL Server: Show I/O and time statistics
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT ... ;

-- SQL Server: Show estimated execution plan
SET SHOWPLAN_XML ON;
GO
SELECT ... ;
GO
SET SHOWPLAN_XML OFF;

Version History & Compatibility

Engine	Version	Key Optimization Features	Notes
PostgreSQL 18	Current (2025)	EXPLAIN BUFFERS by default, incremental sort improvements	Parallel query improvements
PostgreSQL 16	Stable	Parallel FULL OUTER JOIN, improved DISTINCT	Recommended minimum for modern features
PostgreSQL 14	LTS-like	Multirange types, query pipeline mode	Extended statistics improvements
MySQL 9.x	Current (2025)	Improved hash joins, parallel query	Innovation release track
MySQL 8.4	LTS (2024)	Window functions, CTEs, EXPLAIN ANALYZE	First LTS release
MySQL 8.0	EOL (2026-04)	Query cache removed, hash joins added	Upgrade to 8.4+ recommended
SQL Server 2022	Current	Intelligent Query Processing, Parameter Sensitivity Plan	Adaptive joins, batch mode on rowstore
SQL Server 2019	Mainstream support	Adaptive memory grants, batch mode on rowstore	Table variable deferred compilation

When to Use / When Not to Use

Use When	Don't Use When	Use Instead
Individual queries are slow (> 100ms for OLTP)	All queries degraded equally	Check server resources: CPU, memory, I/O, connections
EXPLAIN shows sequential scans on large tables	Table is small (< 10,000 rows)	Sequential scan is often faster than index for small tables
Application has identified slow endpoints via APM	Problem is at the ORM/application layer (N+1)	Fix N+1 at application level first, not database level
Read-heavy workload with predictable query patterns	Write-heavy workload with many indexes already	Reduce index count, use batch writes, consider async processing
Query patterns are stable and well-understood	Ad-hoc analytical queries on large datasets	Use OLAP-optimized engines (ClickHouse, DuckDB, BigQuery)

Important Caveats

Optimization techniques are engine-specific -- an approach that works on PostgreSQL (e.g., partial indexes, INCLUDE columns) may not exist or behave differently on MySQL or SQL Server
Index recommendations from automated tools (pg_stat_statements, Query Store, MySQL Performance Schema) should be validated with EXPLAIN before deploying -- they may suggest redundant or conflicting indexes
Cloud-managed databases (AWS RDS, Azure SQL, Cloud SQL) have restricted access to some diagnostic tools and configuration parameters -- check your provider's documentation for available optimization levers
Premature optimization without profiling wastes engineering time -- always measure with EXPLAIN ANALYZE first, then target the specific bottleneck