CREATE INDEX idx ON table USING GIN (to_tsvector('english', column));'english') — using the wrong language dictionary silently drops valid stemmed tokensinnodb_ft_min_token_size is changed (requires index rebuild)fastupdate=on (default) or consider GiSTts_rank() does NOT use the GIN index for scoring — ranking always has I/O cost proportional to result set sizepg_trgm.similarity_threshold produces either too many false positives or too few results| Scenario | Pattern | Index Type | Strengths | Limitations | Best For |
|---|---|---|---|---|---|
| PostgreSQL keyword search | to_tsvector() + @@ + to_tsquery() | GIN | Stemming, ranking, language-aware, boolean ops | No fuzzy/typo tolerance | Structured natural-language queries |
| PostgreSQL phrase search | to_tsvector() + @@ + phraseto_tsquery() | GIN | Word proximity, phrase matching | Exact phrase order required | "Near me" or multi-word lookups |
| PostgreSQL fuzzy search | pg_trgm % operator | GIN (gin_trgm_ops) | Typo-tolerant, LIKE/ILIKE acceleration | Higher index size, no stemming | Search-as-you-type, user input with typos |
| PostgreSQL combined | tsvector + pg_trgm fallback | GIN (both) | Best coverage: stemmed + fuzzy | Two indexes, more storage | Production search with autocomplete |
| PostgreSQL weighted search | setweight() + ts_rank() | GIN | Title > body boosting (A/B/C/D weights) | Ranking not index-accelerated | Relevance-ranked results |
| PostgreSQL generated column | GENERATED ALWAYS AS (to_tsvector(...)) STORED | GIN | No trigger needed, auto-maintained | PostgreSQL 12+ only | Clean, maintainable tsvector storage |
| MySQL natural language | MATCH(...) AGAINST('...') | FULLTEXT | Simple setup, built-in relevance | 50% threshold, 3-char minimum | Basic search on InnoDB tables |
| MySQL boolean mode | MATCH(...) AGAINST('...' IN BOOLEAN MODE) | FULLTEXT | +/- operators, no 50% threshold | No relevance ranking by default | Precise inclusion/exclusion queries |
| MySQL query expansion | MATCH(...) AGAINST('...' WITH QUERY EXPANSION) | FULLTEXT | Finds related terms automatically | Can return noisy results | Expanding vague search queries |
| MySQL multi-column | FULLTEXT(col1, col2, col3) | FULLTEXT | Single index across columns | All columns must be same table | Cross-field search |
| Elasticsearch (comparison) | REST API + inverted index | Lucene segments | Sub-10ms on billions of docs, fuzzy, facets | Separate infrastructure, data sync | >10M docs, complex relevance tuning |
| pg_trgm word similarity | word_similarity() + <% operator | GIN/GiST | Matches substrings, better for prefix search | PostgreSQL 9.6+ only | Autocomplete, partial word matching |
START
├── Using PostgreSQL?
│ ├── YES
│ │ ├── Need typo/fuzzy tolerance?
│ │ │ ├── YES → Use pg_trgm with GIN(gin_trgm_ops) index
│ │ │ └── NO ↓
│ │ ├── Need relevance ranking with field weighting?
│ │ │ ├── YES → Use tsvector + setweight() + ts_rank()
│ │ │ └── NO ↓
│ │ ├── Simple keyword/phrase search?
│ │ │ ├── YES → Use to_tsvector() + to_tsquery() with GIN index
│ │ │ └── NO ↓
│ │ └── Need both fuzzy AND ranked search?
│ │ └── YES → Combine tsvector + pg_trgm with fallback
│ └── NO ↓
├── Using MySQL?
│ ├── YES
│ │ ├── Need precise inclusion/exclusion (+/- operators)?
│ │ │ ├── YES → Use BOOLEAN MODE
│ │ │ └── NO ↓
│ │ ├── Simple relevance-ranked search?
│ │ │ ├── YES → Use NATURAL LANGUAGE MODE
│ │ │ └── NO ↓
│ │ └── Need to find related terms from vague input?
│ │ └── YES → Use WITH QUERY EXPANSION
│ └── NO ↓
└── Dataset >10M docs OR need sub-50ms p99 OR complex faceting?
└── YES → Use dedicated search engine (Elasticsearch, Typesense, Meilisearch)
Use a generated column (PostgreSQL 12+) to automatically maintain the tsvector. This avoids triggers and ensures the search vector stays in sync with source columns. [src1]
-- Step 1a: Add generated tsvector column
ALTER TABLE articles ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
to_tsvector('english',
coalesce(title, '') || ' ' ||
coalesce(body, '')
)
) STORED;
-- Step 1b: Create GIN index for fast lookups
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);
Verify: SELECT count(*) FROM articles WHERE search_vector IS NOT NULL; → expected: should equal total row count
Use websearch_to_tsquery() (PostgreSQL 11+) for Google-style query parsing, or plainto_tsquery() for simpler input. [src2]
SELECT
id, title,
ts_rank(search_vector, query) AS rank
FROM articles,
websearch_to_tsquery('english', 'full text search postgresql') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
Verify: EXPLAIN ANALYZE on the query → expected: "Bitmap Index Scan" on idx_articles_search
Use setweight() to boost title matches (weight A) over body matches (weight D). Default weights: A=1.0, B=0.4, C=0.2, D=0.1. [src2]
ALTER TABLE articles DROP COLUMN search_vector;
ALTER TABLE articles ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(body, '')), 'D')
) STORED;
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);
Verify: Title matches should rank higher (~0.6+) than body-only matches (~0.1)
MySQL FULLTEXT works on VARCHAR, CHAR, and TEXT columns. InnoDB supports FULLTEXT from MySQL 5.7+. [src4]
ALTER TABLE articles ADD FULLTEXT INDEX ft_articles (title, body);
SELECT id, title,
MATCH(title, body) AGAINST('full text search') AS relevance
FROM articles
WHERE MATCH(title, body) AGAINST('full text search')
ORDER BY relevance DESC
LIMIT 20;
Verify: EXPLAIN → expected: "fulltext" in type column
Boolean mode supports + (must include), - (must exclude), * (wildcard), and "..." (exact phrase). [src4]
SELECT id, title
FROM articles
WHERE MATCH(title, body) AGAINST('+postgresql -mysql +index*' IN BOOLEAN MODE)
ORDER BY id DESC
LIMIT 20;
Verify: Results should contain 'postgresql' and not contain 'mysql'
The pg_trgm extension provides typo-tolerant search using trigram matching. [src3]
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_articles_trgm ON articles
USING GIN (title gin_trgm_ops);
SELECT id, title, similarity(title, 'postgrsql') AS sim
FROM articles
WHERE title % 'postgrsql'
ORDER BY sim DESC
LIMIT 10;
Verify: SELECT similarity('postgresql', 'postgrsql'); → expected: ~0.53
-- Input: Table with title and body text columns
-- Output: Ranked search results with relevance score and snippet
ALTER TABLE articles ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(body, '')), 'D')
) STORED;
CREATE INDEX idx_fts ON articles USING GIN (search_vector);
SELECT id, title,
ts_rank(search_vector, q) AS rank,
ts_headline('english', body, q,
'StartSel=<b>, StopSel=</b>, MaxWords=35, MinWords=15'
) AS snippet
FROM articles,
websearch_to_tsquery('english', 'database full text search') AS q
WHERE search_vector @@ q
ORDER BY rank DESC
LIMIT 20;
-- Input: Table with FULLTEXT index on title and body
-- Output: Filtered results with relevance score
ALTER TABLE articles ADD FULLTEXT INDEX ft_idx (title, body);
SELECT id, title,
MATCH(title, body) AGAINST('+"full text" +search -elasticsearch' IN BOOLEAN MODE) AS score
FROM articles
WHERE MATCH(title, body) AGAINST('+"full text" +search -elasticsearch' IN BOOLEAN MODE)
HAVING score > 0
ORDER BY score DESC
LIMIT 20;
-- Input: User search query (may contain typos)
-- Output: Best results from exact FTS, falling back to fuzzy trigram
WITH fts_results AS (
SELECT id, title, ts_rank(search_vector, q) AS rank, 'fts' AS source
FROM articles, websearch_to_tsquery('english', $1) AS q
WHERE search_vector @@ q
ORDER BY rank DESC LIMIT 20
),
trgm_results AS (
SELECT id, title, similarity(title, $1) AS rank, 'trgm' AS source
FROM articles
WHERE title % $1
AND id NOT IN (SELECT id FROM fts_results)
ORDER BY rank DESC LIMIT 10
)
SELECT * FROM fts_results
UNION ALL
SELECT * FROM trgm_results
ORDER BY rank DESC;
# Input: User search string
# Output: Ranked search results as list of dicts
import psycopg2
def search_articles(conn, query: str, limit: int = 20) -> list[dict]:
"""Full-text search with parameterized query (SQL injection safe)."""
sql = """
SELECT id, title,
ts_rank(search_vector, websearch_to_tsquery('english', %s)) AS rank,
ts_headline('english', body, websearch_to_tsquery('english', %s),
'StartSel=<b>, StopSel=</b>, MaxWords=35') AS snippet
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', %s)
ORDER BY rank DESC
LIMIT %s
"""
with conn.cursor() as cur:
cur.execute(sql, (query, query, query, limit))
cols = [desc[0] for desc in cur.description]
return [dict(zip(cols, row)) for row in cur.fetchall()]
-- BAD -- full table scan on every query, no ranking, no stemming
SELECT * FROM articles
WHERE body LIKE '%database%' OR body LIKE '%search%';
-- GOOD -- uses GIN index, supports stemming and ranking
SELECT * FROM articles
WHERE search_vector @@ to_tsquery('english', 'database & search');
-- BAD -- recomputes tsvector for every row on every query
SELECT *, ts_rank(to_tsvector('english', body), to_tsquery('english', 'search')) AS rank
FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'search');
-- GOOD -- reads precomputed tsvector, GIN index filters rows
SELECT *, ts_rank(search_vector, to_tsquery('english', 'search')) AS rank
FROM articles
WHERE search_vector @@ to_tsquery('english', 'search');
-- BAD -- returns zero results if common terms appear in >50% of rows
SELECT * FROM articles
WHERE MATCH(title, body) AGAINST('the best database' IN NATURAL LANGUAGE MODE);
-- GOOD -- boolean mode has no frequency threshold
SELECT * FROM articles
WHERE MATCH(title, body) AGAINST('+best +database' IN BOOLEAN MODE);
-- BAD -- default threshold 0.3 returns too many false positives for short strings
SELECT * FROM products WHERE name % 'cat';
-- Returns: "catalog", "category", "caterpillar", "concatenate" ...
-- GOOD -- raise threshold for short queries to reduce noise
SET pg_trgm.similarity_threshold = 0.5;
SELECT *, similarity(name, 'cat') AS sim FROM products
WHERE name % 'cat' ORDER BY sim DESC;
-- BAD -- user types "postgresql -mysql" but plainto_tsquery treats '-' as text
SELECT * FROM articles
WHERE search_vector @@ plainto_tsquery('english', 'postgresql -mysql');
-- Searches for 'postgresql' AND 'mysql', ignoring the minus sign
-- GOOD -- websearch_to_tsquery handles -, "quotes", and OR naturally
SELECT * FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql -mysql');
-- Correctly excludes rows containing 'mysql'
SET innodb_ft_min_token_size = 1 in my.cnf and rebuild FULLTEXT index. [src5]'simple' disables stemming; 'english' on non-English text drops valid tokens. Fix: Match the config to your content language. [src1]IN BOOLEAN MODE. [src4]REINDEX INDEX idx_name periodically. [src1]to_tsvector('english', NULL) returns NULL, excluding the row from all search results. Fix: Wrap in coalesce(column, ''). [src7]ts_headline() re-parses original text for snippets — slow on large texts for hundreds of results. Fix: Apply only to the final limited result set. [src2]-- Check if GIN index is being used (PostgreSQL)
EXPLAIN ANALYZE SELECT * FROM articles
WHERE search_vector @@ to_tsquery('english', 'search');
-- Look for: "Bitmap Index Scan on idx_articles_search"
-- Inspect tsvector content for a row (PostgreSQL)
SELECT to_tsvector('english', 'The quick brown foxes jumped') AS vector;
-- Output: 'brown':3 'fox':4 'jump':5 'quick':2
-- Check MySQL FULLTEXT index status
SHOW INDEX FROM articles WHERE Index_type = 'FULLTEXT';
-- Check pg_trgm similarity between two strings
SELECT similarity('postgresql', 'postgrsql');
-- Output: ~0.53
-- Check MySQL InnoDB FULLTEXT settings
SHOW VARIABLES LIKE 'innodb_ft%';
-- View PostgreSQL GIN index size
SELECT pg_size_pretty(pg_relation_size('idx_articles_search')) AS index_size;
| Version | Status | Key Features | Notes |
|---|---|---|---|
| PostgreSQL 17 (2024) | Current | General improvements | FTS unchanged |
| PostgreSQL 12 (2019) | Supported | Generated columns for tsvector | Major FTS ergonomic improvement |
| PostgreSQL 11 (2018) | EOL | websearch_to_tsquery() added | Google-style query parsing |
| PostgreSQL 9.6 (2016) | EOL | word_similarity() in pg_trgm | Improved substring matching |
| MySQL 8.4 (2024) | LTS | InnoDB FULLTEXT stable | Recommended version |
| MySQL 5.7 (2015) | EOL | InnoDB FULLTEXT support | Previously MyISAM only |
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Dataset < 10M rows and full-text is not the primary product feature | Search is the core product (e.g., document search engine) | Elasticsearch, Typesense, or Meilisearch |
| Already using PostgreSQL/MySQL and want to avoid operational complexity | Need sub-10ms p99 latency on >100M documents | Dedicated search infrastructure |
| Need stemming, ranking, and boolean operators in PostgreSQL | Need cross-language stemming with automatic language detection | Elasticsearch with language analyzers |
| Simple search page or admin filter on an existing application | Need faceted navigation, aggregations, or typeahead at scale | Elasticsearch or Algolia |
| Need typo-tolerant search on short fields (names, titles) | Need fuzzy search on large text bodies | pg_trgm on titles + tsvector on body (hybrid) |
| MySQL app needing basic keyword search with boolean operators | Need phrase proximity search or field-level boosting in MySQL | PostgreSQL tsvector or Elasticsearch |
'english' to 'simple' changes which tokens are indexed — always rebuild indexes after changing the config