SQL Full-Text Search: PostgreSQL tsvector, MySQL FULLTEXT, and pg_trgm
How do I implement full-text search in PostgreSQL and MySQL?
TL;DR
- Bottom line: PostgreSQL and MySQL both offer built-in full-text search that eliminates the need for Elasticsearch in small-to-medium workloads; PostgreSQL's tsvector/tsquery with GIN indexes is more powerful, while MySQL's FULLTEXT is simpler to set up.
- Key tool/command:
CREATE INDEX idx ON table USING GIN (to_tsvector('english', column)); - Watch out for: MySQL silently drops words shorter than 3 characters and words appearing in >50% of rows in NATURAL LANGUAGE MODE.
- Works with: PostgreSQL 12+ (generated columns), PostgreSQL 9.1+ (basic tsvector), MySQL 5.7+ (InnoDB FULLTEXT).
Constraints
- PostgreSQL tsvector requires specifying a text search configuration (e.g.,
'english') — using the wrong language dictionary silently drops valid stemmed tokens - MySQL InnoDB FULLTEXT minimum token size is 3 characters by default — words like "Go", "AI", "C#" are excluded unless
innodb_ft_min_token_sizeis changed (requires index rebuild) - GIN indexes are read-optimized; they slow down writes — never use GIN on a table with >10K inserts/second without
fastupdate=on(default) or consider GiST - MySQL FULLTEXT in NATURAL LANGUAGE MODE silently returns zero results for terms in >50% of rows — use BOOLEAN MODE to bypass this
- PostgreSQL
ts_rank()does NOT use the GIN index for scoring — ranking always has I/O cost proportional to result set size - pg_trgm similarity threshold defaults to 0.3 — failing to tune
pg_trgm.similarity_thresholdproduces either too many false positives or too few results
Quick Reference
| Scenario | Pattern | Index Type | Strengths | Limitations | Best For |
|---|---|---|---|---|---|
| PostgreSQL keyword search | to_tsvector() + @@ + to_tsquery() | GIN | Stemming, ranking, language-aware, boolean ops | No fuzzy/typo tolerance | Structured natural-language queries |
| PostgreSQL phrase search | to_tsvector() + @@ + phraseto_tsquery() | GIN | Word proximity, phrase matching | Exact phrase order required | "Near me" or multi-word lookups |
| PostgreSQL fuzzy search | pg_trgm % operator | GIN (gin_trgm_ops) | Typo-tolerant, LIKE/ILIKE acceleration | Higher index size, no stemming | Search-as-you-type, user input with typos |
| PostgreSQL combined | tsvector + pg_trgm fallback | GIN (both) | Best coverage: stemmed + fuzzy | Two indexes, more storage | Production search with autocomplete |
| PostgreSQL weighted search | setweight() + ts_rank() | GIN | Title > body boosting (A/B/C/D weights) | Ranking not index-accelerated | Relevance-ranked results |
| PostgreSQL generated column | GENERATED ALWAYS AS (to_tsvector(...)) STORED | GIN | No trigger needed, auto-maintained | PostgreSQL 12+ only | Clean, maintainable tsvector storage |
| MySQL natural language | MATCH(...) AGAINST('...') | FULLTEXT | Simple setup, built-in relevance | 50% threshold, 3-char minimum | Basic search on InnoDB tables |
| MySQL boolean mode | MATCH(...) AGAINST('...' IN BOOLEAN MODE) | FULLTEXT | +/- operators, no 50% threshold | No relevance ranking by default | Precise inclusion/exclusion queries |
| MySQL query expansion | MATCH(...) AGAINST('...' WITH QUERY EXPANSION) | FULLTEXT | Finds related terms automatically | Can return noisy results | Expanding vague search queries |
| MySQL multi-column | FULLTEXT(col1, col2, col3) | FULLTEXT | Single index across columns | All columns must be same table | Cross-field search |
| Elasticsearch (comparison) | REST API + inverted index | Lucene segments | Sub-10ms on billions of docs, fuzzy, facets | Separate infrastructure, data sync | >10M docs, complex relevance tuning |
| pg_trgm word similarity | word_similarity() + <% operator | GIN/GiST | Matches substrings, better for prefix search | PostgreSQL 9.6+ only | Autocomplete, partial word matching |
Decision Tree
START
├── Using PostgreSQL?
│ ├── YES
│ │ ├── Need typo/fuzzy tolerance?
│ │ │ ├── YES → Use pg_trgm with GIN(gin_trgm_ops) index
│ │ │ └── NO ↓
│ │ ├── Need relevance ranking with field weighting?
│ │ │ ├── YES → Use tsvector + setweight() + ts_rank()
│ │ │ └── NO ↓
│ │ ├── Simple keyword/phrase search?
│ │ │ ├── YES → Use to_tsvector() + to_tsquery() with GIN index
│ │ │ └── NO ↓
│ │ └── Need both fuzzy AND ranked search?
│ │ └── YES → Combine tsvector + pg_trgm with fallback
│ └── NO ↓
├── Using MySQL?
│ ├── YES
│ │ ├── Need precise inclusion/exclusion (+/- operators)?
│ │ │ ├── YES → Use BOOLEAN MODE
│ │ │ └── NO ↓
│ │ ├── Simple relevance-ranked search?
│ │ │ ├── YES → Use NATURAL LANGUAGE MODE
│ │ │ └── NO ↓
│ │ └── Need to find related terms from vague input?
│ │ └── YES → Use WITH QUERY EXPANSION
│ └── NO ↓
└── Dataset >10M docs OR need sub-50ms p99 OR complex faceting?
└── YES → Use dedicated search engine (Elasticsearch, Typesense, Meilisearch)
Step-by-Step Guide
1. Create a tsvector column with GIN index (PostgreSQL)
Use a generated column (PostgreSQL 12+) to automatically maintain the tsvector. This avoids triggers and ensures the search vector stays in sync with source columns. [src1]
-- Step 1a: Add generated tsvector column
ALTER TABLE articles ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
to_tsvector('english',
coalesce(title, '') || ' ' ||
coalesce(body, '')
)
) STORED;
-- Step 1b: Create GIN index for fast lookups
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);
Verify: SELECT count(*) FROM articles WHERE search_vector IS NOT NULL; → expected: should equal total row count
2. Query with tsquery and rank results (PostgreSQL)
Use websearch_to_tsquery() (PostgreSQL 11+) for Google-style query parsing, or plainto_tsquery() for simpler input. [src2]
SELECT
id, title,
ts_rank(search_vector, query) AS rank
FROM articles,
websearch_to_tsquery('english', 'full text search postgresql') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
Verify: EXPLAIN ANALYZE on the query → expected: "Bitmap Index Scan" on idx_articles_search
3. Add field weighting for relevance (PostgreSQL)
Use setweight() to boost title matches (weight A) over body matches (weight D). Default weights: A=1.0, B=0.4, C=0.2, D=0.1. [src2]
ALTER TABLE articles DROP COLUMN search_vector;
ALTER TABLE articles ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(body, '')), 'D')
) STORED;
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);
Verify: Title matches should rank higher (~0.6+) than body-only matches (~0.1)
4. Create a FULLTEXT index and query (MySQL)
MySQL FULLTEXT works on VARCHAR, CHAR, and TEXT columns. InnoDB supports FULLTEXT from MySQL 5.7+. [src4]
ALTER TABLE articles ADD FULLTEXT INDEX ft_articles (title, body);
SELECT id, title,
MATCH(title, body) AGAINST('full text search') AS relevance
FROM articles
WHERE MATCH(title, body) AGAINST('full text search')
ORDER BY relevance DESC
LIMIT 20;
Verify: EXPLAIN → expected: "fulltext" in type column
5. Use boolean mode for precise control (MySQL)
Boolean mode supports + (must include), - (must exclude), * (wildcard), and "..." (exact phrase). [src4]
SELECT id, title
FROM articles
WHERE MATCH(title, body) AGAINST('+postgresql -mysql +index*' IN BOOLEAN MODE)
ORDER BY id DESC
LIMIT 20;
Verify: Results should contain 'postgresql' and not contain 'mysql'
6. Enable pg_trgm for fuzzy search (PostgreSQL)
The pg_trgm extension provides typo-tolerant search using trigram matching. [src3]
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_articles_trgm ON articles
USING GIN (title gin_trgm_ops);
SELECT id, title, similarity(title, 'postgrsql') AS sim
FROM articles
WHERE title % 'postgrsql'
ORDER BY sim DESC
LIMIT 10;
Verify: SELECT similarity('postgresql', 'postgrsql'); → expected: ~0.53
Code Examples
PostgreSQL: Weighted multi-column search with ranking
-- Input: Table with title and body text columns
-- Output: Ranked search results with relevance score and snippet
ALTER TABLE articles ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(body, '')), 'D')
) STORED;
CREATE INDEX idx_fts ON articles USING GIN (search_vector);
SELECT id, title,
ts_rank(search_vector, q) AS rank,
ts_headline('english', body, q,
'StartSel=<b>, StopSel=</b>, MaxWords=35, MinWords=15'
) AS snippet
FROM articles,
websearch_to_tsquery('english', 'database full text search') AS q
WHERE search_vector @@ q
ORDER BY rank DESC
LIMIT 20;
MySQL: Boolean mode with relevance sorting
-- Input: Table with FULLTEXT index on title and body
-- Output: Filtered results with relevance score
ALTER TABLE articles ADD FULLTEXT INDEX ft_idx (title, body);
SELECT id, title,
MATCH(title, body) AGAINST('+"full text" +search -elasticsearch' IN BOOLEAN MODE) AS score
FROM articles
WHERE MATCH(title, body) AGAINST('+"full text" +search -elasticsearch' IN BOOLEAN MODE)
HAVING score > 0
ORDER BY score DESC
LIMIT 20;
PostgreSQL: Combined tsvector + pg_trgm fallback
-- Input: User search query (may contain typos)
-- Output: Best results from exact FTS, falling back to fuzzy trigram
WITH fts_results AS (
SELECT id, title, ts_rank(search_vector, q) AS rank, 'fts' AS source
FROM articles, websearch_to_tsquery('english', $1) AS q
WHERE search_vector @@ q
ORDER BY rank DESC LIMIT 20
),
trgm_results AS (
SELECT id, title, similarity(title, $1) AS rank, 'trgm' AS source
FROM articles
WHERE title % $1
AND id NOT IN (SELECT id FROM fts_results)
ORDER BY rank DESC LIMIT 10
)
SELECT * FROM fts_results
UNION ALL
SELECT * FROM trgm_results
ORDER BY rank DESC;
Python (psycopg2): Parameterized full-text search
# Input: User search string
# Output: Ranked search results as list of dicts
import psycopg2
def search_articles(conn, query: str, limit: int = 20) -> list[dict]:
"""Full-text search with parameterized query (SQL injection safe)."""
sql = """
SELECT id, title,
ts_rank(search_vector, websearch_to_tsquery('english', %s)) AS rank,
ts_headline('english', body, websearch_to_tsquery('english', %s),
'StartSel=<b>, StopSel=</b>, MaxWords=35') AS snippet
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', %s)
ORDER BY rank DESC
LIMIT %s
"""
with conn.cursor() as cur:
cur.execute(sql, (query, query, query, limit))
cols = [desc[0] for desc in cur.description]
return [dict(zip(cols, row)) for row in cur.fetchall()]
Anti-Patterns
Wrong: Using LIKE '%term%' for text search
-- BAD -- full table scan on every query, no ranking, no stemming
SELECT * FROM articles
WHERE body LIKE '%database%' OR body LIKE '%search%';
Correct: Use tsvector with GIN index
-- GOOD -- uses GIN index, supports stemming and ranking
SELECT * FROM articles
WHERE search_vector @@ to_tsquery('english', 'database & search');
Wrong: Computing tsvector inline on every query
-- BAD -- recomputes tsvector for every row on every query
SELECT *, ts_rank(to_tsvector('english', body), to_tsquery('english', 'search')) AS rank
FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'search');
Correct: Use a stored tsvector column with index
-- GOOD -- reads precomputed tsvector, GIN index filters rows
SELECT *, ts_rank(search_vector, to_tsquery('english', 'search')) AS rank
FROM articles
WHERE search_vector @@ to_tsquery('english', 'search');
Wrong: MySQL NATURAL LANGUAGE MODE for high-frequency terms
-- BAD -- returns zero results if common terms appear in >50% of rows
SELECT * FROM articles
WHERE MATCH(title, body) AGAINST('the best database' IN NATURAL LANGUAGE MODE);
Correct: Use BOOLEAN MODE to bypass the 50% threshold
-- GOOD -- boolean mode has no frequency threshold
SELECT * FROM articles
WHERE MATCH(title, body) AGAINST('+best +database' IN BOOLEAN MODE);
Wrong: Using pg_trgm without setting similarity threshold
-- BAD -- default threshold 0.3 returns too many false positives for short strings
SELECT * FROM products WHERE name % 'cat';
-- Returns: "catalog", "category", "caterpillar", "concatenate" ...
Correct: Tune the similarity threshold before querying
-- GOOD -- raise threshold for short queries to reduce noise
SET pg_trgm.similarity_threshold = 0.5;
SELECT *, similarity(name, 'cat') AS sim FROM products
WHERE name % 'cat' ORDER BY sim DESC;
Wrong: Using plainto_tsquery for user input with operators
-- BAD -- user types "postgresql -mysql" but plainto_tsquery treats '-' as text
SELECT * FROM articles
WHERE search_vector @@ plainto_tsquery('english', 'postgresql -mysql');
-- Searches for 'postgresql' AND 'mysql', ignoring the minus sign
Correct: Use websearch_to_tsquery for Google-style input
-- GOOD -- websearch_to_tsquery handles -, "quotes", and OR naturally
SELECT * FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql -mysql');
-- Correctly excludes rows containing 'mysql'
Common Pitfalls
- MySQL 3-character minimum: Words shorter than 3 characters (InnoDB) or 4 characters (MyISAM) are silently dropped. Fix:
SET innodb_ft_min_token_size = 1in my.cnf and rebuild FULLTEXT index. [src5] - PostgreSQL wrong language config: Using
'simple'disables stemming;'english'on non-English text drops valid tokens. Fix: Match the config to your content language. [src1] - MySQL 50% threshold: NATURAL LANGUAGE MODE returns zero results for terms in >50% of rows. Fix: Use
IN BOOLEAN MODE. [src4] - GIN index bloat after bulk deletes: GIN indexes accumulate dead entries not cleaned by regular VACUUM. Fix: Run
REINDEX INDEX idx_nameperiodically. [src1] - Missing COALESCE on nullable columns:
to_tsvector('english', NULL)returns NULL, excluding the row from all search results. Fix: Wrap incoalesce(column, ''). [src7] - ts_headline performance trap:
ts_headline()re-parses original text for snippets — slow on large texts for hundreds of results. Fix: Apply only to the final limited result set. [src2] - MySQL FULLTEXT on JOINs: FULLTEXT index may not be used if optimizer chooses a different path. Fix: Isolate FULLTEXT search in a subquery. [src4]
- pg_trgm index size: Trigram GIN indexes are 2-5x larger than tsvector GIN indexes. Fix: Create trigram indexes only on short columns (title), not large body columns. [src3]
Diagnostic Commands
-- Check if GIN index is being used (PostgreSQL)
EXPLAIN ANALYZE SELECT * FROM articles
WHERE search_vector @@ to_tsquery('english', 'search');
-- Look for: "Bitmap Index Scan on idx_articles_search"
-- Inspect tsvector content for a row (PostgreSQL)
SELECT to_tsvector('english', 'The quick brown foxes jumped') AS vector;
-- Output: 'brown':3 'fox':4 'jump':5 'quick':2
-- Check MySQL FULLTEXT index status
SHOW INDEX FROM articles WHERE Index_type = 'FULLTEXT';
-- Check pg_trgm similarity between two strings
SELECT similarity('postgresql', 'postgrsql');
-- Output: ~0.53
-- Check MySQL InnoDB FULLTEXT settings
SHOW VARIABLES LIKE 'innodb_ft%';
-- View PostgreSQL GIN index size
SELECT pg_size_pretty(pg_relation_size('idx_articles_search')) AS index_size;
Version History & Compatibility
| Version | Status | Key Features | Notes |
|---|---|---|---|
| PostgreSQL 17 (2024) | Current | General improvements | FTS unchanged |
| PostgreSQL 12 (2019) | Supported | Generated columns for tsvector | Major FTS ergonomic improvement |
| PostgreSQL 11 (2018) | EOL | websearch_to_tsquery() added | Google-style query parsing |
| PostgreSQL 9.6 (2016) | EOL | word_similarity() in pg_trgm | Improved substring matching |
| MySQL 8.4 (2024) | LTS | InnoDB FULLTEXT stable | Recommended version |
| MySQL 5.7 (2015) | EOL | InnoDB FULLTEXT support | Previously MyISAM only |
When to Use / When Not to Use
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Dataset < 10M rows and full-text is not the primary product feature | Search is the core product (e.g., document search engine) | Elasticsearch, Typesense, or Meilisearch |
| Already using PostgreSQL/MySQL and want to avoid operational complexity | Need sub-10ms p99 latency on >100M documents | Dedicated search infrastructure |
| Need stemming, ranking, and boolean operators in PostgreSQL | Need cross-language stemming with automatic language detection | Elasticsearch with language analyzers |
| Simple search page or admin filter on an existing application | Need faceted navigation, aggregations, or typeahead at scale | Elasticsearch or Algolia |
| Need typo-tolerant search on short fields (names, titles) | Need fuzzy search on large text bodies | pg_trgm on titles + tsvector on body (hybrid) |
| MySQL app needing basic keyword search with boolean operators | Need phrase proximity search or field-level boosting in MySQL | PostgreSQL tsvector or Elasticsearch |
Important Caveats
- PostgreSQL tsvector is language-dependent: switching from
'english'to'simple'changes which tokens are indexed — always rebuild indexes after changing the config - MySQL FULLTEXT indexes cannot span multiple tables — composite search across joined tables requires denormalization or a subquery pattern
- GIN index updates use a "fastupdate" pending list by default — under heavy write load, the pending list can grow large and cause occasional slow searches
- pg_trgm is not a replacement for tsvector — trigrams do not understand word boundaries, stemming, or stop words; use tsvector for natural language search and pg_trgm for fuzzy matching
- MySQL FULLTEXT relevance scores are not comparable across different queries — scores are query-specific and should only be used for ordering within a single result set