PostgreSQL full-text search tsvector tsquery

- Bottom line: PostgreSQL and MySQL both offer built-in full-text search that eliminates the need for Elasticsearch in small-to-medium workloads; PostgreSQL's tsvector/tsquery with GIN indexes is more powerful, while MySQL's FULLTEXT is simpler to set up.

MySQL FULLTEXT index search

- Bottom line: PostgreSQL and MySQL both offer built-in full-text search that eliminates the need for Elasticsearch in small-to-medium workloads; PostgreSQL's tsvector/tsquery with GIN indexes is more powerful, while MySQL's FULLTEXT is simpler to set up.

pg_trgm trigram fuzzy search

- Bottom line: PostgreSQL and MySQL both offer built-in full-text search that eliminates the need for Elasticsearch in small-to-medium workloads; PostgreSQL's tsvector/tsquery with GIN indexes is more powerful, while MySQL's FULLTEXT is simpler to set up.

SQL full-text search GIN index

- Bottom line: PostgreSQL and MySQL both offer built-in full-text search that eliminates the need for Elasticsearch in small-to-medium workloads; PostgreSQL's tsvector/tsquery with GIN indexes is more powerful, while MySQL's FULLTEXT is simpler to set up.

PostgreSQL vs MySQL text search comparison

- Bottom line: PostgreSQL and MySQL both offer built-in full-text search that eliminates the need for Elasticsearch in small-to-medium workloads; PostgreSQL's tsvector/tsquery with GIN indexes is more powerful, while MySQL's FULLTEXT is simpler to set up.

SQL Full-Text Search: PostgreSQL tsvector, MySQL FULLTEXT, and pg_trgm

How do I implement full-text search in PostgreSQL and MySQL?

TL;DR

Bottom line: PostgreSQL and MySQL both offer built-in full-text search that eliminates the need for Elasticsearch in small-to-medium workloads; PostgreSQL's tsvector/tsquery with GIN indexes is more powerful, while MySQL's FULLTEXT is simpler to set up.
Key tool/command: CREATE INDEX idx ON table USING GIN (to_tsvector('english', column));
Watch out for: MySQL silently drops words shorter than 3 characters and words appearing in >50% of rows in NATURAL LANGUAGE MODE.
Works with: PostgreSQL 12+ (generated columns), PostgreSQL 9.1+ (basic tsvector), MySQL 5.7+ (InnoDB FULLTEXT).

Constraints

PostgreSQL tsvector requires specifying a text search configuration (e.g., 'english') — using the wrong language dictionary silently drops valid stemmed tokens
MySQL InnoDB FULLTEXT minimum token size is 3 characters by default — words like "Go", "AI", "C#" are excluded unless innodb_ft_min_token_size is changed (requires index rebuild)
GIN indexes are read-optimized; they slow down writes — never use GIN on a table with >10K inserts/second without fastupdate=on (default) or consider GiST
MySQL FULLTEXT in NATURAL LANGUAGE MODE silently returns zero results for terms in >50% of rows — use BOOLEAN MODE to bypass this
PostgreSQL ts_rank() does NOT use the GIN index for scoring — ranking always has I/O cost proportional to result set size
pg_trgm similarity threshold defaults to 0.3 — failing to tune pg_trgm.similarity_threshold produces either too many false positives or too few results

Quick Reference

Scenario	Pattern	Index Type	Strengths	Limitations	Best For
PostgreSQL keyword search	`to_tsvector()` + `@@` + `to_tsquery()`	GIN	Stemming, ranking, language-aware, boolean ops	No fuzzy/typo tolerance	Structured natural-language queries
PostgreSQL phrase search	`to_tsvector()` + `@@` + `phraseto_tsquery()`	GIN	Word proximity, phrase matching	Exact phrase order required	"Near me" or multi-word lookups
PostgreSQL fuzzy search	`pg_trgm` `%` operator	GIN (gin_trgm_ops)	Typo-tolerant, LIKE/ILIKE acceleration	Higher index size, no stemming	Search-as-you-type, user input with typos
PostgreSQL combined	tsvector + pg_trgm fallback	GIN (both)	Best coverage: stemmed + fuzzy	Two indexes, more storage	Production search with autocomplete
PostgreSQL weighted search	`setweight()` + `ts_rank()`	GIN	Title > body boosting (A/B/C/D weights)	Ranking not index-accelerated	Relevance-ranked results
PostgreSQL generated column	`GENERATED ALWAYS AS (to_tsvector(...)) STORED`	GIN	No trigger needed, auto-maintained	PostgreSQL 12+ only	Clean, maintainable tsvector storage
MySQL natural language	`MATCH(...) AGAINST('...')`	FULLTEXT	Simple setup, built-in relevance	50% threshold, 3-char minimum	Basic search on InnoDB tables
MySQL boolean mode	`MATCH(...) AGAINST('...' IN BOOLEAN MODE)`	FULLTEXT	+/- operators, no 50% threshold	No relevance ranking by default	Precise inclusion/exclusion queries
MySQL query expansion	`MATCH(...) AGAINST('...' WITH QUERY EXPANSION)`	FULLTEXT	Finds related terms automatically	Can return noisy results	Expanding vague search queries
MySQL multi-column	`FULLTEXT(col1, col2, col3)`	FULLTEXT	Single index across columns	All columns must be same table	Cross-field search
Elasticsearch (comparison)	REST API + inverted index	Lucene segments	Sub-10ms on billions of docs, fuzzy, facets	Separate infrastructure, data sync	>10M docs, complex relevance tuning
pg_trgm word similarity	`word_similarity()` + `<%` operator	GIN/GiST	Matches substrings, better for prefix search	PostgreSQL 9.6+ only	Autocomplete, partial word matching

Decision Tree

START
├── Using PostgreSQL?
│   ├── YES
│   │   ├── Need typo/fuzzy tolerance?
│   │   │   ├── YES → Use pg_trgm with GIN(gin_trgm_ops) index
│   │   │   └── NO ↓
│   │   ├── Need relevance ranking with field weighting?
│   │   │   ├── YES → Use tsvector + setweight() + ts_rank()
│   │   │   └── NO ↓
│   │   ├── Simple keyword/phrase search?
│   │   │   ├── YES → Use to_tsvector() + to_tsquery() with GIN index
│   │   │   └── NO ↓
│   │   └── Need both fuzzy AND ranked search?
│   │       └── YES → Combine tsvector + pg_trgm with fallback
│   └── NO ↓
├── Using MySQL?
│   ├── YES
│   │   ├── Need precise inclusion/exclusion (+/- operators)?
│   │   │   ├── YES → Use BOOLEAN MODE
│   │   │   └── NO ↓
│   │   ├── Simple relevance-ranked search?
│   │   │   ├── YES → Use NATURAL LANGUAGE MODE
│   │   │   └── NO ↓
│   │   └── Need to find related terms from vague input?
│   │       └── YES → Use WITH QUERY EXPANSION
│   └── NO ↓
└── Dataset >10M docs OR need sub-50ms p99 OR complex faceting?
    └── YES → Use dedicated search engine (Elasticsearch, Typesense, Meilisearch)

Step-by-Step Guide

1. Create a tsvector column with GIN index (PostgreSQL)

Use a generated column (PostgreSQL 12+) to automatically maintain the tsvector. This avoids triggers and ensures the search vector stays in sync with source columns. [src1]

-- Step 1a: Add generated tsvector column
ALTER TABLE articles ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english',
      coalesce(title, '') || ' ' ||
      coalesce(body, '')
    )
  ) STORED;

-- Step 1b: Create GIN index for fast lookups
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);

Verify: SELECT count(*) FROM articles WHERE search_vector IS NOT NULL; → expected: should equal total row count

2. Query with tsquery and rank results (PostgreSQL)

Use websearch_to_tsquery() (PostgreSQL 11+) for Google-style query parsing, or plainto_tsquery() for simpler input. [src2]

SELECT
  id, title,
  ts_rank(search_vector, query) AS rank
FROM articles,
  websearch_to_tsquery('english', 'full text search postgresql') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

Verify: EXPLAIN ANALYZE on the query → expected: "Bitmap Index Scan" on idx_articles_search

3. Add field weighting for relevance (PostgreSQL)

Use setweight() to boost title matches (weight A) over body matches (weight D). Default weights: A=1.0, B=0.4, C=0.2, D=0.1. [src2]

ALTER TABLE articles DROP COLUMN search_vector;
ALTER TABLE articles ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'D')
  ) STORED;

CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);

Verify: Title matches should rank higher (~0.6+) than body-only matches (~0.1)

4. Create a FULLTEXT index and query (MySQL)

MySQL FULLTEXT works on VARCHAR, CHAR, and TEXT columns. InnoDB supports FULLTEXT from MySQL 5.7+. [src4]

ALTER TABLE articles ADD FULLTEXT INDEX ft_articles (title, body);

SELECT id, title,
  MATCH(title, body) AGAINST('full text search') AS relevance
FROM articles
WHERE MATCH(title, body) AGAINST('full text search')
ORDER BY relevance DESC
LIMIT 20;

Verify: EXPLAIN → expected: "fulltext" in type column

5. Use boolean mode for precise control (MySQL)

Boolean mode supports + (must include), - (must exclude), * (wildcard), and "..." (exact phrase). [src4]

SELECT id, title
FROM articles
WHERE MATCH(title, body) AGAINST('+postgresql -mysql +index*' IN BOOLEAN MODE)
ORDER BY id DESC
LIMIT 20;

Verify: Results should contain 'postgresql' and not contain 'mysql'

6. Enable pg_trgm for fuzzy search (PostgreSQL)

The pg_trgm extension provides typo-tolerant search using trigram matching. [src3]

CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX idx_articles_trgm ON articles
  USING GIN (title gin_trgm_ops);

SELECT id, title, similarity(title, 'postgrsql') AS sim
FROM articles
WHERE title % 'postgrsql'
ORDER BY sim DESC
LIMIT 10;

Verify: SELECT similarity('postgresql', 'postgrsql'); → expected: ~0.53

Code Examples

PostgreSQL: Weighted multi-column search with ranking

-- Input:  Table with title and body text columns
-- Output: Ranked search results with relevance score and snippet

ALTER TABLE articles ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'D')
  ) STORED;

CREATE INDEX idx_fts ON articles USING GIN (search_vector);

SELECT id, title,
  ts_rank(search_vector, q) AS rank,
  ts_headline('english', body, q,
    'StartSel=<b>, StopSel=</b>, MaxWords=35, MinWords=15'
  ) AS snippet
FROM articles,
  websearch_to_tsquery('english', 'database full text search') AS q
WHERE search_vector @@ q
ORDER BY rank DESC
LIMIT 20;

MySQL: Boolean mode with relevance sorting

-- Input:  Table with FULLTEXT index on title and body
-- Output: Filtered results with relevance score

ALTER TABLE articles ADD FULLTEXT INDEX ft_idx (title, body);

SELECT id, title,
  MATCH(title, body) AGAINST('+"full text" +search -elasticsearch' IN BOOLEAN MODE) AS score
FROM articles
WHERE MATCH(title, body) AGAINST('+"full text" +search -elasticsearch' IN BOOLEAN MODE)
HAVING score > 0
ORDER BY score DESC
LIMIT 20;

PostgreSQL: Combined tsvector + pg_trgm fallback

-- Input:  User search query (may contain typos)
-- Output: Best results from exact FTS, falling back to fuzzy trigram

WITH fts_results AS (
  SELECT id, title, ts_rank(search_vector, q) AS rank, 'fts' AS source
  FROM articles, websearch_to_tsquery('english', $1) AS q
  WHERE search_vector @@ q
  ORDER BY rank DESC LIMIT 20
),
trgm_results AS (
  SELECT id, title, similarity(title, $1) AS rank, 'trgm' AS source
  FROM articles
  WHERE title % $1
    AND id NOT IN (SELECT id FROM fts_results)
  ORDER BY rank DESC LIMIT 10
)
SELECT * FROM fts_results
UNION ALL
SELECT * FROM trgm_results
ORDER BY rank DESC;

Python (psycopg2): Parameterized full-text search

# Input:  User search string
# Output: Ranked search results as list of dicts

import psycopg2

def search_articles(conn, query: str, limit: int = 20) -> list[dict]:
    """Full-text search with parameterized query (SQL injection safe)."""
    sql = """
        SELECT id, title,
          ts_rank(search_vector, websearch_to_tsquery('english', %s)) AS rank,
          ts_headline('english', body, websearch_to_tsquery('english', %s),
            'StartSel=<b>, StopSel=</b>, MaxWords=35') AS snippet
        FROM articles
        WHERE search_vector @@ websearch_to_tsquery('english', %s)
        ORDER BY rank DESC
        LIMIT %s
    """
    with conn.cursor() as cur:
        cur.execute(sql, (query, query, query, limit))
        cols = [desc[0] for desc in cur.description]
        return [dict(zip(cols, row)) for row in cur.fetchall()]

Anti-Patterns

Wrong: Using LIKE '%term%' for text search

-- BAD -- full table scan on every query, no ranking, no stemming
SELECT * FROM articles
WHERE body LIKE '%database%' OR body LIKE '%search%';

Correct: Use tsvector with GIN index

-- GOOD -- uses GIN index, supports stemming and ranking
SELECT * FROM articles
WHERE search_vector @@ to_tsquery('english', 'database & search');

Wrong: Computing tsvector inline on every query

-- BAD -- recomputes tsvector for every row on every query
SELECT *, ts_rank(to_tsvector('english', body), to_tsquery('english', 'search')) AS rank
FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'search');

Correct: Use a stored tsvector column with index

-- GOOD -- reads precomputed tsvector, GIN index filters rows
SELECT *, ts_rank(search_vector, to_tsquery('english', 'search')) AS rank
FROM articles
WHERE search_vector @@ to_tsquery('english', 'search');

Wrong: MySQL NATURAL LANGUAGE MODE for high-frequency terms

-- BAD -- returns zero results if common terms appear in >50% of rows
SELECT * FROM articles
WHERE MATCH(title, body) AGAINST('the best database' IN NATURAL LANGUAGE MODE);

Correct: Use BOOLEAN MODE to bypass the 50% threshold

-- GOOD -- boolean mode has no frequency threshold
SELECT * FROM articles
WHERE MATCH(title, body) AGAINST('+best +database' IN BOOLEAN MODE);

Wrong: Using pg_trgm without setting similarity threshold

-- BAD -- default threshold 0.3 returns too many false positives for short strings
SELECT * FROM products WHERE name % 'cat';
-- Returns: "catalog", "category", "caterpillar", "concatenate" ...

Correct: Tune the similarity threshold before querying

-- GOOD -- raise threshold for short queries to reduce noise
SET pg_trgm.similarity_threshold = 0.5;
SELECT *, similarity(name, 'cat') AS sim FROM products
WHERE name % 'cat' ORDER BY sim DESC;

Wrong: Using plainto_tsquery for user input with operators

-- BAD -- user types "postgresql -mysql" but plainto_tsquery treats '-' as text
SELECT * FROM articles
WHERE search_vector @@ plainto_tsquery('english', 'postgresql -mysql');
-- Searches for 'postgresql' AND 'mysql', ignoring the minus sign

Correct: Use websearch_to_tsquery for Google-style input

-- GOOD -- websearch_to_tsquery handles -, "quotes", and OR naturally
SELECT * FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql -mysql');
-- Correctly excludes rows containing 'mysql'

Common Pitfalls

MySQL 3-character minimum: Words shorter than 3 characters (InnoDB) or 4 characters (MyISAM) are silently dropped. Fix: SET innodb_ft_min_token_size = 1 in my.cnf and rebuild FULLTEXT index. [src5]
PostgreSQL wrong language config: Using 'simple' disables stemming; 'english' on non-English text drops valid tokens. Fix: Match the config to your content language. [src1]
MySQL 50% threshold: NATURAL LANGUAGE MODE returns zero results for terms in >50% of rows. Fix: Use IN BOOLEAN MODE. [src4]
GIN index bloat after bulk deletes: GIN indexes accumulate dead entries not cleaned by regular VACUUM. Fix: Run REINDEX INDEX idx_name periodically. [src1]
Missing COALESCE on nullable columns: to_tsvector('english', NULL) returns NULL, excluding the row from all search results. Fix: Wrap in coalesce(column, ''). [src7]
ts_headline performance trap: ts_headline() re-parses original text for snippets — slow on large texts for hundreds of results. Fix: Apply only to the final limited result set. [src2]
MySQL FULLTEXT on JOINs: FULLTEXT index may not be used if optimizer chooses a different path. Fix: Isolate FULLTEXT search in a subquery. [src4]
pg_trgm index size: Trigram GIN indexes are 2-5x larger than tsvector GIN indexes. Fix: Create trigram indexes only on short columns (title), not large body columns. [src3]

Diagnostic Commands

-- Check if GIN index is being used (PostgreSQL)
EXPLAIN ANALYZE SELECT * FROM articles
WHERE search_vector @@ to_tsquery('english', 'search');
-- Look for: "Bitmap Index Scan on idx_articles_search"

-- Inspect tsvector content for a row (PostgreSQL)
SELECT to_tsvector('english', 'The quick brown foxes jumped') AS vector;
-- Output: 'brown':3 'fox':4 'jump':5 'quick':2

-- Check MySQL FULLTEXT index status
SHOW INDEX FROM articles WHERE Index_type = 'FULLTEXT';

-- Check pg_trgm similarity between two strings
SELECT similarity('postgresql', 'postgrsql');
-- Output: ~0.53

-- Check MySQL InnoDB FULLTEXT settings
SHOW VARIABLES LIKE 'innodb_ft%';

-- View PostgreSQL GIN index size
SELECT pg_size_pretty(pg_relation_size('idx_articles_search')) AS index_size;

Version History & Compatibility

Version	Status	Key Features	Notes
PostgreSQL 17 (2024)	Current	General improvements	FTS unchanged
PostgreSQL 12 (2019)	Supported	Generated columns for tsvector	Major FTS ergonomic improvement
PostgreSQL 11 (2018)	EOL	`websearch_to_tsquery()` added	Google-style query parsing
PostgreSQL 9.6 (2016)	EOL	`word_similarity()` in pg_trgm	Improved substring matching
MySQL 8.4 (2024)	LTS	InnoDB FULLTEXT stable	Recommended version
MySQL 5.7 (2015)	EOL	InnoDB FULLTEXT support	Previously MyISAM only

When to Use / When Not to Use

Use When	Don't Use When	Use Instead
Dataset < 10M rows and full-text is not the primary product feature	Search is the core product (e.g., document search engine)	Elasticsearch, Typesense, or Meilisearch
Already using PostgreSQL/MySQL and want to avoid operational complexity	Need sub-10ms p99 latency on >100M documents	Dedicated search infrastructure
Need stemming, ranking, and boolean operators in PostgreSQL	Need cross-language stemming with automatic language detection	Elasticsearch with language analyzers
Simple search page or admin filter on an existing application	Need faceted navigation, aggregations, or typeahead at scale	Elasticsearch or Algolia
Need typo-tolerant search on short fields (names, titles)	Need fuzzy search on large text bodies	pg_trgm on titles + tsvector on body (hybrid)
MySQL app needing basic keyword search with boolean operators	Need phrase proximity search or field-level boosting in MySQL	PostgreSQL tsvector or Elasticsearch

Important Caveats

PostgreSQL tsvector is language-dependent: switching from 'english' to 'simple' changes which tokens are indexed — always rebuild indexes after changing the config
MySQL FULLTEXT indexes cannot span multiple tables — composite search across joined tables requires denormalization or a subquery pattern
GIN index updates use a "fastupdate" pending list by default — under heavy write load, the pending list can grow large and cause occasional slow searches
pg_trgm is not a replacement for tsvector — trigrams do not understand word boundaries, stemming, or stop words; use tsvector for natural language search and pg_trgm for fuzzy matching
MySQL FULLTEXT relevance scores are not comparable across different queries — scores are query-specific and should only be used for ordering within a single result set