SQL Full-Text Search: PostgreSQL tsvector, MySQL FULLTEXT, and pg_trgm

Type: Software Reference Confidence: 0.94 Sources: 7 Verified: 2026-02-23 Freshness: 2026-02-23

TL;DR

Constraints

Quick Reference

ScenarioPatternIndex TypeStrengthsLimitationsBest For
PostgreSQL keyword searchto_tsvector() + @@ + to_tsquery()GINStemming, ranking, language-aware, boolean opsNo fuzzy/typo toleranceStructured natural-language queries
PostgreSQL phrase searchto_tsvector() + @@ + phraseto_tsquery()GINWord proximity, phrase matchingExact phrase order required"Near me" or multi-word lookups
PostgreSQL fuzzy searchpg_trgm % operatorGIN (gin_trgm_ops)Typo-tolerant, LIKE/ILIKE accelerationHigher index size, no stemmingSearch-as-you-type, user input with typos
PostgreSQL combinedtsvector + pg_trgm fallbackGIN (both)Best coverage: stemmed + fuzzyTwo indexes, more storageProduction search with autocomplete
PostgreSQL weighted searchsetweight() + ts_rank()GINTitle > body boosting (A/B/C/D weights)Ranking not index-acceleratedRelevance-ranked results
PostgreSQL generated columnGENERATED ALWAYS AS (to_tsvector(...)) STOREDGINNo trigger needed, auto-maintainedPostgreSQL 12+ onlyClean, maintainable tsvector storage
MySQL natural languageMATCH(...) AGAINST('...')FULLTEXTSimple setup, built-in relevance50% threshold, 3-char minimumBasic search on InnoDB tables
MySQL boolean modeMATCH(...) AGAINST('...' IN BOOLEAN MODE)FULLTEXT+/- operators, no 50% thresholdNo relevance ranking by defaultPrecise inclusion/exclusion queries
MySQL query expansionMATCH(...) AGAINST('...' WITH QUERY EXPANSION)FULLTEXTFinds related terms automaticallyCan return noisy resultsExpanding vague search queries
MySQL multi-columnFULLTEXT(col1, col2, col3)FULLTEXTSingle index across columnsAll columns must be same tableCross-field search
Elasticsearch (comparison)REST API + inverted indexLucene segmentsSub-10ms on billions of docs, fuzzy, facetsSeparate infrastructure, data sync>10M docs, complex relevance tuning
pg_trgm word similarityword_similarity() + <% operatorGIN/GiSTMatches substrings, better for prefix searchPostgreSQL 9.6+ onlyAutocomplete, partial word matching

Decision Tree

START
├── Using PostgreSQL?
│   ├── YES
│   │   ├── Need typo/fuzzy tolerance?
│   │   │   ├── YES → Use pg_trgm with GIN(gin_trgm_ops) index
│   │   │   └── NO ↓
│   │   ├── Need relevance ranking with field weighting?
│   │   │   ├── YES → Use tsvector + setweight() + ts_rank()
│   │   │   └── NO ↓
│   │   ├── Simple keyword/phrase search?
│   │   │   ├── YES → Use to_tsvector() + to_tsquery() with GIN index
│   │   │   └── NO ↓
│   │   └── Need both fuzzy AND ranked search?
│   │       └── YES → Combine tsvector + pg_trgm with fallback
│   └── NO ↓
├── Using MySQL?
│   ├── YES
│   │   ├── Need precise inclusion/exclusion (+/- operators)?
│   │   │   ├── YES → Use BOOLEAN MODE
│   │   │   └── NO ↓
│   │   ├── Simple relevance-ranked search?
│   │   │   ├── YES → Use NATURAL LANGUAGE MODE
│   │   │   └── NO ↓
│   │   └── Need to find related terms from vague input?
│   │       └── YES → Use WITH QUERY EXPANSION
│   └── NO ↓
└── Dataset >10M docs OR need sub-50ms p99 OR complex faceting?
    └── YES → Use dedicated search engine (Elasticsearch, Typesense, Meilisearch)

Step-by-Step Guide

1. Create a tsvector column with GIN index (PostgreSQL)

Use a generated column (PostgreSQL 12+) to automatically maintain the tsvector. This avoids triggers and ensures the search vector stays in sync with source columns. [src1]

-- Step 1a: Add generated tsvector column
ALTER TABLE articles ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english',
      coalesce(title, '') || ' ' ||
      coalesce(body, '')
    )
  ) STORED;

-- Step 1b: Create GIN index for fast lookups
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);

Verify: SELECT count(*) FROM articles WHERE search_vector IS NOT NULL; → expected: should equal total row count

2. Query with tsquery and rank results (PostgreSQL)

Use websearch_to_tsquery() (PostgreSQL 11+) for Google-style query parsing, or plainto_tsquery() for simpler input. [src2]

SELECT
  id, title,
  ts_rank(search_vector, query) AS rank
FROM articles,
  websearch_to_tsquery('english', 'full text search postgresql') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

Verify: EXPLAIN ANALYZE on the query → expected: "Bitmap Index Scan" on idx_articles_search

3. Add field weighting for relevance (PostgreSQL)

Use setweight() to boost title matches (weight A) over body matches (weight D). Default weights: A=1.0, B=0.4, C=0.2, D=0.1. [src2]

ALTER TABLE articles DROP COLUMN search_vector;
ALTER TABLE articles ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'D')
  ) STORED;

CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);

Verify: Title matches should rank higher (~0.6+) than body-only matches (~0.1)

4. Create a FULLTEXT index and query (MySQL)

MySQL FULLTEXT works on VARCHAR, CHAR, and TEXT columns. InnoDB supports FULLTEXT from MySQL 5.7+. [src4]

ALTER TABLE articles ADD FULLTEXT INDEX ft_articles (title, body);

SELECT id, title,
  MATCH(title, body) AGAINST('full text search') AS relevance
FROM articles
WHERE MATCH(title, body) AGAINST('full text search')
ORDER BY relevance DESC
LIMIT 20;

Verify: EXPLAIN → expected: "fulltext" in type column

5. Use boolean mode for precise control (MySQL)

Boolean mode supports + (must include), - (must exclude), * (wildcard), and "..." (exact phrase). [src4]

SELECT id, title
FROM articles
WHERE MATCH(title, body) AGAINST('+postgresql -mysql +index*' IN BOOLEAN MODE)
ORDER BY id DESC
LIMIT 20;

Verify: Results should contain 'postgresql' and not contain 'mysql'

6. Enable pg_trgm for fuzzy search (PostgreSQL)

The pg_trgm extension provides typo-tolerant search using trigram matching. [src3]

CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX idx_articles_trgm ON articles
  USING GIN (title gin_trgm_ops);

SELECT id, title, similarity(title, 'postgrsql') AS sim
FROM articles
WHERE title % 'postgrsql'
ORDER BY sim DESC
LIMIT 10;

Verify: SELECT similarity('postgresql', 'postgrsql'); → expected: ~0.53

Code Examples

PostgreSQL: Weighted multi-column search with ranking

-- Input:  Table with title and body text columns
-- Output: Ranked search results with relevance score and snippet

ALTER TABLE articles ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'D')
  ) STORED;

CREATE INDEX idx_fts ON articles USING GIN (search_vector);

SELECT id, title,
  ts_rank(search_vector, q) AS rank,
  ts_headline('english', body, q,
    'StartSel=<b>, StopSel=</b>, MaxWords=35, MinWords=15'
  ) AS snippet
FROM articles,
  websearch_to_tsquery('english', 'database full text search') AS q
WHERE search_vector @@ q
ORDER BY rank DESC
LIMIT 20;

MySQL: Boolean mode with relevance sorting

-- Input:  Table with FULLTEXT index on title and body
-- Output: Filtered results with relevance score

ALTER TABLE articles ADD FULLTEXT INDEX ft_idx (title, body);

SELECT id, title,
  MATCH(title, body) AGAINST('+"full text" +search -elasticsearch' IN BOOLEAN MODE) AS score
FROM articles
WHERE MATCH(title, body) AGAINST('+"full text" +search -elasticsearch' IN BOOLEAN MODE)
HAVING score > 0
ORDER BY score DESC
LIMIT 20;

PostgreSQL: Combined tsvector + pg_trgm fallback

-- Input:  User search query (may contain typos)
-- Output: Best results from exact FTS, falling back to fuzzy trigram

WITH fts_results AS (
  SELECT id, title, ts_rank(search_vector, q) AS rank, 'fts' AS source
  FROM articles, websearch_to_tsquery('english', $1) AS q
  WHERE search_vector @@ q
  ORDER BY rank DESC LIMIT 20
),
trgm_results AS (
  SELECT id, title, similarity(title, $1) AS rank, 'trgm' AS source
  FROM articles
  WHERE title % $1
    AND id NOT IN (SELECT id FROM fts_results)
  ORDER BY rank DESC LIMIT 10
)
SELECT * FROM fts_results
UNION ALL
SELECT * FROM trgm_results
ORDER BY rank DESC;

Python (psycopg2): Parameterized full-text search

# Input:  User search string
# Output: Ranked search results as list of dicts

import psycopg2

def search_articles(conn, query: str, limit: int = 20) -> list[dict]:
    """Full-text search with parameterized query (SQL injection safe)."""
    sql = """
        SELECT id, title,
          ts_rank(search_vector, websearch_to_tsquery('english', %s)) AS rank,
          ts_headline('english', body, websearch_to_tsquery('english', %s),
            'StartSel=<b>, StopSel=</b>, MaxWords=35') AS snippet
        FROM articles
        WHERE search_vector @@ websearch_to_tsquery('english', %s)
        ORDER BY rank DESC
        LIMIT %s
    """
    with conn.cursor() as cur:
        cur.execute(sql, (query, query, query, limit))
        cols = [desc[0] for desc in cur.description]
        return [dict(zip(cols, row)) for row in cur.fetchall()]

Anti-Patterns

Wrong: Using LIKE '%term%' for text search

-- BAD -- full table scan on every query, no ranking, no stemming
SELECT * FROM articles
WHERE body LIKE '%database%' OR body LIKE '%search%';

Correct: Use tsvector with GIN index

-- GOOD -- uses GIN index, supports stemming and ranking
SELECT * FROM articles
WHERE search_vector @@ to_tsquery('english', 'database & search');

Wrong: Computing tsvector inline on every query

-- BAD -- recomputes tsvector for every row on every query
SELECT *, ts_rank(to_tsvector('english', body), to_tsquery('english', 'search')) AS rank
FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'search');

Correct: Use a stored tsvector column with index

-- GOOD -- reads precomputed tsvector, GIN index filters rows
SELECT *, ts_rank(search_vector, to_tsquery('english', 'search')) AS rank
FROM articles
WHERE search_vector @@ to_tsquery('english', 'search');

Wrong: MySQL NATURAL LANGUAGE MODE for high-frequency terms

-- BAD -- returns zero results if common terms appear in >50% of rows
SELECT * FROM articles
WHERE MATCH(title, body) AGAINST('the best database' IN NATURAL LANGUAGE MODE);

Correct: Use BOOLEAN MODE to bypass the 50% threshold

-- GOOD -- boolean mode has no frequency threshold
SELECT * FROM articles
WHERE MATCH(title, body) AGAINST('+best +database' IN BOOLEAN MODE);

Wrong: Using pg_trgm without setting similarity threshold

-- BAD -- default threshold 0.3 returns too many false positives for short strings
SELECT * FROM products WHERE name % 'cat';
-- Returns: "catalog", "category", "caterpillar", "concatenate" ...

Correct: Tune the similarity threshold before querying

-- GOOD -- raise threshold for short queries to reduce noise
SET pg_trgm.similarity_threshold = 0.5;
SELECT *, similarity(name, 'cat') AS sim FROM products
WHERE name % 'cat' ORDER BY sim DESC;

Wrong: Using plainto_tsquery for user input with operators

-- BAD -- user types "postgresql -mysql" but plainto_tsquery treats '-' as text
SELECT * FROM articles
WHERE search_vector @@ plainto_tsquery('english', 'postgresql -mysql');
-- Searches for 'postgresql' AND 'mysql', ignoring the minus sign

Correct: Use websearch_to_tsquery for Google-style input

-- GOOD -- websearch_to_tsquery handles -, "quotes", and OR naturally
SELECT * FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgresql -mysql');
-- Correctly excludes rows containing 'mysql'

Common Pitfalls

Diagnostic Commands

-- Check if GIN index is being used (PostgreSQL)
EXPLAIN ANALYZE SELECT * FROM articles
WHERE search_vector @@ to_tsquery('english', 'search');
-- Look for: "Bitmap Index Scan on idx_articles_search"

-- Inspect tsvector content for a row (PostgreSQL)
SELECT to_tsvector('english', 'The quick brown foxes jumped') AS vector;
-- Output: 'brown':3 'fox':4 'jump':5 'quick':2

-- Check MySQL FULLTEXT index status
SHOW INDEX FROM articles WHERE Index_type = 'FULLTEXT';

-- Check pg_trgm similarity between two strings
SELECT similarity('postgresql', 'postgrsql');
-- Output: ~0.53

-- Check MySQL InnoDB FULLTEXT settings
SHOW VARIABLES LIKE 'innodb_ft%';

-- View PostgreSQL GIN index size
SELECT pg_size_pretty(pg_relation_size('idx_articles_search')) AS index_size;

Version History & Compatibility

VersionStatusKey FeaturesNotes
PostgreSQL 17 (2024)CurrentGeneral improvementsFTS unchanged
PostgreSQL 12 (2019)SupportedGenerated columns for tsvectorMajor FTS ergonomic improvement
PostgreSQL 11 (2018)EOLwebsearch_to_tsquery() addedGoogle-style query parsing
PostgreSQL 9.6 (2016)EOLword_similarity() in pg_trgmImproved substring matching
MySQL 8.4 (2024)LTSInnoDB FULLTEXT stableRecommended version
MySQL 5.7 (2015)EOLInnoDB FULLTEXT supportPreviously MyISAM only

When to Use / When Not to Use

Use WhenDon't Use WhenUse Instead
Dataset < 10M rows and full-text is not the primary product featureSearch is the core product (e.g., document search engine)Elasticsearch, Typesense, or Meilisearch
Already using PostgreSQL/MySQL and want to avoid operational complexityNeed sub-10ms p99 latency on >100M documentsDedicated search infrastructure
Need stemming, ranking, and boolean operators in PostgreSQLNeed cross-language stemming with automatic language detectionElasticsearch with language analyzers
Simple search page or admin filter on an existing applicationNeed faceted navigation, aggregations, or typeahead at scaleElasticsearch or Algolia
Need typo-tolerant search on short fields (names, titles)Need fuzzy search on large text bodiespg_trgm on titles + tsvector on body (hybrid)
MySQL app needing basic keyword search with boolean operatorsNeed phrase proximity search or field-level boosting in MySQLPostgreSQL tsvector or Elasticsearch

Important Caveats

Related Units