SQL gaps and islands detection

- Bottom line: Gaps and islands problems are solved by computing a grouping key from the difference between a data value and a window function rank, then aggregating per group.

find missing date ranges in SQL

- Bottom line: Gaps and islands problems are solved by computing a grouping key from the difference between a data value and a window function rank, then aggregating per group.

identify consecutive sequences in SQL

- Bottom line: Gaps and islands problems are solved by computing a grouping key from the difference between a data value and a window function rank, then aggregating per group.

SQL contiguous range detection

- Bottom line: Gaps and islands problems are solved by computing a grouping key from the difference between a data value and a window function rank, then aggregating per group.

SQL Temporal Queries: Gaps and Islands Patterns

How do I write SQL temporal queries for gaps and islands problems?

TL;DR

Bottom line: Gaps and islands problems are solved by computing a grouping key from the difference between a data value and a window function rank, then aggregating per group.
Key tool/command: ROW_NUMBER() OVER (ORDER BY col) - col for integer islands; LAG() + running SUM() for date/timestamp islands.
Watch out for: Forgetting to PARTITION BY your grouping column; this merges islands across unrelated groups.
Works with: PostgreSQL 9.4+, MySQL 8.0+, SQL Server 2012+, Oracle 12c+, SQLite 3.25+ (any database with window function support).

Constraints

Window functions require SQL:2003+ compliant databases; MySQL < 8.0 and SQLite < 3.25 do not support them
ORDER BY in window functions must be deterministic; add a tiebreaker column (e.g., primary key) if the sequence column has duplicates
NULL values in LAG/LEAD comparisons evaluate to NULL, not FALSE; use COALESCE() or IS DISTINCT FROM to handle NULLs
PARTITION BY must match your logical entity grouping (e.g., per-user, per-device); omitting it merges all rows into one partition
For overlapping date ranges, standard gap detection fails; use the running-maximum technique instead

Quick Reference

Scenario	Pattern	Window Functions	Time	Space	Trade-off
Integer gaps (missing IDs)	LEAD difference	`LEAD()`	O(n)	O(1)	Simplest; integers only
Integer islands (consecutive IDs)	ROW_NUMBER difference	`ROW_NUMBER()`	O(n)	O(1)	Fast; fails with duplicates
Integer islands with duplicates	DENSE_RANK difference	`DENSE_RANK()`	O(n)	O(1)	Handles duplicates correctly
Date gaps (missing days)	LAG + date arithmetic	`LAG()`	O(n)	O(1)	Requires known interval
Date islands (consecutive days)	LAG + running SUM	`LAG()`, `SUM()`	O(n)	O(n)	Most flexible for dates
Timestamp islands (activity sessions)	LAG + threshold + SUM	`LAG()`, `SUM()`	O(n)	O(n)	Configurable gap threshold
Overlapping date range merging	Running MAX end date	`MAX()`, `LAG()`	O(n)	O(n)	Handles overlaps correctly
Partitioned islands (per-entity)	Any pattern + PARTITION BY	varies	O(n)	O(n)	Add PARTITION BY to all window fns
Islands with row numbering	No-sort optimized	`ROW_NUMBER()`, `LAG()`, `MAX()`	O(n)	O(n)	Best serial performance
Gaps in date series	Generate series + LEFT JOIN	`generate_series()`	O(n)	O(n)	PostgreSQL-specific; clearest
Multi-column islands	Composite LAG comparison	`LAG()` per column	O(n)	O(n)	Compare all grouping columns

Decision Tree

START
|-- Is the sequence numeric integers (IDs)?
|   |-- YES --> Are there duplicate values?
|   |   |-- YES --> Use DENSE_RANK difference technique
|   |   +-- NO --> Use ROW_NUMBER difference technique
|   +-- NO (dates/timestamps) |
|-- Are date ranges overlapping?
|   |-- YES --> Use running MAX end-date technique (overlapping merge)
|   +-- NO |
|-- Do you need gaps (missing) or islands (present)?
|   |-- GAPS --> Is the database PostgreSQL?
|   |   |-- YES --> Use generate_series + LEFT JOIN
|   |   +-- NO --> Use LEAD/LAG gap detection
|   +-- ISLANDS |
|-- Is the interval fixed (1 day, 1 hour)?
|   |-- YES --> Use LAG + running SUM with fixed threshold
|   +-- NO --> Use LAG + running SUM with variable threshold
+-- DEFAULT --> LAG + running SUM (most general technique)

Step-by-Step Guide

1. Identify your sequence type and grouping

Determine whether your data is integer-based (IDs, sequence numbers) or temporal (dates, timestamps), and identify the PARTITION BY columns. [src1]

-- Inspect your data shape
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'your_table'
  AND column_name IN ('id', 'event_date', 'user_id', 'status');

Verify: Confirm the sequence column type is integer, date, or timestamp.

2. Add the grouping key using window functions

For integer sequences, subtract ROW_NUMBER from the value. For date sequences, compare with LAG and accumulate island IDs with running SUM. [src2]

-- Integer islands: the ROW_NUMBER difference technique
SELECT
  val,
  val - ROW_NUMBER() OVER (ORDER BY val) AS grp
FROM your_table;
-- Rows in the same island will have the same grp value

Verify: SELECT DISTINCT grp FROM cte should return one value per island.

3. Aggregate to get island boundaries

Group by the computed key to get the start and end of each island. [src1]

WITH islands AS (
  SELECT
    val,
    val - ROW_NUMBER() OVER (ORDER BY val) AS grp
  FROM your_table
)
SELECT
  MIN(val) AS island_start,
  MAX(val) AS island_end,
  COUNT(*) AS island_length
FROM islands
GROUP BY grp
ORDER BY island_start;

Verify: SUM(island_length) should equal COUNT(*) from the source table.

4. Validate edge cases

Check for NULLs, duplicates, and partition boundaries. [src5]

-- Check for NULLs in sequence column
SELECT COUNT(*) AS null_count
FROM your_table
WHERE val IS NULL;

-- Check for duplicates
SELECT val, COUNT(*) AS cnt
FROM your_table
GROUP BY val
HAVING COUNT(*) > 1;

Verify: If null_count > 0, add WHERE val IS NOT NULL or switch to DENSE_RANK. If duplicates exist, use DENSE_RANK instead of ROW_NUMBER.

Code Examples

PostgreSQL: Detect gaps in a date sequence

-- Input:  Table 'user_logins' with columns (user_id, login_date)
-- Output: Missing date ranges per user
WITH date_range AS (
  SELECT user_id,
         login_date,
         LEAD(login_date) OVER (
           PARTITION BY user_id ORDER BY login_date
         ) AS next_login
  FROM user_logins
)
SELECT user_id,
       login_date + INTERVAL '1 day' AS gap_start,
       next_login - INTERVAL '1 day' AS gap_end,
       (next_login - login_date - 1) AS days_missing
FROM date_range
WHERE next_login - login_date > 1
ORDER BY user_id, gap_start;

PostgreSQL: Detect islands of consecutive dates

Full script: postgresql-detect-islands-of-consecutive-dates.sql

-- Input:  Table 'daily_activity' with columns (user_id, activity_date)
-- Output: Consecutive date ranges (islands) per user
WITH flagged AS (
  SELECT user_id, activity_date,
    CASE WHEN activity_date - LAG(activity_date) OVER (
      PARTITION BY user_id ORDER BY activity_date
    ) = 1 THEN 0 ELSE 1 END AS new_island
  FROM daily_activity
),
grouped AS (
  SELECT user_id, activity_date,
    SUM(new_island) OVER (
      PARTITION BY user_id ORDER BY activity_date
      ROWS UNBOUNDED PRECEDING
    ) AS island_id
  FROM flagged
)
SELECT user_id,
       MIN(activity_date) AS island_start,
       MAX(activity_date) AS island_end,
       COUNT(*) AS consecutive_days
FROM grouped
GROUP BY user_id, island_id
ORDER BY user_id, island_start;

PostgreSQL: Merge overlapping date ranges

Full script: postgresql-merge-overlapping-date-ranges.sql

-- Input:  Table 'subscriptions' with columns (user_id, start_date, end_date)
-- Output: Merged non-overlapping date ranges per user
WITH ordered AS (
  SELECT user_id, start_date, end_date,
    MAX(end_date) OVER (
      PARTITION BY user_id ORDER BY start_date
      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) AS max_prev_end
  FROM subscriptions
),
flagged AS (
  SELECT *,
    CASE WHEN start_date <= max_prev_end THEN 0 ELSE 1 END AS new_group
  FROM ordered
),
grouped AS (
  SELECT *,
    SUM(new_group) OVER (
      PARTITION BY user_id ORDER BY start_date
      ROWS UNBOUNDED PRECEDING
    ) AS grp
  FROM flagged
)
SELECT user_id,
       MIN(start_date) AS merged_start,
       MAX(end_date) AS merged_end
FROM grouped
GROUP BY user_id, grp
ORDER BY user_id, merged_start;

SQL Server: Integer gaps with LEAD

-- Input:  Table 'invoice_numbers' with column (invoice_no INT)
-- Output: Missing invoice number ranges
WITH gaps AS (
  SELECT invoice_no,
         LEAD(invoice_no) OVER (ORDER BY invoice_no) AS next_no
  FROM invoice_numbers
)
SELECT invoice_no + 1 AS gap_start,
       next_no - 1    AS gap_end,
       next_no - invoice_no - 1 AS missing_count
FROM gaps
WHERE next_no - invoice_no > 1
ORDER BY gap_start;

Anti-Patterns

Wrong: Using ROW_NUMBER with duplicate values

-- BAD -- ROW_NUMBER assigns unique numbers, splitting identical values into separate islands
WITH islands AS (
  SELECT status, event_date,
    event_date - INTERVAL '1 day' * ROW_NUMBER() OVER (
      PARTITION BY status ORDER BY event_date
    ) AS grp
  FROM events
)
SELECT status, MIN(event_date), MAX(event_date)
FROM islands GROUP BY status, grp;

Correct: Use DENSE_RANK for data with duplicates

-- GOOD -- DENSE_RANK handles duplicates by assigning the same rank to ties
WITH islands AS (
  SELECT status, event_date,
    event_date - INTERVAL '1 day' * DENSE_RANK() OVER (
      PARTITION BY status ORDER BY event_date
    ) AS grp
  FROM events
)
SELECT status, MIN(event_date), MAX(event_date)
FROM islands GROUP BY status, grp;

Wrong: Omitting PARTITION BY when data has multiple entities

-- BAD -- Without PARTITION BY, islands from different users get merged
WITH flagged AS (
  SELECT user_id, login_date,
    CASE WHEN login_date - LAG(login_date) OVER (
      ORDER BY login_date  -- missing PARTITION BY user_id!
    ) = 1 THEN 0 ELSE 1 END AS new_island
  FROM user_logins
)

Correct: Always PARTITION BY the entity column

-- GOOD -- Each user's islands are computed independently
WITH flagged AS (
  SELECT user_id, login_date,
    CASE WHEN login_date - LAG(login_date) OVER (
      PARTITION BY user_id ORDER BY login_date
    ) = 1 THEN 0 ELSE 1 END AS new_island
  FROM user_logins
)

Wrong: Using self-join for gap detection

-- BAD -- O(n^2) self-join; catastrophic on large tables
SELECT a.val + 1 AS gap_start,
       MIN(b.val) - 1 AS gap_end
FROM sequence_table a
INNER JOIN sequence_table b ON b.val > a.val
WHERE NOT EXISTS (
  SELECT 1 FROM sequence_table c
  WHERE c.val = a.val + 1
)
GROUP BY a.val;

Correct: Use LEAD window function for O(n) gap detection

-- GOOD -- Single pass with LEAD, O(n) with proper index
WITH gaps AS (
  SELECT val,
    LEAD(val) OVER (ORDER BY val) AS next_val
  FROM sequence_table
)
SELECT val + 1 AS gap_start,
       next_val - 1 AS gap_end
FROM gaps
WHERE next_val - val > 1;

Common Pitfalls

Non-deterministic ORDER BY: If your ORDER BY column has ties, ROW_NUMBER assigns arbitrary numbers, producing inconsistent island boundaries across runs. Fix: add a tiebreaker column, e.g., ORDER BY event_date, id. [src1]
NULL propagation in LAG/LEAD: LAG(val) returns NULL for the first row, and NULL - date is NULL, not a flag. Fix: COALESCE(LAG(val) OVER (...), val) or handle the first-row case explicitly. [src4]
Off-by-one in date arithmetic: date2 - date1 = 1 means consecutive days, but timestamp2 - timestamp1 = INTERVAL '1 day' may miss times. Fix: cast to DATE or use DATE_TRUNC('day', ts) before comparison. [src5]
Forgetting ROWS UNBOUNDED PRECEDING: The default window frame for SUM() OVER (ORDER BY ...) is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which groups equal values. Fix: always specify ROWS UNBOUNDED PRECEDING for running sums. [src2]
Overlapping ranges treated as gaps: Standard gap detection assumes non-overlapping intervals. If ranges overlap, use the running-MAX technique to merge before detecting. Fix: pre-merge overlapping ranges first. [src7]
Performance on unsorted data: Window functions require sorted input. Without an index on the ORDER BY column, the planner adds an expensive Sort operator. Fix: create an index on (partition_col, order_col). [src3]

When to Use / When Not to Use

Use When	Don't Use When	Use Instead
Finding missing values in a known sequence (invoice numbers, dates)	Bucketing data into fixed time intervals (hourly, daily)	DATE_TRUNC / time_bucket aggregation
Identifying consecutive activity streaks (login streaks, uptime)	Detecting simple duplicates in a column	GROUP BY + HAVING COUNT > 1
Merging overlapping date ranges (subscriptions, schedules)	Joining two tables on date ranges (interval overlap)	Range join with daterange / OVERLAPS operator
Computing session duration from event timestamps	Computing running totals or moving averages	Standard window function aggregates
Building slowly changing dimensions from event logs	Data has no natural ordering or sequence	Graph traversal / recursive CTE

Important Caveats

The ROW_NUMBER difference technique only works for sequences with a known, constant step size (e.g., +1 for integers, +1 day for dates); variable-step sequences require the LAG/SUM approach
MySQL 8.0 supports window functions but lacks generate_series(); use a recursive CTE or a numbers table for gap detection with date series
Performance varies significantly by engine: SQL Server's batch mode processing can achieve 5x speedup on islands queries with serial execution plans when window functions share the same PARTITION BY and ORDER BY
SQLite window function support (3.25+) is functional but lacks some advanced framing options; test complex ROWS/RANGE frames carefully
For very large tables (>10M rows), ensure a covering index exists on (partition_col, order_col, value_col) to avoid table scans