SQL Temporal Queries: Gaps and Islands Patterns

Type: Software Reference Confidence: 0.93 Sources: 7 Verified: 2026-02-23 Freshness: 2026-02-23

TL;DR

Constraints

Quick Reference

ScenarioPatternWindow FunctionsTimeSpaceTrade-off
Integer gaps (missing IDs)LEAD differenceLEAD()O(n)O(1)Simplest; integers only
Integer islands (consecutive IDs)ROW_NUMBER differenceROW_NUMBER()O(n)O(1)Fast; fails with duplicates
Integer islands with duplicatesDENSE_RANK differenceDENSE_RANK()O(n)O(1)Handles duplicates correctly
Date gaps (missing days)LAG + date arithmeticLAG()O(n)O(1)Requires known interval
Date islands (consecutive days)LAG + running SUMLAG(), SUM()O(n)O(n)Most flexible for dates
Timestamp islands (activity sessions)LAG + threshold + SUMLAG(), SUM()O(n)O(n)Configurable gap threshold
Overlapping date range mergingRunning MAX end dateMAX(), LAG()O(n)O(n)Handles overlaps correctly
Partitioned islands (per-entity)Any pattern + PARTITION BYvariesO(n)O(n)Add PARTITION BY to all window fns
Islands with row numberingNo-sort optimizedROW_NUMBER(), LAG(), MAX()O(n)O(n)Best serial performance
Gaps in date seriesGenerate series + LEFT JOINgenerate_series()O(n)O(n)PostgreSQL-specific; clearest
Multi-column islandsComposite LAG comparisonLAG() per columnO(n)O(n)Compare all grouping columns

Decision Tree

START
|-- Is the sequence numeric integers (IDs)?
|   |-- YES --> Are there duplicate values?
|   |   |-- YES --> Use DENSE_RANK difference technique
|   |   +-- NO --> Use ROW_NUMBER difference technique
|   +-- NO (dates/timestamps) |
|-- Are date ranges overlapping?
|   |-- YES --> Use running MAX end-date technique (overlapping merge)
|   +-- NO |
|-- Do you need gaps (missing) or islands (present)?
|   |-- GAPS --> Is the database PostgreSQL?
|   |   |-- YES --> Use generate_series + LEFT JOIN
|   |   +-- NO --> Use LEAD/LAG gap detection
|   +-- ISLANDS |
|-- Is the interval fixed (1 day, 1 hour)?
|   |-- YES --> Use LAG + running SUM with fixed threshold
|   +-- NO --> Use LAG + running SUM with variable threshold
+-- DEFAULT --> LAG + running SUM (most general technique)

Step-by-Step Guide

1. Identify your sequence type and grouping

Determine whether your data is integer-based (IDs, sequence numbers) or temporal (dates, timestamps), and identify the PARTITION BY columns. [src1]

-- Inspect your data shape
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'your_table'
  AND column_name IN ('id', 'event_date', 'user_id', 'status');

Verify: Confirm the sequence column type is integer, date, or timestamp.

2. Add the grouping key using window functions

For integer sequences, subtract ROW_NUMBER from the value. For date sequences, compare with LAG and accumulate island IDs with running SUM. [src2]

-- Integer islands: the ROW_NUMBER difference technique
SELECT
  val,
  val - ROW_NUMBER() OVER (ORDER BY val) AS grp
FROM your_table;
-- Rows in the same island will have the same grp value

Verify: SELECT DISTINCT grp FROM cte should return one value per island.

3. Aggregate to get island boundaries

Group by the computed key to get the start and end of each island. [src1]

WITH islands AS (
  SELECT
    val,
    val - ROW_NUMBER() OVER (ORDER BY val) AS grp
  FROM your_table
)
SELECT
  MIN(val) AS island_start,
  MAX(val) AS island_end,
  COUNT(*) AS island_length
FROM islands
GROUP BY grp
ORDER BY island_start;

Verify: SUM(island_length) should equal COUNT(*) from the source table.

4. Validate edge cases

Check for NULLs, duplicates, and partition boundaries. [src5]

-- Check for NULLs in sequence column
SELECT COUNT(*) AS null_count
FROM your_table
WHERE val IS NULL;

-- Check for duplicates
SELECT val, COUNT(*) AS cnt
FROM your_table
GROUP BY val
HAVING COUNT(*) > 1;

Verify: If null_count > 0, add WHERE val IS NOT NULL or switch to DENSE_RANK. If duplicates exist, use DENSE_RANK instead of ROW_NUMBER.

Code Examples

PostgreSQL: Detect gaps in a date sequence

-- Input:  Table 'user_logins' with columns (user_id, login_date)
-- Output: Missing date ranges per user
WITH date_range AS (
  SELECT user_id,
         login_date,
         LEAD(login_date) OVER (
           PARTITION BY user_id ORDER BY login_date
         ) AS next_login
  FROM user_logins
)
SELECT user_id,
       login_date + INTERVAL '1 day' AS gap_start,
       next_login - INTERVAL '1 day' AS gap_end,
       (next_login - login_date - 1) AS days_missing
FROM date_range
WHERE next_login - login_date > 1
ORDER BY user_id, gap_start;

PostgreSQL: Detect islands of consecutive dates

Full script: postgresql-detect-islands-of-consecutive-dates.sql

-- Input:  Table 'daily_activity' with columns (user_id, activity_date)
-- Output: Consecutive date ranges (islands) per user
WITH flagged AS (
  SELECT user_id, activity_date,
    CASE WHEN activity_date - LAG(activity_date) OVER (
      PARTITION BY user_id ORDER BY activity_date
    ) = 1 THEN 0 ELSE 1 END AS new_island
  FROM daily_activity
),
grouped AS (
  SELECT user_id, activity_date,
    SUM(new_island) OVER (
      PARTITION BY user_id ORDER BY activity_date
      ROWS UNBOUNDED PRECEDING
    ) AS island_id
  FROM flagged
)
SELECT user_id,
       MIN(activity_date) AS island_start,
       MAX(activity_date) AS island_end,
       COUNT(*) AS consecutive_days
FROM grouped
GROUP BY user_id, island_id
ORDER BY user_id, island_start;

PostgreSQL: Merge overlapping date ranges

Full script: postgresql-merge-overlapping-date-ranges.sql

-- Input:  Table 'subscriptions' with columns (user_id, start_date, end_date)
-- Output: Merged non-overlapping date ranges per user
WITH ordered AS (
  SELECT user_id, start_date, end_date,
    MAX(end_date) OVER (
      PARTITION BY user_id ORDER BY start_date
      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) AS max_prev_end
  FROM subscriptions
),
flagged AS (
  SELECT *,
    CASE WHEN start_date <= max_prev_end THEN 0 ELSE 1 END AS new_group
  FROM ordered
),
grouped AS (
  SELECT *,
    SUM(new_group) OVER (
      PARTITION BY user_id ORDER BY start_date
      ROWS UNBOUNDED PRECEDING
    ) AS grp
  FROM flagged
)
SELECT user_id,
       MIN(start_date) AS merged_start,
       MAX(end_date) AS merged_end
FROM grouped
GROUP BY user_id, grp
ORDER BY user_id, merged_start;

SQL Server: Integer gaps with LEAD

-- Input:  Table 'invoice_numbers' with column (invoice_no INT)
-- Output: Missing invoice number ranges
WITH gaps AS (
  SELECT invoice_no,
         LEAD(invoice_no) OVER (ORDER BY invoice_no) AS next_no
  FROM invoice_numbers
)
SELECT invoice_no + 1 AS gap_start,
       next_no - 1    AS gap_end,
       next_no - invoice_no - 1 AS missing_count
FROM gaps
WHERE next_no - invoice_no > 1
ORDER BY gap_start;

Anti-Patterns

Wrong: Using ROW_NUMBER with duplicate values

-- BAD -- ROW_NUMBER assigns unique numbers, splitting identical values into separate islands
WITH islands AS (
  SELECT status, event_date,
    event_date - INTERVAL '1 day' * ROW_NUMBER() OVER (
      PARTITION BY status ORDER BY event_date
    ) AS grp
  FROM events
)
SELECT status, MIN(event_date), MAX(event_date)
FROM islands GROUP BY status, grp;

Correct: Use DENSE_RANK for data with duplicates

-- GOOD -- DENSE_RANK handles duplicates by assigning the same rank to ties
WITH islands AS (
  SELECT status, event_date,
    event_date - INTERVAL '1 day' * DENSE_RANK() OVER (
      PARTITION BY status ORDER BY event_date
    ) AS grp
  FROM events
)
SELECT status, MIN(event_date), MAX(event_date)
FROM islands GROUP BY status, grp;

Wrong: Omitting PARTITION BY when data has multiple entities

-- BAD -- Without PARTITION BY, islands from different users get merged
WITH flagged AS (
  SELECT user_id, login_date,
    CASE WHEN login_date - LAG(login_date) OVER (
      ORDER BY login_date  -- missing PARTITION BY user_id!
    ) = 1 THEN 0 ELSE 1 END AS new_island
  FROM user_logins
)

Correct: Always PARTITION BY the entity column

-- GOOD -- Each user's islands are computed independently
WITH flagged AS (
  SELECT user_id, login_date,
    CASE WHEN login_date - LAG(login_date) OVER (
      PARTITION BY user_id ORDER BY login_date
    ) = 1 THEN 0 ELSE 1 END AS new_island
  FROM user_logins
)

Wrong: Using self-join for gap detection

-- BAD -- O(n^2) self-join; catastrophic on large tables
SELECT a.val + 1 AS gap_start,
       MIN(b.val) - 1 AS gap_end
FROM sequence_table a
INNER JOIN sequence_table b ON b.val > a.val
WHERE NOT EXISTS (
  SELECT 1 FROM sequence_table c
  WHERE c.val = a.val + 1
)
GROUP BY a.val;

Correct: Use LEAD window function for O(n) gap detection

-- GOOD -- Single pass with LEAD, O(n) with proper index
WITH gaps AS (
  SELECT val,
    LEAD(val) OVER (ORDER BY val) AS next_val
  FROM sequence_table
)
SELECT val + 1 AS gap_start,
       next_val - 1 AS gap_end
FROM gaps
WHERE next_val - val > 1;

Common Pitfalls

When to Use / When Not to Use

Use WhenDon't Use WhenUse Instead
Finding missing values in a known sequence (invoice numbers, dates)Bucketing data into fixed time intervals (hourly, daily)DATE_TRUNC / time_bucket aggregation
Identifying consecutive activity streaks (login streaks, uptime)Detecting simple duplicates in a columnGROUP BY + HAVING COUNT > 1
Merging overlapping date ranges (subscriptions, schedules)Joining two tables on date ranges (interval overlap)Range join with daterange / OVERLAPS operator
Computing session duration from event timestampsComputing running totals or moving averagesStandard window function aggregates
Building slowly changing dimensions from event logsData has no natural ordering or sequenceGraph traversal / recursive CTE

Important Caveats

Related Units