ROW_NUMBER() OVER (ORDER BY col) - col for integer islands; LAG() + running SUM() for date/timestamp islands.COALESCE() or IS DISTINCT FROM to handle NULLs| Scenario | Pattern | Window Functions | Time | Space | Trade-off |
|---|---|---|---|---|---|
| Integer gaps (missing IDs) | LEAD difference | LEAD() | O(n) | O(1) | Simplest; integers only |
| Integer islands (consecutive IDs) | ROW_NUMBER difference | ROW_NUMBER() | O(n) | O(1) | Fast; fails with duplicates |
| Integer islands with duplicates | DENSE_RANK difference | DENSE_RANK() | O(n) | O(1) | Handles duplicates correctly |
| Date gaps (missing days) | LAG + date arithmetic | LAG() | O(n) | O(1) | Requires known interval |
| Date islands (consecutive days) | LAG + running SUM | LAG(), SUM() | O(n) | O(n) | Most flexible for dates |
| Timestamp islands (activity sessions) | LAG + threshold + SUM | LAG(), SUM() | O(n) | O(n) | Configurable gap threshold |
| Overlapping date range merging | Running MAX end date | MAX(), LAG() | O(n) | O(n) | Handles overlaps correctly |
| Partitioned islands (per-entity) | Any pattern + PARTITION BY | varies | O(n) | O(n) | Add PARTITION BY to all window fns |
| Islands with row numbering | No-sort optimized | ROW_NUMBER(), LAG(), MAX() | O(n) | O(n) | Best serial performance |
| Gaps in date series | Generate series + LEFT JOIN | generate_series() | O(n) | O(n) | PostgreSQL-specific; clearest |
| Multi-column islands | Composite LAG comparison | LAG() per column | O(n) | O(n) | Compare all grouping columns |
START
|-- Is the sequence numeric integers (IDs)?
| |-- YES --> Are there duplicate values?
| | |-- YES --> Use DENSE_RANK difference technique
| | +-- NO --> Use ROW_NUMBER difference technique
| +-- NO (dates/timestamps) |
|-- Are date ranges overlapping?
| |-- YES --> Use running MAX end-date technique (overlapping merge)
| +-- NO |
|-- Do you need gaps (missing) or islands (present)?
| |-- GAPS --> Is the database PostgreSQL?
| | |-- YES --> Use generate_series + LEFT JOIN
| | +-- NO --> Use LEAD/LAG gap detection
| +-- ISLANDS |
|-- Is the interval fixed (1 day, 1 hour)?
| |-- YES --> Use LAG + running SUM with fixed threshold
| +-- NO --> Use LAG + running SUM with variable threshold
+-- DEFAULT --> LAG + running SUM (most general technique)
Determine whether your data is integer-based (IDs, sequence numbers) or temporal (dates, timestamps), and identify the PARTITION BY columns. [src1]
-- Inspect your data shape
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'your_table'
AND column_name IN ('id', 'event_date', 'user_id', 'status');
Verify: Confirm the sequence column type is integer, date, or timestamp.
For integer sequences, subtract ROW_NUMBER from the value. For date sequences, compare with LAG and accumulate island IDs with running SUM. [src2]
-- Integer islands: the ROW_NUMBER difference technique
SELECT
val,
val - ROW_NUMBER() OVER (ORDER BY val) AS grp
FROM your_table;
-- Rows in the same island will have the same grp value
Verify: SELECT DISTINCT grp FROM cte should return one value per island.
Group by the computed key to get the start and end of each island. [src1]
WITH islands AS (
SELECT
val,
val - ROW_NUMBER() OVER (ORDER BY val) AS grp
FROM your_table
)
SELECT
MIN(val) AS island_start,
MAX(val) AS island_end,
COUNT(*) AS island_length
FROM islands
GROUP BY grp
ORDER BY island_start;
Verify: SUM(island_length) should equal COUNT(*) from the source table.
Check for NULLs, duplicates, and partition boundaries. [src5]
-- Check for NULLs in sequence column
SELECT COUNT(*) AS null_count
FROM your_table
WHERE val IS NULL;
-- Check for duplicates
SELECT val, COUNT(*) AS cnt
FROM your_table
GROUP BY val
HAVING COUNT(*) > 1;
Verify: If null_count > 0, add WHERE val IS NOT NULL or switch to DENSE_RANK. If duplicates exist, use DENSE_RANK instead of ROW_NUMBER.
-- Input: Table 'user_logins' with columns (user_id, login_date)
-- Output: Missing date ranges per user
WITH date_range AS (
SELECT user_id,
login_date,
LEAD(login_date) OVER (
PARTITION BY user_id ORDER BY login_date
) AS next_login
FROM user_logins
)
SELECT user_id,
login_date + INTERVAL '1 day' AS gap_start,
next_login - INTERVAL '1 day' AS gap_end,
(next_login - login_date - 1) AS days_missing
FROM date_range
WHERE next_login - login_date > 1
ORDER BY user_id, gap_start;
Full script: postgresql-detect-islands-of-consecutive-dates.sql
-- Input: Table 'daily_activity' with columns (user_id, activity_date)
-- Output: Consecutive date ranges (islands) per user
WITH flagged AS (
SELECT user_id, activity_date,
CASE WHEN activity_date - LAG(activity_date) OVER (
PARTITION BY user_id ORDER BY activity_date
) = 1 THEN 0 ELSE 1 END AS new_island
FROM daily_activity
),
grouped AS (
SELECT user_id, activity_date,
SUM(new_island) OVER (
PARTITION BY user_id ORDER BY activity_date
ROWS UNBOUNDED PRECEDING
) AS island_id
FROM flagged
)
SELECT user_id,
MIN(activity_date) AS island_start,
MAX(activity_date) AS island_end,
COUNT(*) AS consecutive_days
FROM grouped
GROUP BY user_id, island_id
ORDER BY user_id, island_start;
Full script: postgresql-merge-overlapping-date-ranges.sql
-- Input: Table 'subscriptions' with columns (user_id, start_date, end_date)
-- Output: Merged non-overlapping date ranges per user
WITH ordered AS (
SELECT user_id, start_date, end_date,
MAX(end_date) OVER (
PARTITION BY user_id ORDER BY start_date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) AS max_prev_end
FROM subscriptions
),
flagged AS (
SELECT *,
CASE WHEN start_date <= max_prev_end THEN 0 ELSE 1 END AS new_group
FROM ordered
),
grouped AS (
SELECT *,
SUM(new_group) OVER (
PARTITION BY user_id ORDER BY start_date
ROWS UNBOUNDED PRECEDING
) AS grp
FROM flagged
)
SELECT user_id,
MIN(start_date) AS merged_start,
MAX(end_date) AS merged_end
FROM grouped
GROUP BY user_id, grp
ORDER BY user_id, merged_start;
-- Input: Table 'invoice_numbers' with column (invoice_no INT)
-- Output: Missing invoice number ranges
WITH gaps AS (
SELECT invoice_no,
LEAD(invoice_no) OVER (ORDER BY invoice_no) AS next_no
FROM invoice_numbers
)
SELECT invoice_no + 1 AS gap_start,
next_no - 1 AS gap_end,
next_no - invoice_no - 1 AS missing_count
FROM gaps
WHERE next_no - invoice_no > 1
ORDER BY gap_start;
-- BAD -- ROW_NUMBER assigns unique numbers, splitting identical values into separate islands
WITH islands AS (
SELECT status, event_date,
event_date - INTERVAL '1 day' * ROW_NUMBER() OVER (
PARTITION BY status ORDER BY event_date
) AS grp
FROM events
)
SELECT status, MIN(event_date), MAX(event_date)
FROM islands GROUP BY status, grp;
-- GOOD -- DENSE_RANK handles duplicates by assigning the same rank to ties
WITH islands AS (
SELECT status, event_date,
event_date - INTERVAL '1 day' * DENSE_RANK() OVER (
PARTITION BY status ORDER BY event_date
) AS grp
FROM events
)
SELECT status, MIN(event_date), MAX(event_date)
FROM islands GROUP BY status, grp;
-- BAD -- Without PARTITION BY, islands from different users get merged
WITH flagged AS (
SELECT user_id, login_date,
CASE WHEN login_date - LAG(login_date) OVER (
ORDER BY login_date -- missing PARTITION BY user_id!
) = 1 THEN 0 ELSE 1 END AS new_island
FROM user_logins
)
-- GOOD -- Each user's islands are computed independently
WITH flagged AS (
SELECT user_id, login_date,
CASE WHEN login_date - LAG(login_date) OVER (
PARTITION BY user_id ORDER BY login_date
) = 1 THEN 0 ELSE 1 END AS new_island
FROM user_logins
)
-- BAD -- O(n^2) self-join; catastrophic on large tables
SELECT a.val + 1 AS gap_start,
MIN(b.val) - 1 AS gap_end
FROM sequence_table a
INNER JOIN sequence_table b ON b.val > a.val
WHERE NOT EXISTS (
SELECT 1 FROM sequence_table c
WHERE c.val = a.val + 1
)
GROUP BY a.val;
-- GOOD -- Single pass with LEAD, O(n) with proper index
WITH gaps AS (
SELECT val,
LEAD(val) OVER (ORDER BY val) AS next_val
FROM sequence_table
)
SELECT val + 1 AS gap_start,
next_val - 1 AS gap_end
FROM gaps
WHERE next_val - val > 1;
ORDER BY event_date, id. [src1]LAG(val) returns NULL for the first row, and NULL - date is NULL, not a flag. Fix: COALESCE(LAG(val) OVER (...), val) or handle the first-row case explicitly. [src4]date2 - date1 = 1 means consecutive days, but timestamp2 - timestamp1 = INTERVAL '1 day' may miss times. Fix: cast to DATE or use DATE_TRUNC('day', ts) before comparison. [src5]SUM() OVER (ORDER BY ...) is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which groups equal values. Fix: always specify ROWS UNBOUNDED PRECEDING for running sums. [src2](partition_col, order_col). [src3]| Use When | Don't Use When | Use Instead |
|---|---|---|
| Finding missing values in a known sequence (invoice numbers, dates) | Bucketing data into fixed time intervals (hourly, daily) | DATE_TRUNC / time_bucket aggregation |
| Identifying consecutive activity streaks (login streaks, uptime) | Detecting simple duplicates in a column | GROUP BY + HAVING COUNT > 1 |
| Merging overlapping date ranges (subscriptions, schedules) | Joining two tables on date ranges (interval overlap) | Range join with daterange / OVERLAPS operator |
| Computing session duration from event timestamps | Computing running totals or moving averages | Standard window function aggregates |
| Building slowly changing dimensions from event logs | Data has no natural ordering or sequence | Graph traversal / recursive CTE |
generate_series(); use a recursive CTE or a numbers table for gap detection with date series(partition_col, order_col, value_col) to avoid table scans