The transactional outbox pattern applies to any system that persists business data in a relational database and must reliably publish events to external consumers. It is the canonical solution to the dual-write problem in distributed systems. This card covers the general pattern, ERP-specific adaptations for Salesforce/SAP/NetSuite/D365, relay mechanisms (polling and CDC), and a decision framework comparing outbox vs CDC vs dual-write vs event sourcing.
| System | Role | API Surface | Direction |
|---|---|---|---|
| ERP Database (source) | Business data persistence + outbox table | SQL (same transaction) | Source |
| Relay Process | Reads outbox, publishes to broker | Polling SQL or CDC log tailing | Bridge |
| Message Broker (Kafka, RabbitMQ, SQS, Service Bus) | Event distribution to consumers | Broker-native protocol | Target |
| Debezium (optional) | CDC-based relay -- reads DB transaction log | Kafka Connect | Bridge |
The outbox pattern does not define a specific API surface -- it defines how a database transaction atomically produces both business data and an event record. The relay mechanism bridges the outbox table to the message broker. [src1, src3]
| Relay Mechanism | Latency | Infrastructure | DB Load | Ordering | Complexity |
|---|---|---|---|---|---|
| Polling Publisher | 1-5s (poll interval) | Minimal -- background worker | Moderate | Per-aggregate (sequence_id) | Low |
| Debezium CDC | Sub-second | Kafka + Kafka Connect | Minimal (reads WAL) | Per-partition (aggregate_id) | Medium |
| DynamoDB Streams | Sub-second | AWS-native | None -- built-in | Per-partition-key | Low (AWS-only) |
| Salesforce Platform Events | Near real-time | SF-native | None -- built-in | Per-replay-ID | Low (SF-only) |
| SAP Event Mesh | Near real-time | BTP | None -- built-in | Per-topic | Medium (SAP-only) |
| System | Limit Type | Value | Notes |
|---|---|---|---|
| Salesforce Platform Events | Standard-volume events | 100K/day (Enterprise), 500K/day (Unlimited) | High-volume requires add-on license |
| Salesforce CDC | Event delivery allocation | Shared with Platform Events daily limit | 1 event per field change, not per record |
| SAP Event Mesh | Messages/month | 1M (standard), custom enterprise tiers | Queue depth: 1,000 messages default |
| NetSuite SuiteScript | Governance units per script | 1,000 (client), 10,000 (server) | Each https.request() costs 10 units |
| D365 Business Events | Endpoint throughput | Azure Service Bus limits apply | No D365-specific rate limit |
| Debezium (Kafka) | Throughput | 100K+ events/sec | Bottleneck is source DB WAL throughput |
| Metric | Recommended Threshold | Impact of Exceeding |
|---|---|---|
| Outbox table size | < 100,000 unpublished rows | Query performance degrades; index bloat |
| Poll interval | 1-5 seconds | Too low: DB load. Too high: delivery latency |
| Batch size per poll | 100-500 rows | Too large: long transactions. Too small: relay overhead |
| Row retention | Delete after publish + 24h buffer | Unbounded growth causes vacuum/analyze issues |
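The thresholds above can be wired into a relay health check. A minimal sketch in Python; the function and parameter names are illustrative, not from any library:

```python
from dataclasses import dataclass

@dataclass
class OutboxHealth:
    pending_rows: int        # unpublished rows in outbox_events
    oldest_pending_s: float  # age of oldest unpublished row, seconds

def outbox_alerts(h: OutboxHealth,
                  max_pending: int = 100_000,
                  max_age_s: float = 60.0) -> list[str]:
    """Return alert strings when the recommended thresholds are exceeded."""
    alerts = []
    if h.pending_rows > max_pending:
        alerts.append(f"outbox depth {h.pending_rows} exceeds {max_pending}")
    if h.oldest_pending_s > max_age_s:
        alerts.append(f"oldest unpublished event is {h.oldest_pending_s:.0f}s old")
    return alerts
```

Feed it from the outbox-depth query in the operational commands section and route non-empty results to your alerting system.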
| Component | Auth Method | Notes |
|---|---|---|
| Outbox table (same DB) | DB credentials (connection pool) | Same transaction context -- no separate auth |
| Debezium connector | DB replication credentials | Requires REPLICATION role in PostgreSQL |
| Kafka broker | SASL/SCRAM or mTLS | Connector config includes broker credentials |
| Salesforce Pub/Sub API | OAuth 2.0 JWT bearer | For external subscribers consuming Platform Events |
| SAP Event Mesh | OAuth 2.0 client credentials | BTP service binding provides credentials |
| Azure Service Bus | Managed Identity or SAS token | D365 Business Events connect via SAS policy |
START -- Need reliable event publishing from ERP transactions?
|
+-- Do you control the database schema?
| |
| +-- YES (custom DB, self-managed ERP)
| | +-- Need sub-second latency?
| | | +-- YES --> Debezium CDC relay with outbox table
| | | +-- NO --> Polling publisher (simpler, fewer moving parts)
| | +-- Already running Kafka?
| | +-- YES --> Debezium outbox event router (purpose-built)
| | +-- NO --> Polling publisher to SQS/RabbitMQ/Service Bus
| |
| +-- NO (SaaS ERP: Salesforce, SAP S/4HANA Cloud, NetSuite, D365)
| +-- Salesforce --> Platform Events + CDC (native outbox equivalent)
| +-- SAP S/4HANA Cloud --> Event Mesh + Business Event Handling
| +-- NetSuite --> SuiteScript afterSubmit + custom outbox record
| +-- D365 F&O --> Business Events + Azure Service Bus
| +-- D365 CE/Dataverse --> Business Events + Power Automate/Webhooks
|
+-- Pattern comparison:
+-- Need domain-specific events? --> Outbox pattern
+-- Need all row changes captured? --> CDC (without outbox)
+-- Cannot modify application code? --> CDC on existing tables
+-- Need audit trail of all state? --> Event sourcing
| Column | Type | Purpose | Notes |
|---|---|---|---|
| id | UUID | Unique event identifier | PK, used for consumer deduplication |
| sequence_id | BIGSERIAL | Relay ordering | Monotonically increasing; DO NOT use created_at |
| aggregate_type | VARCHAR(255) | Entity type (SalesOrder, Invoice) | Used for topic routing |
| aggregate_id | UUID | Business entity ID | Kafka partition key; ensures per-entity ordering |
| event_type | VARCHAR(255) | Domain event name (OrderCreated) | Consumer uses this for deserialization |
| payload | JSONB | Serialized event data | Keep under 1MB for Kafka compatibility |
| created_at | TIMESTAMPTZ | Event timestamp | NOT used for ordering |
| published_at | TIMESTAMPTZ | When relay published event | NULL = unpublished |
| Property | Default | Purpose |
|---|---|---|
| transforms.outbox.type | io.debezium.transforms.outbox.EventRouter | Enable outbox SMT |
| route.by.field | aggregatetype | Column that determines Kafka topic |
| route.topic.replacement | outbox.event.${routedByValue} | Topic naming pattern |
| table.field.event.id | id | Unique event ID column |
| table.field.event.key | aggregateid | Kafka message key column |
| table.field.event.payload | payload | Event data column |
| table.expand.json.payload | false | Expand JSON strings in payload |
Create the outbox table in the same database as your business data. Use BIGSERIAL for reliable ordering and a partial index on unpublished rows for efficient polling. [src4]
CREATE TABLE outbox_events (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
sequence_id BIGSERIAL NOT NULL,
aggregate_type VARCHAR(255) NOT NULL,
aggregate_id UUID NOT NULL,
event_type VARCHAR(255) NOT NULL,
payload JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
published_at TIMESTAMPTZ
);
-- Partial index: relay queries ONLY unpublished rows
CREATE INDEX idx_outbox_unpublished
ON outbox_events (sequence_id)
WHERE published_at IS NULL;
-- Cleanup job index
CREATE INDEX idx_outbox_published_at
ON outbox_events (published_at)
WHERE published_at IS NOT NULL;
Verify: SELECT COUNT(*) FROM outbox_events WHERE published_at IS NULL; --> expected: 0
The critical guarantee: business data and outbox event are committed atomically. If either fails, both roll back. [src1]
import uuid, json
from datetime import datetime, timezone

def create_order_with_event(conn, order_data):
    """Write order + outbox event in a single transaction."""
    order_id = uuid.uuid4()
    event_id = uuid.uuid4()
    with conn.cursor() as cur:
        cur.execute("""
            INSERT INTO sales_orders (id, customer_id, total, status)
            VALUES (%s, %s, %s, 'created')
        """, (str(order_id), order_data['customer_id'], order_data['total']))
        cur.execute("""
            INSERT INTO outbox_events (id, aggregate_type, aggregate_id,
                                       event_type, payload)
            VALUES (%s, %s, %s, %s, %s)
        """, (str(event_id), 'SalesOrder', str(order_id), 'OrderCreated',
              json.dumps({
                  'order_id': str(order_id),
                  'customer_id': order_data['customer_id'],
                  'total': str(order_data['total']),
                  'timestamp': datetime.now(timezone.utc).isoformat()
              })))
    conn.commit()  # Atomic: order and event commit together or not at all
    return order_id
Verify: SELECT COUNT(*) FROM outbox_events WHERE published_at IS NULL; --> expected: 1
Background process polls outbox, publishes to broker, marks as published. Use FOR UPDATE SKIP LOCKED for safe concurrent relay instances. [src4]
from confluent_kafka import Producer
import json, time

def poll_and_publish(conn, producer, batch_size=100, poll_interval=2.0):
    while True:
        with conn.cursor() as cur:
            cur.execute("""
                SELECT id, aggregate_type, aggregate_id, event_type, payload
                FROM outbox_events WHERE published_at IS NULL
                ORDER BY sequence_id ASC LIMIT %s
                FOR UPDATE SKIP LOCKED
            """, (batch_size,))
            rows = cur.fetchall()
            if not rows:
                conn.rollback()  # Release the implicit transaction before sleeping
                time.sleep(poll_interval)
                continue
            published_ids = []
            for row in rows:
                event_id, agg_type, agg_id, event_type, payload = row
                producer.produce(
                    topic=f"outbox.event.{agg_type}",
                    key=str(agg_id),
                    value=json.dumps(payload),
                    headers={'event_id': str(event_id), 'event_type': event_type}
                )
                published_ids.append(str(event_id))
            producer.flush()
            cur.execute("""
                UPDATE outbox_events SET published_at = NOW()
                WHERE id = ANY(%s::uuid[])
            """, (published_ids,))
        conn.commit()
        time.sleep(poll_interval)
Verify: SELECT COUNT(*) FROM outbox_events WHERE published_at IS NULL; --> expected: 0
Use Debezium's outbox event router for sub-second relay without polling overhead. [src2]
{
  "name": "outbox-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "erp-db.example.com",
    "database.port": "5432",
    "database.user": "debezium_replication",
    "database.password": "<from secret store>",
    "database.dbname": "erp_production",
    "topic.prefix": "erp",
    "table.include.list": "public.outbox_events",
    "transforms": "outbox",
    "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
    "transforms.outbox.route.by.field": "aggregate_type",
    "transforms.outbox.route.topic.replacement": "outbox.event.${routedByValue}",
    "transforms.outbox.table.expand.json.payload": "true",
    "plugin.name": "pgoutput",
    "slot.name": "outbox_slot"
  }
}
Verify: curl http://kafka-connect:8083/connectors/outbox-connector/status --> expected: {"connector":{"state":"RUNNING"}}
Published events must be cleaned up to prevent unbounded table growth. [src4]
-- Delete published events older than 7 days
DELETE FROM outbox_events
WHERE published_at IS NOT NULL
AND published_at < NOW() - INTERVAL '7 days';
Verify: SELECT pg_size_pretty(pg_total_relation_size('outbox_events')); --> size should remain stable
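If the pg_cron extension is available (an assumption; an external cron or job scheduler works equally well), the cleanup can be scheduled in-database:

```sql
-- Requires pg_cron (assumption -- any external scheduler also works)
SELECT cron.schedule(
  'outbox-cleanup',
  '0 3 * * *',  -- daily at 03:00
  $$DELETE FROM outbox_events
    WHERE published_at IS NOT NULL
      AND published_at < NOW() - INTERVAL '7 days'$$
);
```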
# Input: Kafka message from outbox relay
# Output: Processed event with duplicate detection
import json

def process_event(conn, message):
    """Idempotent event consumer -- deduplicates by event_id."""
    event_id = dict(message.headers()).get('event_id', b'').decode()
    payload = json.loads(message.value())
    with conn.cursor() as cur:
        # INSERT ... ON CONFLICT avoids the check-then-act race between
        # concurrent consumers; rowcount 0 means the event was already handled.
        cur.execute("""
            INSERT INTO processed_events (event_id, processed_at)
            VALUES (%s, NOW()) ON CONFLICT (event_id) DO NOTHING
        """, (event_id,))
        if cur.rowcount == 0:
            conn.rollback()
            return  # Skip duplicate
        handle_order_created(payload)
    conn.commit()  # Marker row and side effects commit together
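The consumer above assumes a processed_events table. A minimal DDL sketch; the cleanup note is an operational suggestion, not from the sources:

```sql
CREATE TABLE processed_events (
    event_id UUID PRIMARY KEY,       -- deduplication key from message header
    processed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Like the outbox itself, this table needs periodic cleanup,
-- e.g. delete rows older than the broker's retention window.
```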
// Input: Order data + Knex transaction
// Output: Order + outbox event committed atomically
const { v4: uuidv4 } = require('uuid');

async function createOrderWithOutbox(knex, orderData) {
  return knex.transaction(async (trx) => {
    const orderId = uuidv4();
    await trx('sales_orders').insert({
      id: orderId, customer_id: orderData.customerId,
      total: orderData.total, status: 'created'
    });
    await trx('outbox_events').insert({
      id: uuidv4(), aggregate_type: 'SalesOrder',
      aggregate_id: orderId, event_type: 'OrderCreated',
      payload: JSON.stringify({
        order_id: orderId, customer_id: orderData.customerId,
        total: orderData.total
      })
    });
    return orderId;
  });
}
// Platform Events with "Publish After Commit" behavior (the default)
// participate in the trigger transaction: if it rolls back, events are NOT published
trigger OpportunityClosedWon on Opportunity (after update) {
    List<Order_Event__e> events = new List<Order_Event__e>();
    for (Opportunity opp : Trigger.new) {
        Opportunity oldOpp = Trigger.oldMap.get(opp.Id);
        if (opp.StageName == 'Closed Won' && oldOpp.StageName != 'Closed Won') {
            events.add(new Order_Event__e(
                Order_Id__c = opp.Id, Account_Id__c = opp.AccountId,
                Amount__c = opp.Amount, Event_Type__c = 'OpportunityClosedWon'
            ));
        }
    }
    if (!events.isEmpty()) {
        List<Database.SaveResult> results = EventBus.publish(events);
        // EventBus.publish can partially fail -- check each result
        for (Database.SaveResult sr : results) {
            if (!sr.isSuccess()) {
                System.debug(LoggingLevel.ERROR,
                    'Event publish failed: ' + sr.getErrors());
            }
        }
    }
}
| ERP Source Event | aggregate_type | event_type | Key Payload Fields | Gotcha |
|---|---|---|---|---|
| Salesforce Opportunity closed-won | SalesOrder | OrderCreated | opportunity_id, account_id, amount, currency | Amount in org's default currency unless multi-currency |
| SAP Sales Order (VA01) | SalesOrder | SAPOrderCreated | VBELN, KUNNR, NETWR, WAERK | Amounts in smallest currency unit (cents) |
| NetSuite Sales Order created | SalesOrder | NSOrderCreated | internalid, entity, total, subsidiary | Entity ID is subsidiary-scoped in multi-sub orgs |
| D365 Sales Order confirmed | SalesOrder | D365OrderConfirmed | SalesOrderNumber, OrderAccount, InvoiceAccount | OrderAccount vs InvoiceAccount can differ |
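Several gotchas in this table come down to input normalization before the outbox write. A hedged sketch; the field formats and helper names are assumptions:

```python
from datetime import datetime
from decimal import Decimal

def normalize_sap_amount(raw: str) -> str:
    """SAP numeric fields often arrive zero-padded (e.g. '0000001234.56')."""
    return str(Decimal(raw))  # Decimal drops the padding without float rounding

def normalize_timestamp(raw: str) -> str:
    """Keep an explicit UTC offset so consumers never guess the zone."""
    return datetime.fromisoformat(raw).isoformat()
```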
Normalization gotchas: SAP amounts may arrive zero-padded (e.g. 0000001234.56) -- parse and normalize before inserting into outbox payload. Emit timestamps with an explicit offset (e.g. 2026-03-07T14:30:00+01:00).
| Scenario | Impact | Resolution |
|---|---|---|
| Relay crashes after Kafka publish, before marking row published | Duplicate event on next relay cycle | Consumer idempotency (deduplication by event_id) |
| Database transaction timeout during outbox write | Both business data and event rolled back (safe) | Retry the business operation |
| Kafka broker unavailable during relay | Events accumulate in outbox table | Relay retries with exponential backoff; alert if outbox depth > threshold |
| Debezium replication slot dropped | Events between slot drop and recreation lost permanently | Monitor WAL lag; alert on slot state changes |
| Consumer processing fails | Event stays in consumer's dead letter queue | DLQ with manual review + retry mechanism |
| Outbox table bloat (no cleanup) | Increasing query latency, disk usage | Scheduled cleanup job (delete published events > retention window) |
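The "Kafka broker unavailable" row calls for exponential backoff in the relay. A minimal sketch with full jitter; the base and cap values are illustrative:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter: delay doubles per attempt, capped."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

In the polling relay, sleep for backoff_delay(n) after the n-th consecutive publish failure and reset n on success.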
- Debezium replication slot lag: monitor pg_replication_slots.active and pg_wal_lsn_diff(); alert when lag exceeds 100MB. [src2]
- Concurrent relay instances without FOR UPDATE SKIP LOCKED process the same events, causing duplicates. Fix: always use FOR UPDATE SKIP LOCKED. [src4]
- Outbox table bloat: set autovacuum_vacuum_scale_factor = 0.01 on the outbox table. [src4]
- EventBus.publish can partially fail: check Database.SaveResult errors and implement fallback logging. [src6]
- At-least-once delivery produces duplicates: idempotent consumer with processed_events table. [src3]
# BAD -- event can be lost if publish fails after commit
def create_order_bad(conn, producer, order_data):
    order_id = save_order_to_db(conn, order_data)  # DB commit
    producer.produce('orders', json.dumps({'order_id': order_id}))
    # If this fails (network error, broker down), event is LOST

# GOOD -- event is guaranteed to be published eventually
def create_order_good(conn, order_data):
    with conn.cursor() as cur:
        order_id = uuid.uuid4()
        cur.execute("INSERT INTO sales_orders ...", (order_id, ...))
        cur.execute("INSERT INTO outbox_events ...", (order_id, 'OrderCreated', ...))
    conn.commit()  # Atomic -- both or neither

# BAD -- if publish succeeds but transaction rolls back,
# consumers act on phantom events
def create_order_phantom(conn, producer, order_data):
    conn.begin()
    order_id = insert_order(conn, order_data)
    producer.produce('orders', json.dumps({'order_id': order_id}))
    conn.commit()  # If this fails, event was already published!

# GOOD -- application writes to outbox only
def create_order_correct(conn, order_data):
    conn.begin()
    order_id = insert_order(conn, order_data)
    insert_outbox_event(conn, 'OrderCreated', order_id, order_data)
    conn.commit()
    # Relay process handles broker publishing separately
-- BAD -- concurrent transactions produce out-of-order timestamps
SELECT * FROM outbox_events WHERE published_at IS NULL
ORDER BY created_at ASC;
-- GOOD -- BIGSERIAL guarantees monotonic ordering
SELECT * FROM outbox_events WHERE published_at IS NULL
ORDER BY sequence_id ASC
FOR UPDATE SKIP LOCKED;
- At-least-once delivery: every consumer must track processed event_ids and skip duplicates. [src1]
- Concurrent relays: FOR UPDATE SKIP LOCKED in every poll query. [src4]
- Atomicity: outbox table MUST be in the same database as business data. [src1]
- Debezium health: monitor pg_replication_slots, alert on inactive slots and WAL lag > 100MB. [src2]
- Payload discipline: publish only changed fields + entity ID; let consumers fetch full state if needed. [src4]
- Retention: scheduled DELETE of published events older than retention window (7 days). [src4]
# Check outbox depth (unpublished events) -- should be near 0
psql -c "SELECT COUNT(*) as pending, MIN(created_at) as oldest_pending FROM outbox_events WHERE published_at IS NULL;"
# Check outbox table size
psql -c "SELECT pg_size_pretty(pg_total_relation_size('outbox_events'));"
# Check Debezium connector status
curl -s http://kafka-connect:8083/connectors/outbox-connector/status | jq '.connector.state'
# Check PostgreSQL replication slot lag
psql -c "SELECT slot_name, active, pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) as lag_bytes FROM pg_replication_slots;"
# Check Kafka consumer group lag
kafka-consumer-groups.sh --bootstrap-server kafka:9092 --group outbox-consumer --describe
# Check Salesforce Platform Event usage
curl -H "Authorization: Bearer $SF_TOKEN" \
"https://yourorg.my.salesforce.com/services/data/v62.0/limits" | jq '.DailyStandardVolumePlatformEvents'
# Monitor relay throughput (events/minute)
psql -c "SELECT date_trunc('minute', published_at) as minute, COUNT(*) FROM outbox_events WHERE published_at > NOW() - INTERVAL '1 hour' GROUP BY 1 ORDER BY 1;"
| Component | Version | Status | Notes |
|---|---|---|---|
| Debezium Outbox Event Router | 2.x | Current (GA) | Stable since 1.9; MongoDB requires MongoEventRouter |
| Debezium Outbox Event Router | 1.x | Deprecated | Upgrade to 2.x for improved JSON expansion |
| Salesforce Platform Events | API v62.0 (Spring '26) | Current | High-volume events GA since Winter '22 |
| Salesforce CDC | API v62.0 (Spring '26) | Current | Enriched change events since Summer '23 |
| SAP Event Mesh | BTP 2024 | Current | Advanced Event Mesh is successor (Solace-based) |
| D365 Business Events | 2024 Release Wave 2 | Current | Available in F&O and CE/Dataverse |
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Business transaction must guarantee downstream event delivery | Simple request-reply with no delivery guarantee needed | Direct REST API call with retry |
| Source system uses a relational database | Source is a SaaS ERP with no DB access | ERP-native events (Platform Events, SuiteScript) |
| Need domain events (OrderCreated, InvoicePosted) | Need ALL row-level changes regardless of business meaning | CDC without outbox (Debezium on business tables) |
| Multiple consumers need the same event (fan-out) | Single consumer, point-to-point integration | Direct API call or simple queue |
| Already running Kafka and want sub-second relay | No Kafka and no plans to adopt it | Polling relay to SQS or RabbitMQ |
| Need event replay for debugging or rebuilding state | Events are ephemeral notifications with no replay value | Fire-and-forget webhooks |
| Capability | Outbox + Polling | Outbox + Debezium CDC | Dual-Write (no outbox) | CDC on Business Tables | Event Sourcing |
|---|---|---|---|---|---|
| Atomicity | Guaranteed (same DB txn) | Guaranteed (same DB txn) | NOT guaranteed | N/A (reads existing txn log) | Guaranteed (event IS the write) |
| Delivery guarantee | At-least-once | At-least-once | At-most-once | At-least-once | At-least-once |
| Latency | 1-5s (poll interval) | Sub-second | Near real-time | Sub-second | Sub-second |
| DB load | Moderate (polling) | Minimal (reads WAL) | None | Minimal (reads WAL) | None |
| Infrastructure | Background worker | Kafka + Kafka Connect | None additional | Kafka + Kafka Connect | Event store |
| Event semantics | Domain events (explicit) | Domain events (explicit) | App-level (fragile) | Row-level (implicit) | Domain events (inherent) |
| Code changes | Write to outbox | Write to outbox | Publish to broker | NONE (non-intrusive) | Complete rewrite |
| Ordering | Per-aggregate | Per-partition | No guarantee | Per-table | Per-aggregate |
| Replay | Yes (retain rows) | Yes (Kafka retention) | No | Yes (Kafka retention) | Yes (event store) |
| Best for | Simple setups, no Kafka | Production-grade, Kafka | Prototype only | Legacy, no code changes | Greenfield, complex domain |