Change Data Capture (CDC) for ERP Integration — Debezium, GoldenGate, and Cloud ERP Patterns
Type: ERP Integration
Systems: Debezium 3.4.x, Oracle GoldenGate 23ai, Salesforce CDC v66.0, SAP ODP/SLT
Confidence: 0.88
Sources: 7
Verified: 2026-03-02
Freshness: evolving
TL;DR
- Bottom line: Log-based CDC (Debezium, GoldenGate) is the gold standard for on-premise ERP databases — reads transaction logs with near-zero impact. Cloud SaaS ERPs (Salesforce, Workday, NetSuite) block log access entirely; you must use vendor-native event streams or API-based CDC instead.
- Key limit: Cloud ERPs cannot do log-based CDC — Salesforce retains CDC events for only 3 days, SAP requires SLT proxy for table-level CDC, and NetSuite/Workday offer no native CDC at all.
- Watch out for: Attempting log-based CDC on a SaaS ERP you do not control — it is architecturally impossible. Also, GoldenGate licensing ($17,500/processor on both source AND target) catches teams off guard.
- Best for: Real-time, event-driven replication from ERP databases where sub-second latency and zero-loss delivery matter more than implementation simplicity.
- Authentication: Varies by tool — Debezium uses DB credentials; GoldenGate uses OS-level DB access; Salesforce CDC uses OAuth 2.0 via the Pub/Sub API.
System Profile
This card covers Change Data Capture as an integration architecture pattern across multiple ERP systems. It compares log-based CDC tools (Debezium, Oracle GoldenGate), vendor-native CDC (Salesforce CDC, SAP ODP/SLT), and explains why cloud SaaS ERPs fundamentally limit what CDC methods are available. CDC method availability depends entirely on whether you have database-level access.
| System | CDC Method Available | Tool | DB Access Required? | Latency |
| Oracle EBS (on-prem) | Log-based (redo logs) | GoldenGate, Debezium | Yes — supplemental logging | Sub-second |
| SAP S/4HANA (on-prem) | Log-based + ODP | SLT, Debezium, ADF SAP CDC | Yes (DB) or No (ODP via RFC) | Seconds |
| SAP S/4HANA Cloud | ODP only | Azure Data Factory SAP CDC | No — ODP/RFC only | Seconds to minutes |
| Salesforce | Vendor-native event stream | Salesforce CDC (Pub/Sub API) | No — no DB access | Sub-second |
| NetSuite | None native; API polling only | Custom (SuiteTalk/REST) | No — no DB access | Minutes |
| Workday | None native; API-based only | Workday RaaS | No — no DB access | Minutes to hours |
| Dynamics 365 | Dataverse Change Tracking | Dataverse API / Azure Synapse Link | No — API-level only | Minutes |
| Custom ERP (PostgreSQL) | Log-based (WAL) | Debezium | Yes — logical replication | Sub-second |
| Custom ERP (MySQL) | Log-based (binlog) | Debezium | Yes — binlog access | Sub-second |
API Surfaces & Capabilities
| CDC Tool/Method | Protocol | Best For | Max Throughput | Latency | Open Source? | Cost |
| Debezium (Kafka Connect) | Kafka / HTTP (Server) | On-prem DB CDC at scale | 100K+ events/sec | Sub-second | Yes (Apache 2.0) | Infrastructure only |
| Oracle GoldenGate | Trail files / REST API | Oracle-to-Oracle, high-volume | 100K+ events/sec | Sub-second | No | $17,500/processor |
| Salesforce CDC | Pub/Sub gRPC / CometD | SF record change tracking | Edition-dependent | Sub-second | N/A (platform feature) | Included in Enterprise+ |
| SAP ODP/SLT | RFC / OData | SAP table and CDS extraction | SLT sizing dependent | Seconds | No | SAP licensing |
| ADF SAP CDC | ODP via RFC | SAP-to-Azure delta extraction | IR sizing dependent | Minutes | No | ADF pricing |
| Query-based polling | REST/SOAP API | Simple, low-volume, any ERP | API rate limited | Minutes to hours | N/A | API call costs |
| Trigger-based CDC | DB triggers | Legacy, no log access | Low (trigger overhead) | Seconds | N/A | Dev cost |
Rate Limits & Quotas
Debezium Limits
| Limit Type | Value | Applies To | Notes |
| Max connectors per cluster | No hard limit | Kafka Connect | Bounded by cluster resources |
| Kafka message max size | 1 MB default | All connectors | Configure max.message.bytes for large rows |
| Heartbeat interval | 300,000 ms default | All connectors | Reduce for low-traffic tables |
| Oracle LogMiner batch | 10,000 rows default | Oracle connector | log.mining.batch.size.default |
| Slot replication lag | Monitor required | PostgreSQL | Unbounded WAL growth if behind |
Oracle GoldenGate Limits
| Limit Type | Value | Applies To | Notes |
| Trail file size | 2 GB default | Extract/Replicat | Configurable; split large transactions |
| Supplemental logging overhead | 5-15% write increase | Source Oracle DB | Required — cannot be avoided |
| Max transaction size | Limited by trail disk | Large batch ops | >10M row transactions need tuning |
Salesforce CDC Limits
| Limit Type | Value | Window | Edition Differences |
| Event retention | 3 days | Rolling | Same across all editions |
| Max entities per channel | 5 (custom channel) | Per channel | Standard channel covers all |
| CometD connections | Edition-based | Concurrent | Enterprise: 2,000 clients |
SAP ODP/SLT Limits
| Limit Type | Value | Applies To | Notes |
| ODP delta queue retention | Configurable (default 24h) | All subscribers | Old deltas purged after consumption |
| SLT replication tables | Resource bound | SLT server | Each table requires logging table |
Authentication
| Tool | Auth Method | Credentials | Refresh? | Notes |
| Debezium (PostgreSQL) | DB user + REPLICATION role | Username/password or SSL | N/A | Dedicated replication user |
| Debezium (MySQL) | REPLICATION SLAVE + CLIENT | Username/password | N/A | Also needs SELECT |
| Debezium (Oracle) | LogMiner privileges | Username/password | N/A | V$LOG, V$LOGFILE access |
| Oracle GoldenGate | OS + GG credential store | DB + GG admin creds | N/A | Extract runs as OS user |
| Salesforce CDC | OAuth 2.0 (JWT/Web Server) | Connected App + cert | Yes (2h) | Pub/Sub API or CometD |
| SAP ODP (via ADF) | SAP RFC user | SAP username/password | N/A | Self-hosted IR required |
Authentication Gotchas
- Debezium PostgreSQL: replication user needs REPLICATION role AND LOGIN privilege. Abandoned slots cause WAL disk exhaustion. [src1]
- GoldenGate: forgetting table-level supplemental logging causes silent data loss — updates captured without full column context. [src3]
- Salesforce: OAuth tokens expire; Pub/Sub subscriber disconnection risks data loss beyond 3-day retention window. [src5]
Constraints
- Cloud SaaS ERPs block log-based CDC entirely: Salesforce, Workday, NetSuite do not expose transaction logs. Debezium/GoldenGate are architecturally impossible.
- GoldenGate dual-processor licensing: License required on BOTH source and target processors. 8 processors = $140,000 list price before support.
- Salesforce CDC 3-day retention: Events not consumed within 3 days are permanently deleted. Downtime > 3 days requires full re-sync.
- Debezium requires Kafka or Debezium Server: Minimum viable Kafka cluster: 3 brokers + KRaft. Debezium Server needs a sink target.
- Oracle supplemental logging overhead: 5-15% redo log volume increase. Impacts I/O and archive storage on high-write databases.
- SAP SLT licensing: SLT is a separate product — not included in base S/4HANA license. Required for application table CDC.
- PostgreSQL WAL retention risk: Debezium replication slot prevents WAL cleanup. Connector downtime can fill disk and crash the database.
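The WAL-retention constraint above can be turned into a rough headroom estimate — how long a stalled connector can stay down before retained WAL fills the disk. A minimal sketch with entirely hypothetical numbers; measure your own free disk, current slot lag, and WAL generation rate:

```python
def hours_until_disk_full(free_bytes, wal_bytes_per_hour, current_slot_lag_bytes=0):
    """Estimate hours of connector downtime before retained WAL fills the disk.
    All inputs are measurements from your own system; this is simple arithmetic,
    not a PostgreSQL API."""
    usable = free_bytes - current_slot_lag_bytes
    if usable <= 0:
        return 0.0
    return usable / wal_bytes_per_hour

# Example: 200 GB free, 10 GB/h WAL churn, 20 GB already retained by the slot
print(hours_until_disk_full(200 * 2**30, 10 * 2**30, 20 * 2**30))  # 18.0
```

Pair this with max_slot_wal_keep_size (PG 13+) so a runaway slot is invalidated before the disk fills.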
Integration Pattern Decision Tree
START — Need CDC from an ERP system
├── Do you have direct database access?
│ ├── YES (on-premise / IaaS with DB admin)
│ │ ├── Oracle → Debezium (LogMiner) or GoldenGate
│ │ │ ├── Budget allows $17,500+/processor? → GoldenGate (best Oracle native)
│ │ │ └── No / heterogeneous targets → Debezium (free, Kafka ecosystem)
│ │ ├── PostgreSQL → Debezium (logical replication, WAL)
│ │ ├── MySQL/MariaDB → Debezium (binlog)
│ │ ├── SQL Server → Debezium (SQL Server CDC feature)
│ │ └── Db2 → Debezium (Db2 connector)
│ └── NO (SaaS / cloud ERP, no DB access)
│ ├── Salesforce → Salesforce CDC (Pub/Sub API)
│ ├── SAP S/4HANA Cloud → ODP via ADF SAP CDC connector
│ ├── SAP ECC → ODP via SLT (requires SLT license)
│ ├── NetSuite → API polling (no native CDC)
│ ├── Workday → RaaS polling or Integration Cloud
│ └── Dynamics 365 → Dataverse Change Tracking API
└── Error tolerance?
├── Zero-loss → Exactly-once (Debezium 3.3+) + dead letter queue
└── At-least-once → Default Debezium / GoldenGate behavior
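The tree above can be encoded as a simple lookup for documentation or tests. A sketch only — the system keys and returned labels are illustrative strings, not product identifiers:

```python
def choose_cdc_tool(system, db_access=False, oracle_budget=False):
    """Map an ERP/database to a CDC approach per the decision tree.
    Keys and return values are illustrative labels only."""
    if db_access:
        if system == "oracle":
            return "GoldenGate" if oracle_budget else "Debezium (LogMiner)"
        log_based = {"postgresql": "Debezium (pgoutput/WAL)",
                     "mysql": "Debezium (binlog)",
                     "sqlserver": "Debezium (SQL Server CDC)",
                     "db2": "Debezium (Db2 connector)"}
        return log_based.get(system, "Debezium (check connector list)")
    saas = {"salesforce": "Salesforce CDC (Pub/Sub API)",
            "s4hana_cloud": "ODP via ADF SAP CDC connector",
            "sap_ecc": "ODP via SLT (requires SLT license)",
            "netsuite": "API polling (no native CDC)",
            "workday": "RaaS polling",
            "dynamics365": "Dataverse Change Tracking API"}
    return saas.get(system, "API polling")

print(choose_cdc_tool("postgresql", db_access=True))  # Debezium (pgoutput/WAL)
print(choose_cdc_tool("salesforce"))                  # Salesforce CDC (Pub/Sub API)
```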
Quick Reference
CDC Method Comparison
| Method | Mechanism | Latency | DB Impact | Deletes? | Complexity | Cost |
| Log-based (Debezium) | Transaction log tailing | Sub-second | Minimal | Yes | Medium-High | Free + Kafka infra |
| Log-based (GoldenGate) | Oracle redo log mining | Sub-second | Minimal | Yes | High | $17,500/processor |
| Vendor-native (SF CDC) | Platform event bus | Sub-second | None | Yes | Low | Included |
| Vendor-native (SAP ODP) | ODP delta queue | Seconds | Low | Yes | Medium | SAP + SLT license |
| Query-based polling | Timestamp/ID filter | Minutes | High | NO | Low | API call costs |
| Trigger-based | DB triggers + shadow table | Seconds | HIGH | Yes | Medium | Dev cost |
Tool Selection Matrix
| Factor | Debezium | GoldenGate | Salesforce CDC | SAP ODP/SLT | Polling |
| License cost | Free (Apache 2.0) | $17,500/processor | Included | SAP license | Free |
| Infrastructure | Kafka cluster | GG hub | None (SaaS) | SLT server | None |
| Supported sources | 11 databases | Oracle primary | Salesforce only | SAP only | Any with API |
| Exactly-once | Yes (v3.3+) | Bounded recovery | At-most-once | At-least-once | N/A |
| Schema evolution | Auto-detect | DDL replication | Automatic | CDS-dependent | N/A |
| Operational complexity | Medium | High | Low | Medium | Low |
Step-by-Step Integration Guide
1. Set up Debezium for PostgreSQL-backed ERP
Deploy Debezium via Kafka Connect to capture changes from a PostgreSQL-backed ERP database. [src1]
# Enable logical replication: wal_level=logical, max_replication_slots=4
psql -c "CREATE ROLE debezium_user WITH REPLICATION LOGIN PASSWORD 'secure_password';"
psql -c "GRANT SELECT ON ALL TABLES IN SCHEMA public TO debezium_user;"
Verify: psql -c "SHOW wal_level;" → expected: logical
2. Deploy Debezium PostgreSQL connector
Register the connector with Kafka Connect REST API. [src1]
curl -X POST http://localhost:8083/connectors \
-H "Content-Type: application/json" \
-d '{"name":"erp-postgres-cdc","config":{
"connector.class":"io.debezium.connector.postgresql.PostgresConnector",
"database.hostname":"erp-db.internal","database.port":"5432",
"database.user":"debezium_user","database.password":"secure_password",
"database.dbname":"erp_production","topic.prefix":"erp",
"table.include.list":"public.orders,public.customers,public.invoices",
"slot.name":"debezium_erp","plugin.name":"pgoutput",
"snapshot.mode":"initial","heartbeat.interval.ms":"30000"}}'
Verify: curl http://localhost:8083/connectors/erp-postgres-cdc/status → expected: "state":"RUNNING"
3. Consume CDC events from Kafka
Read change events from Debezium-created Kafka topics. [src1]
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
--topic erp.public.orders --from-beginning --max-messages 5
Verify: Output contains JSON with "op": "c" (create), "op": "u" (update), or "op": "d" (delete).
4. Monitor replication slot health
Prevent WAL disk exhaustion by monitoring replication slot lag. [src1]
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag_size
FROM pg_replication_slots WHERE slot_name = 'debezium_erp';
Verify: lag_size should be < 1 GB under normal operation.
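The verify step compares human-readable pg_size_pretty() output against a threshold; a small helper can automate that check in a monitoring script. A sketch — the unit names follow pg_size_pretty's conventions:

```python
def lag_exceeds(lag_pretty, threshold_bytes=2**30):
    """Parse pg_size_pretty() output (e.g. '823 MB', '2048 kB') and compare
    it against a byte threshold (default 1 GB, matching the verify step)."""
    units = {"bytes": 1, "kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
    number, unit = lag_pretty.split()
    return float(number) * units[unit] > threshold_bytes

print(lag_exceeds("823 MB"))  # False — under the 1 GB threshold
print(lag_exceeds("2 GB"))    # True — time to alert
```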
Code Examples
Python: Subscribe to Salesforce CDC events via Pub/Sub API
# Input: Salesforce Connected App credentials
# Output: Stream of change events for subscribed Salesforce objects
import requests
def get_salesforce_token(client_id, client_secret, username, password, security_token):
    # Username-password flow shown for brevity; prefer the JWT Bearer flow
    # (see Authentication table) for server-to-server integrations.
    resp = requests.post("https://login.salesforce.com/services/oauth2/token", data={
        "grant_type": "password", "client_id": client_id,
        "client_secret": client_secret,
        "username": username, "password": password + security_token,
    })
    resp.raise_for_status()
    data = resp.json()
    return data["access_token"], data["instance_url"]
def process_change_event(event):
    header = event.get("ChangeEventHeader", {})
    return {
        "entity": header.get("entityName"),
        "operation": header.get("changeType"),
        "record_ids": header.get("recordIds"),
        "changed_fields": header.get("changedFields"),
    }
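Salesforce also publishes gap events (changeType values prefixed GAP_, plus GAP_OVERFLOW) when it cannot emit field-level detail, and a consumer should route those to a re-fetch or re-sync path rather than applying them. A hedged sketch — the routing labels are illustrative; only the changeType naming comes from Salesforce's CDC model:

```python
def route_change(header):
    """Decide how to handle a CDC event from its ChangeEventHeader.
    GAP_* change types mean Salesforce could not publish field-level
    detail and the record must be re-read via the API. Routing labels
    ('upsert', 'refetch-record', ...) are illustrative, not an API."""
    change_type = header.get("changeType", "")
    if change_type == "GAP_OVERFLOW":
        return "full-resync"       # too many changes; replay is unsafe
    if change_type.startswith("GAP_"):
        return "refetch-record"    # re-read the affected record(s)
    if change_type == "DELETE":
        return "delete"
    return "upsert"                # CREATE / UPDATE / UNDELETE

print(route_change({"changeType": "UPDATE"}))      # upsert
print(route_change({"changeType": "GAP_UPDATE"}))  # refetch-record
```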
JavaScript/Node.js: Debezium CDC event consumer via Kafka
// Input: Kafka cluster, Debezium topic
// Output: Processed change events with operation type
const { Kafka } = require('kafkajs'); // requires the kafkajs package
const kafka = new Kafka({ clientId: 'erp-cdc', brokers: ['kafka:9092'] });
const consumer = kafka.consumer({ groupId: 'erp-sync-group' });
async function consumeCDCEvents() {
  await consumer.connect();
  await consumer.subscribe({ topics: [/^erp\.public\./], fromBeginning: false });
  await consumer.run({
    eachMessage: async ({ topic, message }) => {
      if (!message.value) return; // Debezium delete tombstones carry a null value
      const value = JSON.parse(message.value.toString());
      const { op, before, after, source } = value;
      // op: 'c'=create, 'u'=update, 'd'=delete, 'r'=snapshot read
      // deleteFromTarget/upsertToTarget are application-specific stubs
      if (op === 'd') await deleteFromTarget(source.table, before);
      else await upsertToTarget(source.table, after);
    },
  });
}
cURL: Check Debezium connector status
# List all connectors
curl -s http://localhost:8083/connectors | jq .
# Check specific connector
curl -s http://localhost:8083/connectors/erp-postgres-cdc/status | jq .
# Pause / Resume / Delete
curl -X PUT http://localhost:8083/connectors/erp-postgres-cdc/pause
curl -X PUT http://localhost:8083/connectors/erp-postgres-cdc/resume
# WARNING: Delete does NOT drop replication slot
curl -X DELETE http://localhost:8083/connectors/erp-postgres-cdc
Data Mapping
Debezium Event Structure
| Field | Type | Description | Gotcha |
| op | String (c/u/d/r) | Operation type | Snapshot events use 'r', not 'c' |
| before | Object | Record state BEFORE change | Needs REPLICA IDENTITY FULL for PG |
| after | Object | Record state AFTER change | Null for deletes |
| source.ts_ms | Long | Source DB timestamp | Use for ordering, NOT Kafka timestamp |
| source.lsn | Long | Log Sequence Number | Monotonic — use for deduplication |
| ts_ms | Long | Debezium processing time | Delta from source.ts_ms = replication lag |
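Because source.lsn is monotonic, replayed events (e.g. after a connector restart with at-least-once delivery) can be dropped by remembering the last LSN applied per primary key. A sketch that assumes an 'id' primary-key column; key extraction is application-specific:

```python
def dedupe_by_lsn(events):
    """Keep only events whose LSN is newer than the last one seen for that
    (table, primary key). Events are dicts shaped like Debezium's envelope;
    the 'id' key is an assumed primary-key column name."""
    last_lsn = {}
    applied = []
    for ev in events:
        pk = (ev["source"]["table"], (ev["after"] or ev["before"])["id"])
        lsn = ev["source"]["lsn"]
        if lsn > last_lsn.get(pk, -1):
            last_lsn[pk] = lsn
            applied.append(ev)
    return applied

events = [
    {"op": "c", "source": {"table": "orders", "lsn": 100}, "after": {"id": 1}, "before": None},
    {"op": "c", "source": {"table": "orders", "lsn": 100}, "after": {"id": 1}, "before": None},  # replay
    {"op": "u", "source": {"table": "orders", "lsn": 180}, "after": {"id": 1}, "before": {"id": 1}},
]
print(len(dedupe_by_lsn(events)))  # 2 — the replayed event is dropped
```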
Data Type Gotchas
- Debezium serializes DECIMAL as bytes (Avro) or base64 (JSON) by default — set decimal.handling.mode=string. [src1]
- Timestamp timezone: Debezium emits UTC; Oracle may store local timezone. MySQL TIMESTAMP auto-converts but DATETIME does not. [src1]
- Salesforce compound fields (address, geolocation) emit as nested objects, breaking flat-table mapping. [src5]
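The DECIMAL gotcha above can also be handled at read time: the default encoding is a base64-wrapped big-endian two's-complement unscaled integer, with the scale carried in the field schema. A decoding sketch (setting decimal.handling.mode=string avoids this entirely):

```python
import base64
from decimal import Decimal

def decode_debezium_decimal(b64_value, scale):
    """Decode Debezium's default DECIMAL encoding: base64 of a big-endian
    two's-complement unscaled integer; the scale comes from the field schema."""
    raw = base64.b64decode(b64_value)
    unscaled = int.from_bytes(raw, "big", signed=True)
    return Decimal(unscaled).scaleb(-scale)

# Unscaled 12345 with scale 2 decodes to 123.45
encoded = base64.b64encode((12345).to_bytes(2, "big", signed=True)).decode()
print(decode_debezium_decimal(encoded, 2))  # 123.45
```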
Error Handling & Failure Points
Common Error Codes
| Error | Source | Cause | Resolution |
| replication slot already exists | Debezium/PG | Connector re-created without dropping slot | SELECT pg_drop_replication_slot('slot_name'); |
| ORA-01291: missing logfile | Debezium/Oracle or GG | Archive log purged before CDC read | Increase archive log retention |
| INSUFFICIENT_ACCESS | Salesforce CDC | Missing CDC permission | Assign "Subscribe to CDC Events" perm set |
| Task is in FAILED state | Debezium | Connector task crashed | Check status endpoint; restart task |
| WAL segment removed | Debezium/PG | Slot fell too far behind | Full re-snapshot required |
| OGG-01031 Extract abended | GoldenGate | Extract process crashed | Check error log; START EXTRACT |
Failure Points in Production
- PostgreSQL WAL disk exhaustion: Debezium replication slot prevents WAL cleanup. Fix:
Monitor pg_replication_slots lag; set max_slot_wal_keep_size (PG 13+); alert at 50% disk. [src1]
- Oracle supplemental logging disabled on new table: Silent data corruption. Fix:
ALTER TABLE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS alongside DDL. [src3]
- Salesforce subscriber falls behind 3-day window: Lost events, no replay. Fix:
Persist CDC events to Kafka immediately; implement gap-triggered full re-sync. [src5]
- Schema change breaks Avro serialization: Non-nullable column without default. Fix:
Use BACKWARD compatible Schema Registry; add columns as nullable. [src1]
- GoldenGate trail file disk full: Replicat falls behind Extract. Fix:
Monitor trail lag; set PURGEOLDEXTRACTS; alert when lag > 1 hour. [src3]
- SAP ODP delta queue overflow: Subscriber too slow. Fix:
Configure queue retention in ODQMON; increase pipeline frequency. [src4]
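For the Salesforce 3-day window, a consumer can check the age of its stored replay checkpoint on reconnect and fall back to a full re-sync when replay is no longer safe. A sketch — the 72-hour constant mirrors the retention limit above, but the 6-hour safety margin is an assumption, not a Salesforce parameter:

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(hours=72)  # Salesforce CDC event retention window

def replay_or_resync(last_checkpoint_utc, now=None, safety_margin=timedelta(hours=6)):
    """Return 'replay' if the stored replay checkpoint is comfortably inside
    the retention window, else 'full-resync'. Margin is an assumed buffer."""
    now = now or datetime.now(timezone.utc)
    return "replay" if now - last_checkpoint_utc < RETENTION - safety_margin else "full-resync"

now = datetime(2026, 3, 2, 12, 0, tzinfo=timezone.utc)
print(replay_or_resync(now - timedelta(hours=10), now=now))  # replay
print(replay_or_resync(now - timedelta(days=4), now=now))    # full-resync
```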
Anti-Patterns
Wrong: Polling an ERP database on a timer
# BAD — Constant load, misses deletes, timestamp gaps lose data
def poll_for_changes():
    cur.execute("SELECT * FROM orders WHERE updated_at > NOW() - INTERVAL '60s'")
    # Misses deletes, adds constant query load, 60s latency minimum
Correct: Log-based CDC with Debezium
# GOOD — Zero query load, captures deletes, sub-second latency
for message in kafka_consumer:
    event = message.value
    op = event['op']  # c=create, u=update, d=delete
    if op in ('c', 'u', 'r'): sync_to_target(event['after'])
    elif op == 'd': delete_from_target(event['before'])
Wrong: Using GoldenGate for non-Oracle targets
# BAD — 8 processors * $17,500 = $140,000 + 22% annual support
# Paying Oracle license for non-Oracle target database
Correct: Debezium for heterogeneous CDC
# GOOD — Oracle -> Debezium (LogMiner) -> Kafka -> JDBC Sink -> PostgreSQL
# Total license cost: $0 (infrastructure costs only)
Wrong: Treating Salesforce CDC as durable event stream
# BAD — Events retained 3 days only; downtime > 3 days = data loss
subscribe_to("/data/AccountChangeEvent") # No durability guarantee
Correct: Buffer Salesforce CDC into Kafka
# GOOD — Persist to Kafka immediately for infinite retention
for event in sf_cdc_events:
    kafka_producer.send("sf-account-changes", event)
# Now events durable beyond 3-day SF retention
Common Pitfalls
- Orphaned replication slots: Deleting Debezium connector does NOT drop the PG replication slot. Fix:
Always pg_drop_replication_slot() after deletion. [src1]
- Initial snapshot overwhelms production: Reads all data, locks tables (MySQL global read lock). Fix:
Use incremental snapshots (Debezium signal table) or schedule the initial snapshot off-peak; PostgreSQL initial snapshots do not lock tables. [src1]
- Not testing at production volume: Works at 1K rows, fails at 100M. Fix:
Load-test with production-scale data; size Kafka partitions for peak change rate. [src1]
- Schema evolution without compatibility: Non-nullable column breaks Avro. Fix:
Use Schema Registry BACKWARD compatibility; add columns as nullable. [src1]
- GoldenGate non-prod licensing: Licenses apply to dev/staging/prod. Fix:
Include all environments in license count or use Debezium for non-prod. [src2]
- SAP ODP without SLT for tables: Application table CDC requires SLT proxy. Fix:
Deploy SLT for table CDC; use direct ODP for CDS views. [src4]
Diagnostic Commands
# Check all Debezium connector statuses
curl -s http://localhost:8083/connectors | jq -r '.[]' | while read c; do
curl -s "http://localhost:8083/connectors/$c/status" | jq '{name: "'$c'", state: .connector.state}'
done
# PostgreSQL replication slot health
psql -c "SELECT slot_name, active, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag FROM pg_replication_slots;"
# Kafka consumer group lag
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group erp-sync-group --describe
# Salesforce CDC event delivery check
curl -s "https://instance.salesforce.com/services/data/v66.0/limits" \
-H "Authorization: Bearer $TOKEN" | jq '.DailyDeliveredPlatformEvents'
# SAP: Transaction ODQMON (delta queues), SLT_DASHBOARD (replication status)
Version History & Compatibility
| Tool | Version | Release | Status | Key Changes |
| Debezium | 3.4.0 | 2025-12 | Current | MariaDB GA, CockroachDB incubating |
| Debezium | 3.3.0 | 2025-10 | Supported | Exactly-once for all core connectors |
| Debezium | 3.0.0 | 2024-10 | Supported | Java 17+, Kafka 3.x baseline |
| Debezium | 2.7.x | 2024-06 | EOL | Last Java 11 compatible |
| GoldenGate | 23ai | 2024 | Current | Microservices architecture, REST API |
| GoldenGate | 21c | 2021 | Supported | Classic + Microservices |
| Salesforce CDC | API v66.0 | 2026-02 | Current | Spring '26; Pub/Sub API preferred |
| ADF SAP CDC | 2025-02 | 2025-02 | GA | ODP framework, CDS views, SLT |
When to Use / When Not to Use
| Use When | Don't Use When | Use Instead |
| Real-time replication from on-prem ERP with DB access | SaaS ERP with no DB access | API polling with timestamp filters |
| Need to capture DELETEs | Only need INSERT/UPDATE tracking | Timestamp-based polling |
| High change volume (>10K/day) | Low volume (<100/day) | Simple REST API polling |
| Kafka already deployed | No Kafka and no budget for it | Debezium Server to Kinesis/Pub/Sub |
| Oracle-to-Oracle with Oracle support needed | Heterogeneous targets | Debezium (free, multi-target) |
| Salesforce record change tracking | Need >3 day replay window | Buffer SF CDC to Kafka |
Cross-System Comparison
| Capability | Debezium | GoldenGate | Salesforce CDC | SAP ODP/SLT | Polling |
| CDC Method | Log-based | Log-based | Vendor events | App-layer delta | Query-based |
| Latency | Sub-second | Sub-second | Sub-second | Seconds | Minutes |
| Captures Deletes | Yes | Yes | Yes | Yes | No |
| Delivery | Exactly-once (3.3+) | At-least-once | At-most-once | At-least-once | Best-effort |
| DB Impact | Minimal | Minimal | None | Low | High |
| License Cost | Free | $17,500/proc | Included | SAP + SLT | Free |
| Sources | 11 databases | Oracle primary | Salesforce only | SAP only | Any with API |
| Infrastructure | Kafka cluster | GG hub | None | SLT server | None |
| Event Retention | Configurable | Disk-bound | 3 days | Configurable | N/A |
| Multi-Target | Native (Kafka) | Limited | Limited | One per queue | N/A |
Important Caveats
- Cloud ERP CDC gap is widening — more SaaS migrations mean fewer systems where log-based CDC is possible. Each vendor has unique event APIs with no standardized interface.
- Debezium is not zero-ops — requires Kafka cluster management, Schema Registry, connector monitoring, replication slot cleanup, and snapshot planning.
- GoldenGate cost is often underestimated — dual-processor licensing, 22% annual support, and non-production environment requirements can triple expected cost.
- Exactly-once is still young — Debezium 3.3+ requires Kafka transactions on both sides. Most sink connectors do not yet support transactional consumption.
- CDC is not a backup — streams forward from a point in time. Does not replace database backups, PITR, or disaster recovery strategies.
Related Units