Saga Pattern for Distributed Transactions Across ERPs
What is the Saga pattern for distributed transactions across ERPs - orchestration vs choreography?
TL;DR
Bottom line: Use the Saga pattern when a business transaction spans multiple ERPs (e.g., order-to-cash across Salesforce + SAP + a payment gateway) and you cannot use a traditional two-phase commit. Orchestration is the default choice for ERP integrations; choreography only works for simple 2-3 system flows.
Key limit: Sagas provide eventual consistency, not ACID isolation — concurrent sagas can produce dirty reads, lost updates, and nonrepeatable reads. Design compensating transactions for every forward step.
Watch out for: Compensating transactions are NOT true rollbacks — reversing an invoice in SAP or a payment in Stripe creates a new offsetting record, not a deletion. Business stakeholders must approve compensation semantics.
Best for: Long-running, multi-step business processes that cross ERP boundaries — order-to-cash, procure-to-pay, intercompany settlement, multi-system fulfillment.
Authentication: Each ERP participant authenticates independently — the saga orchestrator needs valid credentials (OAuth 2.0, API keys, certificates) for every system in the chain.
System Profile
The Saga pattern is an architecture pattern, not a product. It applies to any distributed transaction that spans multiple ERP systems, middleware platforms, or microservices. This card covers the pattern itself, its two coordination approaches (orchestration and choreography), and practical implementation guidance for cross-ERP business transactions.
| System | Role | API Surface | Direction |
|---|---|---|---|
| ERP System A (e.g., Salesforce) | CRM: source order/opportunity | REST API | Outbound |
| ERP System B (e.g., SAP S/4HANA) | ERP: financial posting, inventory | OData / BAPI | Inbound |
| Payment Gateway (e.g., Stripe) | Payment processing | REST API | Bidirectional |
| Orchestrator (e.g., Temporal, Step Functions) | Saga coordinator | SDK / State Machine | Orchestrator |
API Surfaces & Capabilities
The Saga pattern does not define a specific API surface — it defines how multiple API surfaces coordinate. The key capability matrix compares the two coordination approaches:
| Capability | Orchestration | Choreography |
|---|---|---|
| Coordination | Central orchestrator sends commands to participants | Participants publish/subscribe to domain events |
| Complexity ceiling | Scales to 10+ participants | Practical limit: 3-5 participants |
| Visibility | Single place to inspect saga state | Must aggregate events across all participants |
| Coupling | Participants coupled to orchestrator | Participants coupled to event schema |
| Single point of failure | Yes: the orchestrator (mitigated by HA) | No: distributed |
| Cyclic dependencies | Avoided by design | Risk of circular event chains |
| Error handling | Orchestrator triggers compensations | Each participant handles failure events independently |
| Debugging | Straightforward: follow orchestrator logs | Difficult: requires distributed tracing |
| Testing | Test orchestrator + mock participants | Must run all participants to test end-to-end |
| New participant | Modify orchestrator workflow | Add new event listener (verify no side effects) |
Rate Limits & Quotas
Per-Saga Limits
Saga throughput is constrained by the slowest participant's API limits. A single business transaction may consume multiple API calls per participant:
| Saga Step | Typical API Calls | Rate Limit Concern | Mitigation |
|---|---|---|---|
| Create order in CRM | 1-3 | Salesforce: 100K/24h | Batch via composite API |
| Reserve inventory in ERP | 1-2 | SAP: depends on ICM config | Use async BAPI for high volume |
| Process payment | 1-2 | Stripe: 100 req/s | Built-in rate limiting |
| Post financial document | 1-3 | Oracle: throttled per tenant | Use batch endpoints |
| Compensate on failure | 1 per completed step | Doubles API consumption | Budget 2x API calls for failures |
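To make the budgeting concrete, the arithmetic can be sketched as a small helper. The function name, the workload figures (20,000 sagas per day, 1,000 of them compensated), and the per-step call counts are illustrative assumptions; the 100K/24h ceiling is the Salesforce figure quoted in the table.

```typescript
// Hypothetical helper: estimates the daily API calls one participant receives.
interface StepCost {
  forwardCalls: number;      // API calls made by the forward step
  compensationCalls: number; // API calls made when the step is compensated
}

function dailyCallsForParticipant(
  step: StepCost,
  sagasPerDay: number,
  compensatedSagasPerDay: number, // assumed count of sagas that undo this step
): number {
  return (
    step.forwardCalls * sagasPerDay +
    step.compensationCalls * compensatedSagasPerDay
  );
}

// CRM step: 3 forward calls, 1 compensation call.
const crmCalls = dailyCallsForParticipant(
  { forwardCalls: 3, compensationCalls: 1 },
  20_000,
  1_000,
);
const salesforceDailyLimit = 100_000;
const crmHeadroom = salesforceDailyLimit - crmCalls;

// Worst case from the table: every saga fails, so compensation adds one call per saga.
const crmWorstCase = dailyCallsForParticipant(
  { forwardCalls: 3, compensationCalls: 1 },
  20_000,
  20_000,
);
```

Running the same calculation per participant shows which system's quota becomes the bottleneck first, which is where a concurrency limit on the orchestrator is most useful.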
Orchestrator Platform Limits
| Platform | Execution Limit | State Size | Timeout | Pricing |
|---|---|---|---|---|
| AWS Step Functions (Standard) | Up to 1 year | 256 KB payload | 1 year max | $0.025/1K transitions |
| AWS Step Functions (Express) | 5 minutes max | 256 KB payload | 5 minutes | Per execution + duration |
| Temporal Cloud | Unlimited duration | 50K events/workflow | No hard timeout | Per action |
| Azure Durable Functions | Unlimited duration | Storage-backed | No hard timeout | Per execution + compute |
| Camunda 8 | Unlimited duration | 1 MB per variable | Configurable | Per execution (cloud) |
Authentication
Each saga participant authenticates independently. The orchestrator must maintain valid credentials for every system:
| ERP System | Auth Flow | Token Lifetime | Saga Consideration |
|---|---|---|---|
| Salesforce | OAuth 2.0 JWT Bearer | 2h (configurable) | Token may expire mid-saga; refresh per step |
| SAP S/4HANA | OAuth 2.0 / Basic Auth | Session-based | CSRF token required per write operation |
| Oracle ERP Cloud | OAuth 2.0 / Basic Auth | Session-based | Rate limiting tied to auth identity |
| NetSuite | Token-Based Auth (TBA) | No expiry | Consumer/token key pair; rotate quarterly |
| Stripe | API Key (Secret Key) | No expiry | Use restricted keys with minimal permissions |
Authentication Gotchas
Token expiry mid-saga: Long-running sagas can outlive access tokens. Refresh tokens before each step, not at saga start. [src3]
Credential scope: Each saga step may need different permission scopes. Use scoped tokens per step where possible. [src4]
CSRF tokens: SAP and some Oracle endpoints require CSRF tokens that expire quickly. Fetch a new one at each write step.
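The "refresh before each step" advice can be sketched as a small token provider that the orchestrator consults at the start of every step. This is a minimal sketch under stated assumptions: `StepTokenProvider`, `fetchToken`, and the 60-second safety margin are hypothetical names and values, the clock is passed in explicitly to keep the example deterministic, and a real `fetchToken` would be an asynchronous OAuth call.

```typescript
interface AccessToken {
  value: string;
  expiresAtMs: number;
}

// Hypothetical token cache; `fetchToken` stands in for a real OAuth request.
class StepTokenProvider {
  private cached?: AccessToken;
  private fetchToken: (nowMs: number) => AccessToken;
  private safetyMarginMs: number;

  constructor(fetchToken: (nowMs: number) => AccessToken, safetyMarginMs = 60_000) {
    this.fetchToken = fetchToken;
    this.safetyMarginMs = safetyMarginMs;
  }

  // Call before every saga step; refreshes when the token is missing or near expiry.
  tokenForStep(nowMs: number): string {
    if (!this.cached || this.cached.expiresAtMs - nowMs <= this.safetyMarginMs) {
      this.cached = this.fetchToken(nowMs);
    }
    return this.cached.value;
  }
}

// Usage: a stub issuer that hands out 2-hour tokens.
const twoHoursMs = 2 * 60 * 60 * 1000;
let tokensIssued = 0;
const provider = new StepTokenProvider((nowMs) => {
  tokensIssued += 1;
  return { value: "token-" + tokensIssued, expiresAtMs: nowMs + twoHoursMs };
});

const tokenAtStart = provider.tokenForStep(0);              // fetches token-1
const tokenAt30Min = provider.tokenForStep(30 * 60 * 1000); // still valid, reused
const tokenAtExpiry = provider.tokenForStep(twoHoursMs);    // inside margin, refreshed
```

Checking expiry at each step rather than once at saga start is what keeps a long-running saga from failing on a stale token halfway through.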
Constraints
No ACID isolation: Concurrent sagas can produce lost updates, dirty reads, and nonrepeatable reads. Implement application-level countermeasures (semantic locks, commutative updates).
Compensating transactions are imperfect: A compensation is a semantic inverse, not a database rollback. Business logic must define what “undo” means for each step.
Compensation must be idempotent: If the compensation message is delivered twice, running it again must not cause double-reversal.
Pivot transaction is the point of no return: Once the pivot succeeds, all subsequent steps must be retryable to completion.
ERP API rate limits compound: A 10-step saga consuming 3 API calls per step = 30 calls per transaction. Factor in compensation (failure path doubles consumption).
Eventual consistency window: Between a forward transaction and its compensation, other systems see inconsistent data.
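The idempotency constraint on compensations can be illustrated with a ledger that records which compensations have already run. This is a sketch only: `CompensationLedger` is a hypothetical in-memory class, and a production version would persist the applied keys durably in the orchestrator's store or in the participant itself.

```typescript
// Hypothetical in-memory ledger; production code would persist applied keys durably.
class CompensationLedger {
  private applied = new Set<string>();

  // Runs `compensate` at most once per key; duplicate deliveries become no-ops.
  runOnce(compensationKey: string, compensate: () => void): boolean {
    if (this.applied.has(compensationKey)) return false;
    compensate();
    this.applied.add(compensationKey);
    return true;
  }
}

// Usage: the same compensation message is delivered twice.
let creditMemosIssued = 0;
const ledger = new CompensationLedger();
const invoiceUndoKey = "saga-42:create-invoice:compensate";

const firstDelivery = ledger.runOnce(invoiceUndoKey, () => { creditMemosIssued += 1; });
const secondDelivery = ledger.runOnce(invoiceUndoKey, () => { creditMemosIssued += 1; });
```

Only one credit memo is issued even though the message arrived twice, which is the "no double-reversal" property the constraint asks for.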
Integration Pattern Decision Tree
START: Need distributed transaction across multiple ERPs
├── How many systems participate?
│   ├── 2 systems, simple request-reply
│   │   └── SKIP SAGA: use direct API call + retry + DLQ
│   ├── 2-3 systems, linear flow
│   │   ├── Event infrastructure exists? → Consider CHOREOGRAPHY
│   │   └── No event infra? → Use ORCHESTRATION
│   └── 4+ systems or branching/parallel
│       └── Always ORCHESTRATION
├── Consistency requirement?
│   ├── Strong ACID → SAGA IS WRONG PATTERN
│   ├── Eventual consistency → Saga fits
│   └── Best-effort → Saga with relaxed compensation
├── Transaction duration?
│   ├── < 1s → Synchronous orchestration
│   ├── 1s - 5min → Step Functions Express or Temporal
│   └── > 5min → Step Functions Standard, Temporal, Camunda
└── Error tolerance?
    ├── Zero-loss → Idempotent steps + DLQ + manual review
    └── Best-effort → Retry with exponential backoff + alerting
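The first two branches of the tree (participant count and consistency requirement) can be encoded as a function, which is handy for design reviews or architecture linting. The type and field names below are hypothetical, and the sketch deliberately ignores the duration and error-tolerance branches.

```typescript
type Coordination = "not-a-saga" | "skip-saga" | "choreography" | "orchestration";

interface FlowProfile {
  systems: number;
  linear: boolean;                 // false for branching or parallel flows
  simpleRequestReply: boolean;
  hasEventInfrastructure: boolean;
  needsStrongAcid: boolean;
}

function chooseCoordination(p: FlowProfile): Coordination {
  if (p.needsStrongAcid) return "not-a-saga";                    // saga is the wrong pattern
  if (p.systems <= 2 && p.simpleRequestReply) return "skip-saga"; // direct call + retry + DLQ
  if (p.systems >= 4 || !p.linear) return "orchestration";
  return p.hasEventInfrastructure ? "choreography" : "orchestration";
}

// Usage: a five-system order-to-cash flow and a small linear three-system flow.
const orderToCashChoice = chooseCoordination({
  systems: 5, linear: true, simpleRequestReply: false,
  hasEventInfrastructure: true, needsStrongAcid: false,
});
const smallLinearChoice = chooseCoordination({
  systems: 3, linear: true, simpleRequestReply: false,
  hasEventInfrastructure: true, needsStrongAcid: false,
});
```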
Quick Reference
Saga Transaction Types
| Transaction Type | Description | Can Be Undone? | ERP Example |
|---|---|---|---|
| Compensable | Forward step that can be semantically reversed | Yes | Create sales order in SAP (→ cancel order) |
| Pivot | Point of no return; saga must complete after this | After this: No | Capture payment (→ cannot un-charge, only refund) |
| Retryable | Steps after pivot that must eventually succeed | N/A (must succeed) | Post revenue recognition, update CRM status |
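The ordering rule implied by these three types (compensable steps first, then the pivot, then retryable steps) can be checked mechanically at design time. The sketch below assumes a saga has at most one pivot and that retryable steps only appear after it; `SagaStepSpec` and `validateSagaShape` are hypothetical names.

```typescript
type StepKind = "compensable" | "pivot" | "retryable";

interface SagaStepSpec {
  label: string;
  kind: StepKind;
}

// Returns a list of problems; an empty list means the ordering is valid.
function validateSagaShape(steps: SagaStepSpec[]): string[] {
  const problems: string[] = [];
  const pivotCount = steps.filter((s) => s.kind === "pivot").length;
  if (pivotCount > 1) problems.push("more than one pivot");

  const firstPivot = steps.findIndex((s) => s.kind === "pivot");
  const pivotAt = firstPivot === -1 ? steps.length : firstPivot;

  steps.forEach((s, i) => {
    if (i < pivotAt && s.kind === "retryable") {
      problems.push(s.label + ": retryable step before the pivot");
    }
    if (i > pivotAt && s.kind === "compensable") {
      problems.push(s.label + ": compensable step after the pivot");
    }
  });
  return problems;
}

// Usage: a valid order-to-cash shape, and one with a compensable step after the pivot.
const validShape: SagaStepSpec[] = [
  { label: "Create sales order", kind: "compensable" },
  { label: "Reserve inventory", kind: "compensable" },
  { label: "Capture payment", kind: "pivot" },
  { label: "Post revenue", kind: "retryable" },
];
const invalidShape: SagaStepSpec[] = [
  { label: "Capture payment", kind: "pivot" },
  { label: "Reserve inventory", kind: "compensable" },
];

const validProblems = validateSagaShape(validShape);
const invalidProblems = validateSagaShape(invalidShape);
```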
Orchestration vs Choreography Decision Matrix
| Criterion | Choose Orchestration | Choose Choreography |
|---|---|---|
| Participant count | 4+ systems | 2-3 systems |
| Workflow complexity | Branching, parallel, conditional | Linear, sequential |
| Team structure | Central integration team | Independent service teams |
| Debugging needs | Must trace full transaction | Per-service debugging sufficient |
| Failure recovery | Centralized compensation logic | Distributed compensation logic |
| ERP integrations | Almost always (ERP APIs are complex) | Rarely (ERP complexity demands central control) |
Compensating Transaction Reference for Common ERP Operations
| Forward Transaction | Compensating Transaction | System | Gotcha |
|---|---|---|---|
| Create Sales Order | Cancel Sales Order + release inventory | SAP S/4HANA | Cancellation creates a new document |
| Reserve Inventory | Release Reservation | Any ERP | Check if reservation consumed by fulfillment |
| Authorize Payment | Void Authorization | Stripe/Gateway | Void only works before capture |
| Post Accounting Entry | Post Reversal Entry | Oracle ERP / SAP | Reversal is a new journal entry |
| Create Invoice | Issue Credit Memo | Any ERP | Credit memo must reference original invoice |
| Update Customer Record | Restore Previous Values | Salesforce | Requires storing pre-update snapshot |
| Allocate Budget | Deallocate Budget | Workday / SAP | Check if budget consumed by downstream |
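Several of the gotchas above mean the right compensation depends on the participant's current state. A minimal sketch of that check for two rows of the table follows; the function names and the escalation policy are assumptions for illustration, not behavior prescribed by any ERP or gateway.

```typescript
interface CompensationPlan {
  action: string;
  manualReview: boolean;
}

// Payment: a void is only possible before capture. After capture the only option
// is a refund, treated here as a business decision (assumed policy).
function planPaymentCompensation(captured: boolean): CompensationPlan {
  return captured
    ? { action: "refund", manualReview: true }
    : { action: "void-authorization", manualReview: false };
}

// Reservation: if fulfillment already consumed the stock, a blind release would
// misstate inventory, so this sketch escalates instead (assumed policy).
function planReservationCompensation(consumedByFulfillment: boolean): CompensationPlan {
  return consumedByFulfillment
    ? { action: "escalate", manualReview: true }
    : { action: "release-reservation", manualReview: false };
}

const beforeCapture = planPaymentCompensation(false);
const afterCapture = planPaymentCompensation(true);
const openReservation = planReservationCompensation(false);
```

Reading the participant's state immediately before compensating, rather than trusting the orchestrator's last known state, is what makes this check meaningful under concurrent activity.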
Step-by-Step Integration Guide
1. Design the saga: identify steps, pivot, and compensations
Map every step in the business transaction. For each step, define the forward transaction, the compensating transaction, and whether it is compensable, pivot, or retryable. [src1, src6]
Example: Order-to-Cash Saga
Step 1 (Compensable): Create Sales Order in SAP
  Forward: POST /sap/opu/odata/sap/API_SALES_ORDER_SRV/A_SalesOrder
  Compensate: PATCH with OverallSDProcessStatus = 'C'
Step 2 (Compensable): Reserve Inventory in WMS
  Forward: POST /api/v1/reservations
  Compensate: DELETE /api/v1/reservations/{id}
Step 3 (PIVOT): Capture Payment via Stripe
  Forward: POST /v1/payment_intents/{id}/capture
  No auto-compensation after this point
Step 4 (Retryable): Post Revenue in Oracle ERP
Step 5 (Retryable): Update Opportunity in Salesforce
Verify: Review with business stakeholders — they must agree on what “undo” means for each step.
2. Implement the orchestrator (Temporal TypeScript example)
Choose a workflow engine and implement the saga state machine. [src4]
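As an engine-agnostic illustration of what the orchestrator does, the sketch below runs steps in order, compensates in reverse order when a step fails at or before the pivot, and retries steps after the pivot instead of compensating. It is plain TypeScript rather than Temporal SDK code; `runSaga`, `OrchestratedStep`, and the retry count are hypothetical, state is held in memory, and a durable engine would persist it between steps.

```typescript
type StepKind = "compensable" | "pivot" | "retryable";

interface OrchestratedStep {
  label: string;
  kind: StepKind;
  forward: () => Promise<void>;
  compensate?: () => Promise<void>; // only meaningful for compensable steps
}

interface SagaRun {
  outcome: "COMPLETED" | "COMPENSATED" | "NEEDS_INTERVENTION";
  log: string[];
}

async function runSaga(steps: OrchestratedStep[], maxRetries = 3): Promise<SagaRun> {
  const log: string[] = [];
  const completed: OrchestratedStep[] = [];

  for (const step of steps) {
    const attempts = step.kind === "retryable" ? maxRetries : 1;
    let succeeded = false;
    for (let i = 0; i < attempts && !succeeded; i++) {
      try {
        await step.forward();
        succeeded = true;
      } catch {
        log.push(step.label + ":failed");
      }
    }
    if (succeeded) {
      log.push(step.label + ":complete");
      completed.push(step);
      continue;
    }
    if (step.kind === "retryable") {
      // Past the pivot: never compensate, escalate instead.
      return { outcome: "NEEDS_INTERVENTION", log };
    }
    // Failure at or before the pivot: compensate in strict reverse order.
    for (const done of [...completed].reverse()) {
      if (done.compensate) {
        await done.compensate();
        log.push(done.label + ":compensated");
      }
    }
    return { outcome: "COMPENSATED", log };
  }
  return { outcome: "COMPLETED", log };
}

// Usage with stubbed participants: payment capture fails, so earlier steps are undone.
const demoRun = runSaga([
  { label: "createOrder", kind: "compensable", forward: async () => {}, compensate: async () => {} },
  { label: "reserveInventory", kind: "compensable", forward: async () => {}, compensate: async () => {} },
  { label: "capturePayment", kind: "pivot", forward: async () => { throw new Error("card declined"); } },
]);
```

In a real workflow engine each `forward` and `compensate` would be an activity with its own timeout and retry policy, and the `log` would be the engine's persisted event history.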
cURL: Testing a Saga Step (SAP Sales Order Creation)
# Input: SAP OAuth token in $SAP_TOKEN, order payload
# Output: Sales order ID or error
# Fetch CSRF token (required for SAP write operations).
# Capture the token from the response headers and keep the session cookies,
# because the token is tied to the session that issued it.
CSRF_TOKEN=$(curl -s -X GET "https://my-sap.s4hana.cloud/sap/opu/odata/sap/API_SALES_ORDER_SRV/" \
  -H "Authorization: Bearer $SAP_TOKEN" -H "x-csrf-token: fetch" \
  -c cookies.txt -D - -o /dev/null \
  | grep -i '^x-csrf-token:' | awk '{print $2}' | tr -d '\r')
# Create Sales Order
curl -X POST "https://my-sap.s4hana.cloud/.../A_SalesOrder" \
  -H "Authorization: Bearer $SAP_TOKEN" -H "x-csrf-token: $CSRF_TOKEN" \
  -b cookies.txt \
  -H "Content-Type: application/json" \
  -d '{"SalesOrderType":"OR","SoldToParty":"CUST001",
       "to_Item":[{"Material":"MAT001","RequestedQuantity":"10"}]}'
# Expected: 201 Created with SalesOrder number
Data Mapping
Saga State Mapping Across Systems
| Saga State | Salesforce (CRM) | SAP S/4HANA | Stripe | Orchestrator |
|---|---|---|---|---|
| Initiated | Opp: Negotiation | No record | PI: created | RUNNING |
| Order Created | Opp: Negotiation | SO: Open | PI: created | Step 1: COMPLETE |
| Inventory Reserved | Opp: Negotiation | SO + Reservation | PI: created | Step 2: COMPLETE |
| Payment Captured | Opp: Closed Won | SO: Open | PI: succeeded | PIVOT PASSED |
| Completed | Opp: Closed Won | SO: Completed | PI: succeeded | COMPLETED |
| Compensating | Opp: reverted | SO: Cancelled | PI: canceled | COMPENSATING |
Data Type Gotchas
Currency precision: Stripe uses smallest unit (cents). SAP uses full unit with decimal. Normalize before passing between systems. [src3]
Date/time zones: Salesforce stores UTC. SAP depends on user profile. Oracle uses server timezone. Normalize to UTC at orchestrator level. [src2]
ID formats: SAP = 10-digit numeric. Salesforce = 18-char alphanumeric. Stripe = prefixed strings (pi_xxx). Maintain cross-reference map. [src3]
Status enums: “Cancelled” in Salesforce is not the same field as “Cancelled” in SAP. Map each system's status vocabulary.
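The currency-precision gotcha is the easiest of these to get wrong, so here is a minimal sketch of converting a minor-unit integer (as Stripe uses) into a decimal string (as SAP expects). The function name is hypothetical, the zero-decimal set is a deliberately small assumed subset, and every other currency is assumed to have two decimal places, which does not hold for three-decimal currencies.

```typescript
// Assumed subset of zero-decimal currencies; consult the gateway's full list in production.
const ZERO_DECIMAL_CURRENCIES = new Set(["JPY", "KRW", "VND"]);

// Converts an integer minor-unit amount (e.g., 1999 cents) to a decimal string ("19.99").
// Integer arithmetic avoids floating-point rounding errors.
function minorUnitsToDecimal(amountMinor: number, currency: string): string {
  if (!Number.isInteger(amountMinor)) {
    throw new Error("amount must be an integer number of minor units");
  }
  if (ZERO_DECIMAL_CURRENCIES.has(currency.toUpperCase())) {
    return String(amountMinor);
  }
  const sign = amountMinor < 0 ? "-" : "";
  const abs = Math.abs(amountMinor);
  const whole = Math.floor(abs / 100);
  const cents = abs % 100;
  const fraction = (cents < 10 ? "0" : "") + cents;
  return sign + whole + "." + fraction;
}

const usdAmount = minorUnitsToDecimal(1999, "usd");
const smallEurAmount = minorUnitsToDecimal(5, "EUR");
const jpyAmount = minorUnitsToDecimal(1500, "JPY");
const refundAmount = minorUnitsToDecimal(-250, "USD");
```

Doing this conversion once, at the orchestrator boundary, keeps each participant adapter free of currency logic.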
Error Handling & Failure Points
Common Error Scenarios
| Scenario | Impact | Detection | Resolution |
|---|---|---|---|
| Orchestrator crashes mid-saga | Saga hangs in partial state | Heartbeat timeout | Durable workflow engine auto-resumes |
| ERP API timeout (no response) | Unknown state | Timeout threshold | Read-before-write: query for idempotency key |
| Compensation fails | Inconsistent state | Compensation error handler | Dead letter queue + manual intervention |
| Rate limit exceeded (429) | Step cannot execute | HTTP 429 response | Exponential backoff, max 5 retries |
| Concurrent saga conflict | Lost update or dirty read | 409/412 response | Semantic lock or reread-and-retry |
| Network partition | Participant may have executed | Timeout + no response | Idempotent retry |
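The "exponential backoff, max 5 retries" resolution for 429 responses can be sketched as a delay schedule. The base delay, cap, and function names are assumptions; jitter is left out so the output stays deterministic, although production code would normally add it.

```typescript
// Delay before retry `attempt` (1-based): doubles from `baseMs`, capped at `capMs`.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** (attempt - 1));
}

// Prefer the server's Retry-After hint (in seconds) when a 429 response carries one.
function delayFor429(attempt: number, retryAfterSeconds?: number): number {
  return retryAfterSeconds !== undefined
    ? retryAfterSeconds * 1000
    : backoffDelayMs(attempt);
}

const MAX_RETRIES = 5;
const retrySchedule: number[] = [];
for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
  retrySchedule.push(backoffDelayMs(attempt));
}
// retrySchedule is [500, 1000, 2000, 4000, 8000]
```

After the fifth failed retry the step should be treated as failed, which hands control to the compensation path or the dead letter queue described in the table.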
Failure Points in Production
Compensation timeout cascade: One slow compensation causes subsequent compensations to time out. Fix: Set independent timeouts for each compensation step. [src3]
Saga state persistence loss: In-memory orchestrators lose state on crash. Fix: Use durable state persistence (Temporal, Step Functions). [src4]
Idempotency key collision: Different sagas share an idempotency key. Fix: Include workflow execution ID + step name in key. [src2]
Compensation ordering violation: Out-of-order compensations cause data integrity issues. Fix: Always compensate in strict reverse order. [src6]
Stale auth tokens: Token fetched at saga start expires mid-saga. Fix: Fetch fresh auth tokens before each ERP API call. [src3]
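The idempotency-key fix above can be sketched together with the behavior it relies on: a participant that returns the stored result when it sees a key again. `idempotencyKey` and `FakeOrderParticipant` are hypothetical; the fake class stands in for an ERP or gateway that supports idempotent requests.

```typescript
// Key scoped to one saga execution and one step, so different sagas never collide.
function idempotencyKey(workflowExecutionId: string, stepName: string): string {
  return workflowExecutionId + ":" + stepName;
}

// Hypothetical participant that deduplicates writes by idempotency key.
class FakeOrderParticipant {
  private results = new Map<string, string>();
  private nextId = 1;

  createOrder(idemKey: string): string {
    const existing = this.results.get(idemKey);
    if (existing !== undefined) return existing; // replay: same result, no new record
    const orderId = "SO-" + this.nextId;
    this.nextId += 1;
    this.results.set(idemKey, orderId);
    return orderId;
  }

  recordCount(): number {
    return this.results.size;
  }
}

const participant = new FakeOrderParticipant();
const sagaAOrder = participant.createOrder(idempotencyKey("wf-001", "create-order"));
const sagaARetry = participant.createOrder(idempotencyKey("wf-001", "create-order")); // network retry
const sagaBOrder = participant.createOrder(idempotencyKey("wf-002", "create-order")); // different saga
```

The retry within saga A returns the original order, while saga B, using the same step name, still gets its own record because the execution ID differs.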
Anti-Patterns
Wrong: Using 2PC across ERP APIs
// BAD — 2PC requires all participants to hold locks simultaneously
// ERP APIs do not support distributed lock coordination
BEGIN DISTRIBUTED TRANSACTION
Lock row in Salesforce (not possible via REST API)
Lock row in SAP (not possible via OData API)
COMMIT ALL
Correct: Use Saga with compensating transactions
// GOOD — each step commits independently; compensations handle failure
Step 1: Create order in SAP (committed)
Step 2: Reserve inventory (committed)
Step 3: Capture payment (PIVOT)
If Step 2 fails: compensate Step 1 (cancel order)
Wrong: Fire-and-forget without compensation design
// BAD: no compensation, no idempotency, no error handling
async function processOrder(order) {
  await createSapOrder(order);   // succeeds, but...
  await reserveInventory(order); // succeeds, but...
  await capturePayment(order);   // FAILS: SAP order orphaned
}
Correct: Every forward step has a compensation partner
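// GOOD: compensation stack ensures cleanup
async function processOrderSaga(order) {
  const compensations = [];
  let so;
  try {
    so = await createSapOrder(order);
    compensations.push(() => cancelSapOrder(so.id));
    const res = await reserveInventory(order);
    compensations.push(() => releaseInventory(res.id));
    await capturePayment(order); // PIVOT
  } catch (error) {
    // Failure at or before the pivot: undo completed steps in reverse order.
    for (const comp of [...compensations].reverse()) {
      try {
        await comp();
      } catch (compError) {
        // sendToDeadLetterQueue is a hypothetical hook for manual review;
        // one failed compensation must not block the remaining ones.
        await sendToDeadLetterQueue(order, compError);
      }
    }
    throw error;
  }
  // Past the pivot: never compensate. retryUntilSuccess is a hypothetical
  // helper that retries with backoff until the step succeeds.
  await retryUntilSuccess(() => postRevenue(so.id)); // Retryable
}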
Wrong: Choreography with 6+ ERP participants
// BAD — invisible dependency web across 7 services
OrderService emits OrderCreated
InventoryService emits InventoryReserved
PaymentService emits PaymentCaptured
... 4 more services listening to each other
// Debug: "Why did this fail?" = trace 7 event streams
Correct: Orchestration for complex multi-ERP flows
// GOOD — single orchestrator, all state in one place
Orchestrator: createOrder(SAP) -> reserveInventory(WMS) ->
capturePayment(Stripe) -> postRevenue(Oracle) -> updateCRM(SF)
// Debug: check one orchestrator execution log
Common Pitfalls
Treating compensation as rollback: Compensations create new records (credit memos, refunds). Fix: Document compensation behavior; get business sign-off. [src1]
Missing idempotency: Network retries cause duplicates. Fix: Use idempotency keys from workflow ID + step name. [src2]
Ignoring ERP API rate limits: 10-step saga = 20-30 API calls at scale. Fix: Calculate worst-case API consumption; set concurrency limits. [src3]
Ignoring eventual consistency window: Other systems see intermediate state. Fix: Use semantic locks (status flags like "pending_confirmation"). [src6]
Hardcoding saga timeout: Different ERPs have wildly different response times. Fix: Set per-step timeouts based on each ERP's SLA. [src3]
Not testing compensation path: Teams only test the happy path. Fix: Chaos testing — inject failures at each step, verify full compensation. [src4]
Compensation is a business decision: What “undo” means for each step must be defined by stakeholders, not technologists alone.
Eventual consistency is visible to users: Between steps, the system is in intermediate state. Use semantic status flags to communicate saga progress.
Complexity compounds with participants: Each new participant adds forward + compensation steps. Split large processes into smaller independent sagas.
Platform lock-in risk: Choosing Step Functions, Durable Functions, or Temporal creates dependency for critical transactions. The pattern is portable; the implementation is not.
ERP API changes break sagas: When an ERP vendor changes API behavior, saga steps may fail. Pin API versions and monitor deprecation notices.
Architecture-level guidance: This card describes the pattern; implementation varies by platform, ERP, and language. Consult specific ERP API docs for production use.