Saga Pattern for Distributed Transactions Across ERPs
Type: ERP Integration
System: Cross-ERP (Pattern-level)
Confidence: 0.92
Sources: 6
Verified: 2026-03-02
Freshness: evolving
TL;DR
- Bottom line: Use the Saga pattern when a business transaction spans multiple ERPs (e.g., order-to-cash across Salesforce + SAP + a payment gateway) and you cannot use a traditional two-phase commit. Orchestration is the default choice for ERP integrations; choreography only works for simple 2-3 system flows.
- Key limit: Sagas provide eventual consistency, not ACID isolation — concurrent sagas can produce dirty reads, lost updates, and nonrepeatable reads. Design compensating transactions for every forward step.
- Watch out for: Compensating transactions are NOT true rollbacks — reversing an invoice in SAP or a payment in Stripe creates a new offsetting record, not a deletion. Business stakeholders must approve compensation semantics.
- Best for: Long-running, multi-step business processes that cross ERP boundaries — order-to-cash, procure-to-pay, intercompany settlement, multi-system fulfillment.
- Authentication: Each ERP participant authenticates independently — the saga orchestrator needs valid credentials (OAuth 2.0, API keys, certificates) for every system in the chain.
System Profile
The Saga pattern is an architecture pattern, not a product. It applies to any distributed transaction that spans multiple ERP systems, middleware platforms, or microservices. This card covers the pattern itself, its two coordination approaches (orchestration and choreography), and practical implementation guidance for cross-ERP business transactions.
| System | Role | API Surface | Direction |
| CRM System A (e.g., Salesforce) | CRM: source order/opportunity | REST API | Outbound |
| ERP System B (e.g., SAP S/4HANA) | ERP — financial posting, inventory | OData / BAPI | Inbound |
| Payment Gateway (e.g., Stripe) | Payment processing | REST API | Bidirectional |
| Orchestrator (e.g., Temporal, Step Functions) | Saga coordinator | SDK / State Machine | Outbound commands to all participants |
API Surfaces & Capabilities
The Saga pattern does not define a specific API surface — it defines how multiple API surfaces coordinate. The key capability matrix compares the two coordination approaches:
| Capability | Orchestration | Choreography |
| Coordination | Central orchestrator sends commands to participants | Participants publish/subscribe to domain events |
| Complexity ceiling | Scales to 10+ participants | Practical limit: 3-5 participants |
| Visibility | Single place to inspect saga state | Must aggregate events across all participants |
| Coupling | Participants coupled to orchestrator | Participants coupled to event schema |
| Single point of failure | Yes — orchestrator (mitigated by HA) | No — distributed |
| Cyclic dependencies | Avoided by design | Risk of circular event chains |
| Error handling | Orchestrator triggers compensations | Each participant handles failure events independently |
| Debugging | Straightforward — follow orchestrator logs | Difficult — requires distributed tracing |
| Testing | Test orchestrator + mock participants | Must run all participants to test end-to-end |
| New participant | Modify orchestrator workflow | Add new event listener (verify no side effects) |
Rate Limits & Quotas
Per-Saga Limits
Saga throughput is constrained by the slowest participant's API limits. A single business transaction may consume multiple API calls per participant:
| Saga Step | Typical API Calls | Rate Limit Concern | Mitigation |
| Create order in CRM | 1-3 | Salesforce: 24h org allocation (edition/license-based, e.g., 100K+) | Batch via composite API |
| Reserve inventory in ERP | 1-2 | SAP: depends on ICM config | Use async BAPI for high volume |
| Process payment | 1-2 | Stripe: 100 req/s (live mode) | Client-side throttling; retry 429s with backoff |
| Post financial document | 1-3 | Oracle: throttled per tenant | Use batch endpoints |
| Compensate on failure | 1 per completed step | Up to 2x API consumption on the failure path | Budget 2x API calls for failures |
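The table above can be turned into a quick capacity check: sum the forward calls, add the compensation calls, and divide each system's daily allocation by the calls that actually hit that system. This is a sketch under assumptions: the per-step figures are hypothetical (taken from the upper end of the ranges above), the 5% failure rate is invented for illustration, and `sagaApiBudget` / `maxSagasPerDay` are illustrative helpers, not part of any ERP SDK.

```typescript
// Per-step API call counts for one saga (hypothetical figures)
interface StepBudget {
  name: string;
  system: string;            // which API quota the calls count against
  forwardCalls: number;      // calls made by the forward transaction
  compensationCalls: number; // calls to undo it (0 for pivot and retryable steps)
}

interface SagaBudget {
  forward: number;   // happy path
  worstCase: number; // conservative upper bound: all forward calls + all compensations
  expected: number;  // forward + failureRate * compensation
}

function sagaApiBudget(steps: StepBudget[], failureRate: number): SagaBudget {
  const forward = steps.reduce((sum, s) => sum + s.forwardCalls, 0);
  const compensation = steps.reduce((sum, s) => sum + s.compensationCalls, 0);
  return {
    forward,
    worstCase: forward + compensation,
    expected: forward + failureRate * compensation,
  };
}

// Sagas that fit in a daily allocation if every saga hits its worst case
function maxSagasPerDay(dailyQuota: number, callsPerSaga: number): number {
  return Math.floor(dailyQuota / callsPerSaga);
}

const steps: StepBudget[] = [
  { name: "Create order", system: "salesforce", forwardCalls: 3, compensationCalls: 1 },
  { name: "Reserve inventory", system: "sap", forwardCalls: 2, compensationCalls: 1 },
  { name: "Process payment (pivot)", system: "stripe", forwardCalls: 2, compensationCalls: 0 },
  { name: "Post financial document", system: "oracle", forwardCalls: 3, compensationCalls: 0 },
];

const budget = sagaApiBudget(steps, 0.05); // assume 5% of sagas fail before the pivot
console.log(budget); // { forward: 10, worstCase: 12, expected: 10.1 }

// A daily quota only counts the calls that hit that system
const salesforce = sagaApiBudget(steps.filter((s) => s.system === "salesforce"), 0.05);
console.log(maxSagasPerDay(100_000, salesforce.worstCase)); // 25000
```

Using the worst case for capacity planning is deliberately pessimistic; the expected figure is closer to day-to-day consumption but does not protect against a burst of failures.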
Orchestrator Platform Limits
| Platform | Execution Limit | State Size | Timeout | Pricing |
| AWS Step Functions (Standard) | Up to 1 year | 256 KB payload | 1 year max | $0.025/1K transitions |
| AWS Step Functions (Express) | 5 minutes max | 256 KB payload | 5 minutes | Per execution + duration |
| Temporal Cloud | Unlimited duration | 51,200 events or 50 MB history per workflow execution | No hard timeout | Per action |
| Azure Durable Functions | Unlimited duration | Storage-backed | No hard timeout | Per execution + compute |
| Camunda 8 | Unlimited duration | 1 MB per variable | Configurable | Per execution (cloud) |
Authentication
Each saga participant authenticates independently. The orchestrator must maintain valid credentials for every system:
| ERP System | Auth Flow | Token Lifetime | Saga Consideration |
| Salesforce | OAuth 2.0 JWT Bearer | 2h (configurable) | Token may expire mid-saga; the JWT flow issues no refresh token, so request a new access token before each step |
| SAP S/4HANA | OAuth 2.0 / Basic Auth | Session-based | CSRF token required per write operation |
| Oracle ERP Cloud | OAuth 2.0 / Basic Auth | Session-based | Rate limiting tied to auth identity |
| NetSuite | Token-Based Auth (TBA) | No expiry | Consumer/token key pair — rotate quarterly |
| Stripe | API Key (Secret Key) | No expiry | Use restricted keys with minimal permissions |
Authentication Gotchas
- Token expiry mid-saga: Long-running sagas can outlive access tokens. Refresh tokens before each step, not at saga start. [src3]
- Credential scope: Each saga step may need different permission scopes. Use scoped tokens per step where possible. [src4]
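- CSRF tokens: SAP and some Oracle endpoints require CSRF tokens for write operations. SAP binds the token to the HTTP session, so it becomes invalid when the session expires. Fetch a new token (and keep the session cookies) at each write step.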
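One way to apply these gotchas is to route every ERP call through a small token cache that re-checks expiry before each step instead of caching one token for the whole saga. This is a sketch: `TokenCache` and `TokenFetcher` are hypothetical names, the fetcher stands in for whatever grant each system uses (for Salesforce JWT bearer, a new token request rather than a refresh), and the 60-second skew is an assumed safety margin.

```typescript
// An access token plus its absolute expiry time (epoch milliseconds)
interface AccessToken {
  value: string;
  expiresAt: number;
}

// Hypothetical per-system token source (JWT bearer grant, client credentials, ...)
type TokenFetcher = () => Promise<AccessToken>;

class TokenCache {
  private current: AccessToken | null = null;
  private readonly fetchToken: TokenFetcher;
  private readonly skewMs: number;
  private readonly now: () => number;

  constructor(fetchToken: TokenFetcher, skewMs = 60_000, now: () => number = Date.now) {
    this.fetchToken = fetchToken;
    this.skewMs = skewMs; // renew this long before the real expiry
    this.now = now;       // injectable clock for tests
  }

  // Call before EVERY saga step: reuse the token only if it stays valid for
  // at least `skewMs`, otherwise fetch a fresh one.
  async get(): Promise<string> {
    if (this.current === null || this.current.expiresAt - this.skewMs <= this.now()) {
      this.current = await this.fetchToken();
    }
    return this.current.value;
  }
}

// Demo with a fake clock: a 2h token survives a 30-minute gap but not a 2h pause
let clock = 0;
let issued = 0;
const cache = new TokenCache(async () => {
  issued += 1;
  return { value: `token-${issued}`, expiresAt: clock + 2 * 60 * 60 * 1000 };
}, 60_000, () => clock);

async function demo(): Promise<string[]> {
  const a = await cache.get();  // first step: fetches token-1
  clock += 30 * 60 * 1000;      // next step 30 minutes later
  const b = await cache.get();  // still valid: token-1
  clock += 2 * 60 * 60 * 1000;  // saga waits 2h (e.g. a human approval step)
  const c = await cache.get();  // expired: fetches token-2
  return [a, b, c];
}
```

Keeping one cache per participant system also keeps scopes separate, which matches the per-step credential-scope advice above.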
Constraints
- No ACID isolation: Concurrent sagas can produce lost updates, dirty reads, and nonrepeatable reads. Implement application-level countermeasures (semantic locks, commutative updates).
- Compensating transactions are imperfect: A compensation is a semantic inverse, not a database rollback. Business logic must define what “undo” means for each step.
- Compensation must be idempotent: If the compensation message is delivered twice, running it again must not cause double-reversal.
- Pivot transaction is the point of no return: Once the pivot succeeds, all subsequent steps must be retryable to completion.
- ERP API rate limits compound: A 10-step saga consuming 3 API calls per step = 30 calls per transaction. Factor in compensation (failure path doubles consumption).
- Eventual consistency window: Between a forward transaction and its compensation, other systems see inconsistent data.
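A semantic lock, the first countermeasure listed above, can be sketched as a status flag plus an owner ID on the business record: a saga marks the record as pending, other sagas must reject or wait, and only the owning saga may clear the flag on completion or compensation. The in-memory `Map` is a stand-in for a real ERP field (for example a custom status field), and `SemanticLockStore` and its methods are hypothetical.

```typescript
type LockState = { status: "available" } | { status: "pending"; sagaId: string };

class SemanticLockStore {
  private records = new Map<string, LockState>();

  // Forward step: mark the record as pending for this saga.
  // Re-acquiring by the same saga succeeds, so retries stay idempotent.
  acquire(recordId: string, sagaId: string): boolean {
    const state: LockState = this.records.get(recordId) ?? { status: "available" };
    if (state.status === "pending" && state.sagaId !== sagaId) {
      return false; // another saga owns it: reject, retry later, or queue
    }
    this.records.set(recordId, { status: "pending", sagaId });
    return true;
  }

  // Completion or compensation: only the owning saga may clear the flag.
  release(recordId: string, sagaId: string): void {
    const state = this.records.get(recordId);
    if (state && state.status === "pending" && state.sagaId === sagaId) {
      this.records.set(recordId, { status: "available" });
    }
  }

  isPending(recordId: string): boolean {
    return this.records.get(recordId)?.status === "pending";
  }
}

const locks = new SemanticLockStore();
console.log(locks.acquire("SO-1", "saga-A")); // true
console.log(locks.acquire("SO-1", "saga-B")); // false: saga-A owns the record
locks.release("SO-1", "saga-B");              // ignored: not the owner
locks.release("SO-1", "saga-A");
console.log(locks.acquire("SO-1", "saga-B")); // true
```

Readers outside the saga can treat a pending record as "not yet final", which is how the flag also narrows the eventual consistency window.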
Integration Pattern Decision Tree
START — Need distributed transaction across multiple ERPs
├── How many systems participate?
│ ├── 2 systems, simple request-reply
│ │ └── SKIP SAGA — use direct API call + retry + DLQ
│ ├── 2-3 systems, linear flow
│ │ ├── Event infrastructure exists? → Consider CHOREOGRAPHY
│ │ └── No event infra? → Use ORCHESTRATION
│ └── 4+ systems or branching/parallel
│ └── Always ORCHESTRATION
├── Consistency requirement?
│ ├── Strong ACID → SAGA IS WRONG PATTERN
│ ├── Eventual consistency → Saga fits
│ └── Best-effort → Saga with relaxed compensation
├── Transaction duration?
│ ├── < 1s → Synchronous orchestration
│ ├── 1s - 5min → Step Functions Express or Temporal
│ └── > 5min → Step Functions Standard, Temporal, Camunda
└── Error tolerance?
├── Zero-loss → Idempotent steps + DLQ + manual review
└── Best-effort → Retry with exponential backoff + alerting
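The tree above can also be encoded as a small function, for example to record decisions consistently in an integration catalog. This is a simplified sketch: the `IntegrationProfile` fields and the returned labels are hypothetical, it covers only the participant-count, consistency, and duration branches, and it treats the tree's "consider choreography" as a recommendation.

```typescript
interface IntegrationProfile {
  participants: number;
  requestReplyOnly: boolean;    // a single request-reply between 2 systems
  branchingOrParallel: boolean;
  eventInfrastructure: boolean; // an event broker the teams already run
  needsStrongAcid: boolean;
  expectedDurationMs: number;
}

type Pattern = "not-a-saga" | "direct-call" | "choreography" | "orchestration";

interface Recommendation {
  pattern: Pattern;
  note: string;
}

function recommend(p: IntegrationProfile): Recommendation {
  if (p.needsStrongAcid) {
    return { pattern: "not-a-saga", note: "Sagas cannot provide ACID isolation" };
  }
  if (p.participants <= 2 && p.requestReplyOnly) {
    return { pattern: "direct-call", note: "Direct API call + retry + DLQ" };
  }
  if (p.participants <= 3 && !p.branchingOrParallel && p.eventInfrastructure) {
    return { pattern: "choreography", note: "Linear flow on existing event infrastructure" };
  }
  // Orchestration: pick a runtime by expected saga duration
  const note =
    p.expectedDurationMs < 1_000 ? "Synchronous orchestration"
    : p.expectedDurationMs <= 5 * 60_000 ? "Step Functions Express or Temporal"
    : "Step Functions Standard, Temporal, or Camunda";
  return { pattern: "orchestration", note };
}

const fourErps = recommend({
  participants: 4, requestReplyOnly: false, branchingOrParallel: false,
  eventInfrastructure: true, needsStrongAcid: false, expectedDurationMs: 10 * 60_000,
});
console.log(fourErps.pattern); // orchestration
```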
Quick Reference
Saga Transaction Types
| Transaction Type | Description | Can Be Undone? | ERP Example |
| Compensable | Forward step that can be semantically reversed | Yes | Create sales order in SAP (→ cancel order) |
| Pivot | Point of no return — saga must complete after this | After this: No | Capture payment (→ cannot un-charge, only refund) |
| Retryable | Steps after pivot that must eventually succeed | N/A (must succeed) | Post revenue recognition, update CRM status |
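These three transaction types imply an ordering rule: compensable steps first, then at most one pivot, then only retryable steps. A saga definition can be checked against that rule before deployment. The `SagaStep` shape and `validateSaga` helper are hypothetical, shown with the order-to-cash steps this card uses elsewhere.

```typescript
type StepKind = "compensable" | "pivot" | "retryable";

interface SagaStep {
  name: string;
  kind: StepKind;
  compensation?: string; // required for compensable steps
}

// Ordering rule: compensable steps, then at most one pivot, then retryable steps.
// Returns the list of violations (empty = valid).
function validateSaga(steps: SagaStep[]): string[] {
  const errors: string[] = [];
  let pivotSeen = false;
  for (const step of steps) {
    if (step.kind === "pivot") {
      if (pivotSeen) errors.push(`${step.name}: a saga has at most one pivot`);
      pivotSeen = true;
    } else if (step.kind === "compensable") {
      if (pivotSeen) errors.push(`${step.name}: compensable step after the pivot`);
      if (!step.compensation) errors.push(`${step.name}: no compensating transaction`);
    } else if (!pivotSeen) {
      // A retryable step before the pivot could not be undone if the pivot fails
      errors.push(`${step.name}: step before the pivot must be compensable`);
    }
  }
  return errors;
}

const orderToCashSteps: SagaStep[] = [
  { name: "Create Sales Order (SAP)", kind: "compensable", compensation: "Cancel Sales Order" },
  { name: "Reserve Inventory (WMS)", kind: "compensable", compensation: "Release Reservation" },
  { name: "Capture Payment (Stripe)", kind: "pivot" },
  { name: "Post Revenue (Oracle)", kind: "retryable" },
  { name: "Update Opportunity (Salesforce)", kind: "retryable" },
];
console.log(validateSaga(orderToCashSteps)); // []
```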
Orchestration vs Choreography Decision Matrix
| Criterion | Choose Orchestration | Choose Choreography |
| Participant count | 4+ systems | 2-3 systems |
| Workflow complexity | Branching, parallel, conditional | Linear, sequential |
| Team structure | Central integration team | Independent service teams |
| Debugging needs | Must trace full transaction | Per-service debugging sufficient |
| Failure recovery | Centralized compensation logic | Distributed compensation logic |
| ERP integrations | Almost always (ERP APIs are complex) | Rarely (ERP complexity demands central control) |
Compensating Transaction Reference for Common ERP Operations
| Forward Transaction | Compensating Transaction | System | Gotcha |
| Create Sales Order | Cancel Sales Order + release inventory | SAP S/4HANA | Cancellation creates a new document |
| Reserve Inventory | Release Reservation | Any ERP | Check if reservation consumed by fulfillment |
| Authorize Payment | Void Authorization | Stripe/Gateway | Void only works before capture |
| Post Accounting Entry | Post Reversal Entry | Oracle ERP / SAP | Reversal is a new journal entry |
| Create Invoice | Issue Credit Memo | Any ERP | Credit memo must reference original invoice |
| Update Customer Record | Restore Previous Values | Salesforce | Requires storing pre-update snapshot |
| Allocate Budget | Deallocate Budget | Workday / SAP | Check if budget consumed by downstream |
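The payment rows show that the right compensation can depend on the record's current state, not only on which forward step ran. A sketch for Stripe PaymentIntents in a manual-capture flow: the status values are Stripe's documented PaymentIntent statuses, while `chooseCompensation` and its return labels are hypothetical. Each label would then map to the gateway's cancel or refund call; after the pivot, whether to refund at all remains a business decision.

```typescript
// Stripe PaymentIntent statuses
type PaymentIntentStatus =
  | "requires_payment_method" | "requires_confirmation" | "requires_action"
  | "processing" | "requires_capture" | "canceled" | "succeeded";

type PaymentCompensation =
  | "cancel"       // void: nothing has been captured yet
  | "refund"       // captured: a new, offsetting refund record
  | "none"         // already canceled: compensation is a no-op (idempotent)
  | "retry-later"; // in flight: re-read the status before deciding

function chooseCompensation(status: PaymentIntentStatus): PaymentCompensation {
  switch (status) {
    case "requires_payment_method":
    case "requires_confirmation":
    case "requires_action":
    case "requires_capture":
      return "cancel";
    case "succeeded":
      return "refund";
    case "canceled":
      return "none";
    case "processing":
      return "retry-later";
  }
}
```

Reading the status first also makes the compensation safe to re-run: a second delivery sees `canceled` and does nothing.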
Step-by-Step Integration Guide
1. Design the saga: identify steps, pivot, and compensations
Map every step in the business transaction. For each step, define the forward transaction, the compensating transaction, and whether it is compensable, pivot, or retryable. [src1, src6]
Example: Order-to-Cash Saga
Step 1 (Compensable): Create Sales Order in SAP
Forward: POST /sap/opu/odata/sap/API_SALES_ORDER_SRV/A_SalesOrder
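Compensate: PATCH each item with a rejection reason (SalesDocumentRjcnReason); OverallSDProcessStatus = 'C' means "completely processed", not cancelled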
Step 2 (Compensable): Reserve Inventory in WMS
Forward: POST /api/v1/reservations
Compensate: DELETE /api/v1/reservations/{id}
Step 3 (PIVOT): Capture Payment via Stripe
Forward: POST /v1/payment_intents/{id}/capture
No auto-compensation after this point
Step 4 (Retryable): Post Revenue in Oracle ERP
Step 5 (Retryable): Update Opportunity in Salesforce
Verify: Review with business stakeholders — they must agree on what “undo” means for each step.
2. Implement the orchestrator (Temporal TypeScript example)
Choose a workflow engine and implement the saga state machine. [src4]
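// saga-orchestrator.ts: Temporal workflow (TypeScript SDK)
import { proxyActivities } from '@temporalio/workflow';
// Activity implementations live in a separate module (not shown)
import type * as activities from './activities';

interface Order {
  customerId: string;
  items: { sku: string; quantity: number }[];
  amount: number;
  paymentIntentId: string;
  opportunityId: string;
}

// Forward steps up to and including the pivot: bounded retries, then compensate
const { createSalesOrder, reserveInventory, capturePayment } =
  proxyActivities<typeof activities>({
    startToCloseTimeout: '30s',
    retry: { maximumAttempts: 3, backoffCoefficient: 2 } });

// Compensations and post-pivot retryable steps: no maximumAttempts, so Temporal
// keeps retrying until they succeed
const { cancelSalesOrder, releaseInventory, postRevenue, updateCrm } =
  proxyActivities<typeof activities>({
    startToCloseTimeout: '30s',
    retry: { backoffCoefficient: 2, maximumInterval: '10m' } });

export async function orderToCashSaga(order: Order) {
  const compensations: Array<() => Promise<unknown>> = [];
  let salesOrderId: string;
  try {
    const so = await createSalesOrder(order);
    salesOrderId = so.id;
    compensations.push(() => cancelSalesOrder(so.id));
    const res = await reserveInventory(order.items);
    compensations.push(() => releaseInventory(res.id));
    await capturePayment(order.paymentIntentId); // PIVOT: point of no return
  } catch (error) {
    // Pivot not reached: undo completed steps in strict reverse order
    for (const comp of [...compensations].reverse()) {
      await comp();
    }
    return { status: 'compensated', error: String(error) };
  }
  // Past the pivot: retry to completion, never compensate
  await postRevenue(salesOrderId, order.amount);
  await updateCrm(order.opportunityId);
  return { status: 'completed', salesOrderId };
}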
Verify: Deploy workflow → trigger with test order → confirm all steps execute or compensate.
3. Implement idempotent participants
Each saga participant must be idempotent. Use idempotency keys derived from the workflow execution ID. [src2, src3]
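import { Context } from '@temporalio/activity';
// sapClient: hypothetical OData V2 client (base URL, auth and CSRF handling not shown)
export async function createSalesOrder(order) {
  // Workflow ID + step name: stable across retries, unique across sagas and steps
  const { workflowId } = Context.current().info.workflowExecution;
  const idempotencyKey = `saga-${workflowId}-create-so`;
  // Read-before-write; YY1_IdempotencyKey is an assumed custom extension field
  const filter = encodeURIComponent(`YY1_IdempotencyKey eq '${idempotencyKey}'`);
  const existing = await sapClient.get(`/A_SalesOrder?$filter=${filter}`);
  if (existing.d.results.length > 0) return existing.d.results[0];
  return (await sapClient.post('/A_SalesOrder', {
    SalesOrderType: 'OR', SoldToParty: order.customerId,
    YY1_IdempotencyKey: idempotencyKey,
    to_Item: order.items.map(i => ({
      // Decimal quantities are sent as strings, as in the cURL example below
      Material: i.sku, RequestedQuantity: String(i.quantity) }))
  })).d;
}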
Verify: Call twice with same workflow ID → confirm only one sales order exists.
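The read-before-write check above depends on a key that is stable across retries of one step but unique across sagas and steps. The idea can be sketched against an in-memory store; `idempotencyKey`, `runIdempotent`, and the `processed` map are hypothetical stand-ins for the ERP-side lookup (such as the custom-field filter in the SAP example).

```typescript
// Stable across retries of one step, unique across sagas and steps
function idempotencyKey(workflowId: string, stepName: string): string {
  return `saga-${workflowId}-${stepName}`;
}

// In-memory stand-in for the participant's record of processed keys.
// A real store needs an atomic check-and-insert (e.g. a unique constraint);
// this map does not guard against two concurrent attempts.
const processed = new Map<string, unknown>();

// Runs `action` at most once per key; later calls return the stored result.
async function runIdempotent<T>(key: string, action: () => Promise<T>): Promise<T> {
  if (processed.has(key)) {
    return processed.get(key) as T; // duplicate delivery or retry: no side effect
  }
  const result = await action();
  processed.set(key, result);
  return result;
}

let ordersCreated = 0;
const createOrder = async () => ({ id: `SO-${++ordersCreated}` });

async function demo() {
  const key = idempotencyKey("order-saga-12345", "create-so");
  const first = await runIdempotent(key, createOrder);
  const retry = await runIdempotent(key, createOrder); // e.g. redelivery after a timeout
  return { first, retry, ordersCreated };
}
```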
4. Implement compensation with pre-update snapshots
For update steps, store previous state for compensation. [src1]
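// sfClient: hypothetical REST client rooted at the Salesforce sObjects endpoint
export async function updateCrmStatus(oppId, newStatus) {
  // Read the current value first: this snapshot is what compensation restores
  const current = await sfClient.get(`/Opportunity/${oppId}?fields=StageName`);
  const previousStatus = current.StageName;
  await sfClient.patch(`/Opportunity/${oppId}`, { StageName: newStatus });
  return { previousStatus }; // persisted by the orchestrator as the step result
}
// Compensation: driven by the snapshot returned from the forward step
export async function revertCrmStatus(oppId, previousStatus) {
  await sfClient.patch(`/Opportunity/${oppId}`, { StageName: previousStatus });
}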
Verify: Update opportunity → trigger compensation → confirm StageName reverted.
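A plain restore can overwrite a change someone else made after the forward step, which is the lost-update problem listed under Constraints. One hedge is to revert only if the field still holds the value the saga wrote, and to flag the record otherwise. The in-memory record and the `applyUpdate` / `revertIfUnchanged` helpers are hypothetical; against Salesforce the same check would be a re-read of `StageName` before the PATCH.

```typescript
interface Snapshot<T> {
  previous: T; // value before the forward step
  written: T;  // value the forward step wrote
}

type RevertOutcome = "reverted" | "already-reverted" | "conflict";

// Forward step: remember both values so compensation can detect interference.
function applyUpdate<T>(record: { value: T }, newValue: T): Snapshot<T> {
  const snapshot = { previous: record.value, written: newValue };
  record.value = newValue;
  return snapshot;
}

// Compensation: restore only if nobody changed the field since the saga wrote it.
function revertIfUnchanged<T>(record: { value: T }, snap: Snapshot<T>): RevertOutcome {
  if (record.value === snap.previous) return "already-reverted"; // idempotent re-run
  if (record.value !== snap.written) return "conflict";           // route to manual review
  record.value = snap.previous;
  return "reverted";
}

const opp = { value: "Negotiation" };
const snap = applyUpdate(opp, "Closed Won");
console.log(revertIfUnchanged(opp, snap)); // reverted
console.log(revertIfUnchanged(opp, snap)); // already-reverted

const opp2 = { value: "Negotiation" };
const snap2 = applyUpdate(opp2, "Closed Won");
opp2.value = "Closed Lost"; // a user edits the record mid-saga
console.log(revertIfUnchanged(opp2, snap2)); // conflict
```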
Code Examples
Python: Saga Orchestrator with Temporal
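# Input: Order details (customer_id, items, payment_intent_id)
# Output: Saga result (completed or compensated)
from dataclasses import dataclass
from datetime import timedelta

from temporalio import workflow
from temporalio.common import RetryPolicy

with workflow.unsafe.imports_passed_through():
    # Activity definitions live in a separate module (not shown)
    from activities import (cancel_sales_order, capture_payment, create_sales_order,
                            post_revenue, release_inventory, reserve_inventory)

TIMEOUT = timedelta(seconds=30)
BOUNDED = RetryPolicy(maximum_attempts=3, backoff_coefficient=2.0)  # pre-pivot steps

@dataclass
class Order:
    customer_id: str
    items: list
    payment_intent_id: str

@workflow.defn
class OrderToCashSaga:
    @workflow.run
    async def run(self, order: Order) -> dict:
        compensations = []
        try:
            so = await workflow.execute_activity(
                create_sales_order, order, start_to_close_timeout=TIMEOUT, retry_policy=BOUNDED)
            compensations.append(lambda: workflow.execute_activity(
                cancel_sales_order, so["id"], start_to_close_timeout=TIMEOUT))
            res = await workflow.execute_activity(
                reserve_inventory, order.items, start_to_close_timeout=TIMEOUT, retry_policy=BOUNDED)
            compensations.append(lambda: workflow.execute_activity(
                release_inventory, res["id"], start_to_close_timeout=TIMEOUT))
            await workflow.execute_activity(
                capture_payment, order.payment_intent_id,
                start_to_close_timeout=TIMEOUT, retry_policy=BOUNDED)  # PIVOT
        except Exception as e:
            # Pivot not reached: compensate completed steps in strict reverse order
            for comp in reversed(compensations):
                await comp()
            return {"status": "compensated", "error": str(e)}
        # Past the pivot: retry until success (maximum_attempts=0 means unlimited)
        await workflow.execute_activity(
            post_revenue, so["id"], start_to_close_timeout=TIMEOUT,
            retry_policy=RetryPolicy(backoff_coefficient=2.0, maximum_attempts=0))
        return {"status": "completed", "sales_order_id": so["id"]}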
JavaScript/Node.js: AWS Step Functions Definition
// Input: Order event from API Gateway
// Output: Step Functions execution ARN + saga result
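// Amazon States Language; pass JSON.stringify(sagaDefinition) as the state machine
// definition. Lambda ARNs are elided ("...") placeholders.
const retryAll = [{ ErrorEquals: ["States.ALL"], IntervalSeconds: 2, MaxAttempts: 5, BackoffRate: 2 }];
// ResultPath "$.error" keeps the original input (e.g. $.salesOrder) for compensations
const catchTo = (next) => [{ ErrorEquals: ["States.ALL"], ResultPath: "$.error", Next: next }];
const sagaDefinition = {
  StartAt: "CreateSalesOrder",
  States: {
    CreateSalesOrder: {
      Type: "Task", Resource: "arn:aws:lambda:...:createSalesOrder",
      ResultPath: "$.salesOrder", Next: "ReserveInventory",
      Catch: catchTo("SagaFailed")
    },
    ReserveInventory: {
      Type: "Task", Resource: "arn:aws:lambda:...:reserveInventory",
      ResultPath: "$.reservation", Next: "CapturePayment",
      Catch: catchTo("CancelSalesOrder")
    },
    CapturePayment: { // PIVOT: a failure here still compensates earlier steps
      Type: "Task", Resource: "arn:aws:lambda:...:capturePayment",
      ResultPath: "$.payment", Next: "PostRevenue",
      Catch: catchTo("ReleaseInventory")
    },
    PostRevenue: { // Retryable: no Catch, nothing is compensated after the pivot
      Type: "Task", Resource: "arn:aws:lambda:...:postRevenue",
      ResultPath: "$.revenue", Retry: retryAll, Next: "UpdateCRM"
    },
    UpdateCRM: { // Retryable
      Type: "Task", Resource: "arn:aws:lambda:...:updateCrm",
      ResultPath: "$.crm", Retry: retryAll, End: true
    },
    // Compensation chain (reverse order); compensations are retried too
    ReleaseInventory: {
      Type: "Task", Resource: "arn:aws:lambda:...:releaseInventory",
      ResultPath: "$.released", Retry: retryAll, Next: "CancelSalesOrder"
    },
    CancelSalesOrder: {
      Type: "Task", Resource: "arn:aws:lambda:...:cancelSalesOrder",
      ResultPath: "$.cancelled", Retry: retryAll, Next: "SagaFailed"
    },
    SagaFailed: { Type: "Fail", Error: "SagaCompensated" }
  }
};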
cURL: Testing a Saga Step (SAP Sales Order Creation)
# Input: SAP OAuth token, order payload
# Output: Sales order ID or error
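# Fetch CSRF token (required for SAP write operations). The token is bound to the
# session, so save the session cookies and send them with the write request.
CSRF_TOKEN=$(curl -s -X GET "https://my-sap.s4hana.cloud/sap/opu/odata/sap/API_SALES_ORDER_SRV/" \
  -H "Authorization: Bearer $SAP_TOKEN" -H "x-csrf-token: fetch" \
  -c cookies.txt -D - -o /dev/null | grep -i '^x-csrf-token:' | awk '{print $2}' | tr -d '\r')
# Create Sales Order (same session cookies + CSRF token)
curl -X POST "https://my-sap.s4hana.cloud/.../A_SalesOrder" \
  -H "Authorization: Bearer $SAP_TOKEN" -H "x-csrf-token: $CSRF_TOKEN" -b cookies.txt \
  -H "Content-Type: application/json" -H "Accept: application/json" \
  -d '{"SalesOrderType":"OR","SoldToParty":"CUST001",
       "to_Item":[{"Material":"MAT001","RequestedQuantity":"10"}]}'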
# Expected: 201 Created with SalesOrder number
Data Mapping
Saga State Mapping Across Systems
| Saga State | Salesforce (CRM) | SAP S/4HANA | Stripe | Orchestrator |
| Initiated | Opp: Negotiation | No record | PI: requires_capture | RUNNING |
| Order Created | Opp: Negotiation | SO: Open | PI: requires_capture | Step 1: COMPLETE |
| Inventory Reserved | Opp: Negotiation | SO + Reservation | PI: requires_capture | Step 2: COMPLETE |
| Payment Captured | Opp: Negotiation | SO: Open | PI: succeeded | PIVOT PASSED |
| Completed | Opp: Closed Won | SO: Completed | PI: succeeded | COMPLETED |
| Compensating | Opp: reverted | SO: Cancelled | PI: canceled | COMPENSATING |
Data Type Gotchas
- Currency precision: Stripe amounts are integers in the smallest currency unit (cents for USD; whole units for zero-decimal currencies such as JPY). SAP uses the major unit with decimals. Normalize before passing between systems. [src3]
- Date/time zones: Salesforce stores UTC. SAP depends on user profile. Oracle uses server timezone. Normalize to UTC at orchestrator level. [src2]
- ID formats: SAP document numbers are up to 10 characters (usually numeric, often zero-padded). Salesforce IDs are 15-char (case-sensitive) or 18-char (case-insensitive). Stripe IDs are prefixed strings (pi_xxx). Maintain a cross-reference map. [src3]
- Status enums: “Cancelled” in Salesforce is not the same field as “Cancelled” in SAP. Map each system's status vocabulary.
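Currency normalization is easy to get subtly wrong with floating point. The sketch below converts a decimal amount string (as SAP decimal amounts typically arrive) to Stripe-style integer minor units using string arithmetic, and back. `toMinorUnits` and `fromMinorUnits` are hypothetical helpers; the zero-decimal set is only a subset that should be checked against Stripe's full list, and three-decimal currencies and negative amounts are not handled.

```typescript
// Illustrative subset of Stripe zero-decimal currencies (verify the full list).
const ZERO_DECIMAL = new Set(["JPY", "KRW", "VND", "CLP"]);

function decimalsFor(currency: string): number {
  return ZERO_DECIMAL.has(currency.toUpperCase()) ? 0 : 2;
}

// "1234.5" (major units, e.g. from SAP) -> 123450 (minor units for Stripe)
function toMinorUnits(amount: string, currency: string): number {
  const decimals = decimalsFor(currency);
  const match = /^(\d+)(?:\.(\d+))?$/.exec(amount.trim());
  if (!match) throw new Error(`Invalid amount: ${amount}`);
  const [, whole, frac = ""] = match;
  // Refuse to silently round: "1.005 USD" has no exact minor-unit value
  if (frac.replace(/0+$/, "").length > decimals) {
    throw new Error(`${amount} has more precision than ${currency} allows`);
  }
  return Number(whole + frac.padEnd(decimals, "0").slice(0, decimals));
}

// 123450 (Stripe minor units) -> "1234.50" (major units for SAP)
function fromMinorUnits(minor: number, currency: string): string {
  if (!Number.isSafeInteger(minor) || minor < 0) throw new Error(`Invalid amount: ${minor}`);
  const decimals = decimalsFor(currency);
  if (decimals === 0) return String(minor);
  const digits = String(minor).padStart(decimals + 1, "0");
  return `${digits.slice(0, -decimals)}.${digits.slice(-decimals)}`;
}

// toMinorUnits("1234.5", "USD")  -> 123450
// toMinorUnits("1000", "JPY")    -> 1000
// fromMinorUnits(7, "EUR")       -> "0.07"
```

Doing the conversion once, at the orchestrator boundary, keeps every participant working in its own native representation.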
Error Handling & Failure Points
Common Error Scenarios
| Scenario | Impact | Detection | Resolution |
| Orchestrator crashes mid-saga | Saga hangs in partial state | Heartbeat timeout | Durable workflow engine auto-resumes |
| ERP API timeout (no response) | Unknown state | Timeout threshold | Read-before-write: query for idempotency key |
| Compensation fails | Inconsistent state | Compensation error handler | Dead letter queue + manual intervention |
| Rate limit exceeded (429) | Step cannot execute | HTTP 429 response | Exponential backoff, max 5 retries |
| Concurrent saga conflict | Lost update or dirty read | 409/412 response | Semantic lock or reread-and-retry |
| Network partition | Participant may have executed | Timeout + no response | Idempotent retry |
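For the 429 row in the table above, retries should back off exponentially and stop after a bounded number of attempts. A generic sketch, assuming errors expose an HTTP `status` field; `withBackoff`, `isRetryable`, the delay values, and the "full jitter" choice are illustrative. Inside Temporal or Step Functions, the engine's own retry policy would normally do this job instead.

```typescript
// Minimal shape of an HTTP failure; real clients expose the status code similarly.
interface HttpError {
  status: number;
  message: string;
}

interface RetryOptions {
  maxRetries: number;  // retries after the first call (e.g. 5 for HTTP 429)
  baseDelayMs: number; // delay ceiling for the first retry
  maxDelayMs: number;  // upper bound for any single delay
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const defaultSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry throttling (429) and server errors (5xx); other 4xx errors are
// business failures that should trigger compensation instead.
function isRetryable(err: unknown): boolean {
  const status = (err as Partial<HttpError> | null)?.status;
  return status === 429 || (typeof status === "number" && status >= 500);
}

// Only wrap idempotent calls: a retried request may already have been applied.
async function withBackoff<T>(call: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (!isRetryable(err) || attempt >= opts.maxRetries) throw err;
      // Exponential backoff with full jitter: random delay in [0, cap)
      const cap = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
      await sleep(Math.random() * cap);
    }
  }
}

// Usage sketch (sapClient is hypothetical):
// await withBackoff(() => sapClient.post("/A_SalesOrder", body),
//   { maxRetries: 5, baseDelayMs: 500, maxDelayMs: 30_000 });
```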
Failure Points in Production
- Compensation timeout cascade: One slow compensation causes subsequent compensations to time out. Fix: Set independent timeouts for each compensation step. [src3]
- Saga state persistence loss: In-memory orchestrators lose state on crash. Fix: Use durable state persistence (Temporal, Step Functions). [src4]
- ERP maintenance windows: Scheduled downtime fails sagas mid-flight. Fix: Circuit breaker + queue pending sagas, resume after window. [src3]
- Idempotency key collision: Different sagas share an idempotency key. Fix: Include workflow execution ID + step name in key. [src2]
- Compensation ordering violation: Out-of-order compensations cause data integrity issues. Fix: Always compensate in strict reverse order. [src6]
- Stale auth tokens: Token fetched at saga start expires mid-saga. Fix: Fetch fresh auth tokens before each ERP API call. [src3]
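Several of these fixes (independent timeouts per compensation, strict reverse order, and manual review when a compensation fails) fit in one small runner. The sketch is engine-agnostic and hypothetical: `Compensation`, `runCompensations`, and the dead-letter callback are illustrative names, and a durable engine would also persist progress between steps. On the first failure it stops rather than skipping ahead, because earlier steps may depend on the failed one being undone first.

```typescript
interface Compensation {
  step: string;
  run: () => Promise<void>;
  timeoutMs: number; // independent per-step timeout (e.g. from that ERP's SLA)
}

interface DeadLetter {
  failedStep: string;
  error: string;
  pending: string[]; // steps still to compensate, in the order they must run
}

// Rejects if `promise` does not settle within `ms`. The underlying call is not
// cancelled, which is one more reason compensations must be idempotent.
async function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// `completed` is in forward order; compensations run in strict reverse order.
async function runCompensations(
  completed: Compensation[],
  deadLetter: (entry: DeadLetter) => void,
): Promise<{ compensated: string[]; pending: string[] }> {
  const ordered = [...completed].reverse();
  const compensated: string[] = [];
  for (let i = 0; i < ordered.length; i++) {
    const comp = ordered[i];
    try {
      await withTimeout(comp.run(), comp.timeoutMs, comp.step);
      compensated.push(comp.step);
    } catch (err) {
      // Stop and hand the rest to manual intervention, preserving the order
      const pending = ordered.slice(i).map((c) => c.step);
      deadLetter({ failedStep: comp.step, error: String(err), pending });
      return { compensated, pending };
    }
  }
  return { compensated, pending: [] };
}
```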
Anti-Patterns
Wrong: Using 2PC across ERP APIs
// BAD — 2PC requires all participants to hold locks simultaneously
// ERP APIs do not support distributed lock coordination
BEGIN DISTRIBUTED TRANSACTION
Lock row in Salesforce (not possible via REST API)
Lock row in SAP (not possible via OData API)
COMMIT ALL
Correct: Use Saga with compensating transactions
// GOOD — each step commits independently; compensations handle failure
Step 1: Create order in SAP (committed)
Step 2: Reserve inventory (committed)
Step 3: Capture payment (PIVOT)
If Step 2 fails: compensate Step 1 (cancel order)
Wrong: Fire-and-forget without compensation design
// BAD — no compensation, no idempotency, no error handling
async function processOrder(order) {
await createSapOrder(order); // succeeds, but...
await reserveInventory(order); // succeeds, but...
await capturePayment(order); // FAILS — SAP order orphaned
}
Correct: Every forward step has a compensation partner
// GOOD — compensation stack ensures cleanup
async function processOrderSaga(order) {
  const compensations = [];
  let so;
  try {
    so = await createSapOrder(order);
    compensations.push(() => cancelSapOrder(so.id));
    const res = await reserveInventory(order);
    compensations.push(() => releaseInventory(res.id));
    await capturePayment(order); // PIVOT
  } catch (error) {
    // Undo completed steps in reverse order, then surface the failure
    for (const comp of compensations.reverse()) await comp();
    throw error;
  }
  await postRevenue(so.id); // Retryable: outside the try, so it never triggers compensation
}
Wrong: Choreography with 6+ ERP participants
// BAD — invisible dependency web across 7 services
OrderService emits OrderCreated
InventoryService emits InventoryReserved
PaymentService emits PaymentCaptured
... 4 more services listening to each other
// Debug: "Why did this fail?" = trace 7 event streams
Correct: Orchestration for complex multi-ERP flows
// GOOD — single orchestrator, all state in one place
Orchestrator: createOrder(SAP) -> reserveInventory(WMS) ->
capturePayment(Stripe) -> postRevenue(Oracle) -> updateCRM(SF)
// Debug: check one orchestrator execution log
Common Pitfalls
- Treating compensation as rollback: Compensations create new records (credit memos, refunds). Fix: Document compensation behavior; get business sign-off. [src1]
- Missing idempotency: Network retries cause duplicates. Fix: Use idempotency keys built from workflow ID + step name. [src2]
- Ignoring ERP API rate limits: A 10-step saga can consume 20-30 API calls per transaction, more on the failure path. Fix: Calculate worst-case API consumption; set concurrency limits. [src3]
- Ignoring eventual consistency window: Other systems see intermediate state. Fix: Use semantic locks (status flags like "pending_confirmation"). [src6]
- Hardcoding a single saga timeout: Different ERPs have very different response times. Fix: Set per-step timeouts based on each ERP's SLA. [src3]
- Not testing compensation path: Teams often test only the happy path. Fix: Chaos testing: inject failures at each step and verify full compensation. [src4]
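The chaos-testing fix can start as a plain unit test: inject a failure at each step in turn and assert that exactly the completed compensable steps are undone, in reverse order, and that nothing is undone once the pivot has passed. `runSaga` below is a hypothetical in-memory model of the order-to-cash saga, not a real orchestrator; `chaosTest` takes the saga runner as a parameter so the same loop could drive a workflow engine's test harness with stubbed activities.

```typescript
type Kind = "compensable" | "pivot" | "retryable";
interface Step { name: string; kind: Kind }
interface SagaResult { status: "completed" | "compensated"; undone: string[] }
type Faults = Record<string, number>; // step name -> number of injected failures

const STEPS: Step[] = [
  { name: "createSalesOrder", kind: "compensable" },
  { name: "reserveInventory", kind: "compensable" },
  { name: "capturePayment", kind: "pivot" },
  { name: "postRevenue", kind: "retryable" },
  { name: "updateCrm", kind: "retryable" },
];

// Hypothetical saga under test, with faults injected into every step call.
async function runSaga(faults: Faults): Promise<SagaResult> {
  const remaining: Faults = { ...faults };
  const call = async (name: string): Promise<void> => {
    if ((remaining[name] ?? 0) > 0) {
      remaining[name] -= 1;
      throw new Error(`injected failure: ${name}`);
    }
  };
  const completed: string[] = [];
  for (const step of STEPS) {
    if (step.kind === "retryable") {
      // After the pivot: retry until success, never compensate
      for (;;) {
        try { await call(step.name); break; } catch { /* retry */ }
      }
      continue;
    }
    try {
      await call(step.name);
      if (step.kind === "compensable") completed.push(step.name);
    } catch {
      return { status: "compensated", undone: [...completed].reverse() };
    }
  }
  return { status: "completed", undone: [] };
}

// Chaos loop: fail each step in turn and compare with the expected outcome.
async function chaosTest(run: (f: Faults) => Promise<SagaResult>, steps: Step[]): Promise<string[]> {
  const problems: string[] = [];
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i];
    const result = await run({ [step.name]: 3 });
    const expectedStatus = step.kind === "retryable" ? "completed" : "compensated";
    const expectedUndo: string[] = step.kind === "retryable" ? [] : steps
      .slice(0, i).filter((s) => s.kind === "compensable").map((s) => s.name).reverse();
    if (result.status !== expectedStatus || result.undone.join() !== expectedUndo.join()) {
      problems.push(`${step.name}: got ${result.status} [${result.undone.join()}]`);
    }
  }
  return problems;
}

chaosTest(runSaga, STEPS).then((problems) =>
  console.log(problems.length === 0 ? "all injected failures handled as expected" : problems));
```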
Diagnostic Commands
# Check Temporal workflow status (Temporal CLI; tctl is deprecated)
temporal workflow describe --workflow-id "order-saga-12345"
# List running sagas in Temporal
temporal workflow list --query "WorkflowType='OrderToCashSaga' AND ExecutionStatus='Running'"
# Check AWS Step Functions execution
aws stepfunctions describe-execution \
--execution-arn "arn:aws:states:...:execution:OrderSaga:exec-001"
# List recent failed sagas in Step Functions (not time-filtered; up to 50 results)
aws stepfunctions list-executions \
--state-machine-arn "arn:aws:states:...:stateMachine:OrderSaga" \
--status-filter "FAILED" --max-results 50
# Check Salesforce API usage
curl -s "https://yourorg.my.salesforce.com/services/data/v62.0/limits" \
-H "Authorization: Bearer $SF_TOKEN" | jq '.DailyApiRequests'
# Monitor Stripe payment intent (verify saga pivot)
curl -s "https://api.stripe.com/v1/payment_intents/pi_xxx" \
-u "$STRIPE_SECRET_KEY:" | jq '{status, amount, currency}'
Version History & Compatibility
| Pattern / Platform | Version | Date | Status | Key Changes |
| Saga pattern (original paper) | Garcia-Molina & Salem | 1987 | Foundational | Original long-lived transaction concept |
| Microservices adaptation | Chris Richardson | 2018 | Current reference | Orchestration/choreography for microservices |
| AWS Step Functions | Standard & Express | 2024 | GA | Map state for parallel saga steps |
| Temporal | 1.x | 2024-2025 | GA | Durable execution, auto-retry, versioning |
| Azure Durable Functions | v4 | 2024 | GA | Sub-orchestrations for nested sagas |
| Camunda 8 | 8.5+ | 2024 | GA | Improved BPMN compensation events |
When to Use / When Not to Use
| Use When | Don't Use When | Use Instead |
| Transaction spans 3+ ERP systems | Transaction fits within single ERP/database | Native ERP transaction |
| Eventual consistency acceptable | Strong ACID required across all systems | Two-phase commit (rare for ERP APIs) |
| Steps take seconds to minutes | Sub-millisecond in-memory operations | Local transaction + outbox pattern |
| Human approval steps involved | All steps instantaneous and reversible | Simple retry with DLQ |
| Multiple teams own participants | Single team owns all systems | Direct orchestration without saga |
Cross-System Comparison
| Capability | Saga (Orchestration) | Saga (Choreography) | Two-Phase Commit | Outbox + Eventual |
| Consistency | Eventual | Eventual | Strong (ACID) | Eventual |
| Coupling | To orchestrator | To event schema | To coordinator | To message broker |
| Scalability | High | High | Low (locks) | High |
| ERP API compatible | Yes | Yes | No | Yes (single-system) |
| Multi-ERP | Primary use case | Possible but hard | Not viable | No |
| Failure recovery | Auto compensation | Distributed compensation | Auto rollback | Retry + idempotency |
| Debugging | Good | Poor | Good | Medium |
| Latency overhead | Medium | Low | High | Low |
| Tooling | Temporal, Step Functions | Kafka, EventBridge | Database-native | Debezium, Outbox libs |
Important Caveats
- Compensation is a business decision: What “undo” means for each step must be defined by stakeholders, not technologists alone.
- Eventual consistency is visible to users: Between steps, the system is in intermediate state. Use semantic status flags to communicate saga progress.
- Complexity compounds with participants: Each new participant adds forward + compensation steps. Split large processes into smaller independent sagas.
- Platform lock-in risk: Choosing Step Functions, Durable Functions, or Temporal creates dependency for critical transactions. The pattern is portable; the implementation is not.
- ERP API changes break sagas: When an ERP vendor changes API behavior, saga steps may fail. Pin API versions and monitor deprecation notices.
- Architecture-level guidance: This card describes the pattern; implementation varies by platform, ERP, and language. Consult specific ERP API docs for production use.
Related Units