Saga Pattern for Distributed Transactions Across ERPs

Type: ERP Integration | System: Cross-ERP (Pattern-level) | Confidence: 0.92 | Sources: 6 | Verified: 2026-03-02 | Freshness: evolving

TL;DR

System Profile

The Saga pattern is an architecture pattern, not a product. It applies to any distributed transaction that spans multiple ERP systems, middleware platforms, or microservices. This card covers the pattern itself, its two coordination approaches (orchestration and choreography), and practical implementation guidance for cross-ERP business transactions.

| System | Role | API Surface | Direction |
|---|---|---|---|
| ERP System A (e.g., Salesforce) | CRM: source order/opportunity | REST API | Outbound |
| ERP System B (e.g., SAP S/4HANA) | ERP: financial posting, inventory | OData / BAPI | Inbound |
| Payment Gateway (e.g., Stripe) | Payment processing | REST API | Bidirectional |
| Orchestrator (e.g., Temporal, Step Functions) | Saga coordinator | SDK / State Machine | Orchestrator |

API Surfaces & Capabilities

The Saga pattern does not define a specific API surface — it defines how multiple API surfaces coordinate. The key capability matrix compares the two coordination approaches:

| Capability | Orchestration | Choreography |
|---|---|---|
| Coordination | Central orchestrator sends commands to participants | Participants publish/subscribe to domain events |
| Complexity ceiling | Scales to 10+ participants | Practical limit: 3-5 participants |
| Visibility | Single place to inspect saga state | Must aggregate events across all participants |
| Coupling | Participants coupled to orchestrator | Participants coupled to event schema |
| Single point of failure | Yes: orchestrator (mitigated by HA) | No: distributed |
| Cyclic dependencies | Avoided by design | Risk of circular event chains |
| Error handling | Orchestrator triggers compensations | Each participant handles failure events independently |
| Debugging | Straightforward: follow orchestrator logs | Difficult: requires distributed tracing |
| Testing | Test orchestrator + mock participants | Must run all participants to test end-to-end |
| New participant | Modify orchestrator workflow | Add new event listener (verify no side effects) |
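The "circular event chains" risk in the choreography column can be checked mechanically before a new listener goes live. The following is a minimal sketch, assuming you can list which events each participant consumes and emits. The participant names, event names, and the `find_event_cycle` helper are hypothetical and not part of any broker or ERP API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical topology: participant -> (events it consumes, events it emits).
Topology = Dict[str, Tuple[List[str], List[str]]]


def find_event_cycle(topology: Topology) -> Optional[List[str]]:
    """Return one chain of events that re-triggers itself, or None if there is none."""
    # Build an event -> events graph: consuming an event may cause every emitted event.
    graph: Dict[str, List[str]] = {}
    for consumes, emits in topology.values():
        for event in consumes:
            graph.setdefault(event, []).extend(emits)

    visiting: List[str] = []  # events on the current depth-first path
    done = set()              # events whose successors are fully explored

    def dfs(event: str) -> Optional[List[str]]:
        if event in visiting:
            # Found a back edge: slice the path from the first occurrence.
            return visiting[visiting.index(event):] + [event]
        if event in done:
            return None
        visiting.append(event)
        for nxt in graph.get(event, []):
            cycle = dfs(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(event)
        return None

    for start in list(graph):
        cycle = dfs(start)
        if cycle:
            return cycle
    return None


# Illustrative topologies (all names are made up for this example).
linear = {
    "order": ([], ["OrderCreated"]),
    "inventory": (["OrderCreated"], ["InventoryReserved"]),
    "payment": (["InventoryReserved"], ["PaymentCaptured"]),
}
cyclic = dict(linear, crm=(["PaymentCaptured"], ["OrderCreated"]))
```

Running a check like this in CI whenever an event subscription changes is one way to act on the "verify no side effects" note for new choreography participants.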

Rate Limits & Quotas

Per-Saga Limits

Saga throughput is constrained by the slowest participant's API limits. A single business transaction may consume multiple API calls per participant:

| Saga Step | Typical API Calls | Rate Limit Concern | Mitigation |
|---|---|---|---|
| Create order in CRM | 1-3 | Salesforce: 100K/24h | Batch via composite API |
| Reserve inventory in ERP | 1-2 | SAP: depends on ICM config | Use async BAPI for high volume |
| Process payment | 1-2 | Stripe: 100 req/s | Built-in rate limiting |
| Post financial document | 1-3 | Oracle: throttled per tenant | Use batch endpoints |
| Compensate on failure | 1 per completed step | Doubles API consumption | Budget 2x API calls for failures |
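To make "constrained by the slowest participant" concrete, the sketch below estimates an upper bound on sagas per day from per-participant call counts and quotas. It is a back-of-the-envelope model, not a capacity guarantee: the `Participant` class and both functions are hypothetical, the figures are the illustrative ones from the table, and it assumes the whole quota is available to the saga.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    name: str
    calls_per_saga: int    # forward API calls one saga makes against this system
    daily_call_limit: int  # API calls per 24h available to the integration (assumed)


def max_sagas_per_day(participants: List[Participant], failure_budget: float = 2.0) -> int:
    """Upper bound on sagas per day, set by the most constrained participant.

    failure_budget multiplies the forward calls to leave room for compensations;
    2.0 mirrors the "budget 2x API calls" guidance in the table.
    """
    return min(
        int(p.daily_call_limit // (p.calls_per_saga * failure_budget))
        for p in participants
    )


def bottleneck(participants: List[Participant], failure_budget: float = 2.0) -> str:
    """Name of the participant that caps saga throughput."""
    return min(
        participants,
        key=lambda p: p.daily_call_limit / (p.calls_per_saga * failure_budget),
    ).name


# Worst-case call counts from the table; Stripe's 100 req/s expressed per day.
participants = [
    Participant("Salesforce", calls_per_saga=3, daily_call_limit=100_000),
    Participant("Stripe", calls_per_saga=2, daily_call_limit=100 * 86_400),
]
```

With these inputs the Salesforce daily quota, not the payment gateway, is the limiting factor, which is why the table suggests batching CRM calls through the composite API.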

Orchestrator Platform Limits

| Platform | Execution Limit | State Size | Timeout | Pricing |
|---|---|---|---|---|
| AWS Step Functions (Standard) | Up to 1 year | 256 KB payload | 1 year max | $0.025/1K transitions |
| AWS Step Functions (Express) | 5 minutes max | 256 KB payload | 5 minutes | Per execution + duration |
| Temporal Cloud | Unlimited duration | 50K events/workflow | No hard timeout | Per action |
| Azure Durable Functions | Unlimited duration | Storage-backed | No hard timeout | Per execution + compute |
| Camunda 8 | Unlimited duration | 1 MB per variable | Configurable | Per execution (cloud) |

Authentication

Each saga participant authenticates independently. The orchestrator must maintain valid credentials for every system:

| ERP System | Auth Flow | Token Lifetime | Saga Consideration |
|---|---|---|---|
| Salesforce | OAuth 2.0 JWT Bearer | 2h (configurable) | Token may expire mid-saga; refresh per step |
| SAP S/4HANA | OAuth 2.0 / Basic Auth | Session-based | CSRF token required per write operation |
| Oracle ERP Cloud | OAuth 2.0 / Basic Auth | Session-based | Rate limiting tied to auth identity |
| NetSuite | Token-Based Auth (TBA) | No expiry | Consumer/token key pair; rotate quarterly |
| Stripe | API Key (Secret Key) | No expiry | Use restricted keys with minimal permissions |
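The "refresh per step" advice can be implemented as a small per-system token cache that the orchestrator consults at the start of every step. This is a minimal sketch under stated assumptions: `StepTokenProvider` is a hypothetical helper, `fetch_token` stands in for whatever OAuth call your client library makes, and the 60-second safety margin is an arbitrary illustrative value.

```python
import time
from typing import Callable, Optional, Tuple


class StepTokenProvider:
    """Caches one system's access token and refreshes it shortly before expiry."""

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, float]],
        safety_margin_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # fetch_token is a hypothetical callable returning (token, lifetime_seconds).
        self._fetch_token = fetch_token
        self._safety_margin_s = safety_margin_s
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def token_for_step(self) -> str:
        """Call at the start of every saga step, not once per saga."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._safety_margin_s:
            self._token, lifetime_s = self._fetch_token()
            self._expires_at = now + lifetime_s
        return self._token
```

Because a saga can pause for minutes or hours between steps, asking for the token per step keeps a long-running transaction from failing on an expired credential while still avoiding a token request on every API call.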

Authentication Gotchas

Constraints

Integration Pattern Decision Tree

START — Need distributed transaction across multiple ERPs
├── How many systems participate?
│   ├── 2 systems, simple request-reply
│   │   └── SKIP SAGA — use direct API call + retry + DLQ
│   ├── 2-3 systems, linear flow
│   │   ├── Event infrastructure exists? → Consider CHOREOGRAPHY
│   │   └── No event infra? → Use ORCHESTRATION
│   └── 4+ systems or branching/parallel
│       └── Always ORCHESTRATION
├── Consistency requirement?
│   ├── Strong ACID → SAGA IS WRONG PATTERN
│   ├── Eventual consistency → Saga fits
│   └── Best-effort → Saga with relaxed compensation
├── Transaction duration?
│   ├── < 1s → Synchronous orchestration
│   ├── 1s - 5min → Step Functions Express or Temporal
│   └── > 5min → Step Functions Standard, Temporal, Camunda
└── Error tolerance?
    ├── Zero-loss → Idempotent steps + DLQ + manual review
    └── Best-effort → Retry with exponential backoff + alerting
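The coordination and duration branches of the tree above can be written down as two small functions, which is handy for design reviews or architecture linting. This is a sketch of the tree as drawn, not an authoritative rule engine. The function names, the `flow` labels, and the returned strings are illustrative choices.

```python
def choose_coordination(systems: int, flow: str, has_event_infra: bool,
                        needs_strong_acid: bool = False) -> str:
    """Map the decision tree to a recommendation.

    flow is one of "request-reply", "linear", or "branching" (labels chosen for this sketch).
    """
    if systems < 2:
        raise ValueError("a cross-system transaction needs at least 2 systems")
    if needs_strong_acid:
        return "saga is the wrong pattern"
    if systems >= 4 or flow == "branching":
        return "orchestration"
    if systems == 2 and flow == "request-reply":
        return "skip saga: direct API call + retry + DLQ"
    # 2-3 systems with a linear flow: choreography is worth considering
    # only when event infrastructure already exists.
    return "choreography" if has_event_infra else "orchestration"


def choose_engine(duration_s: float) -> str:
    """Map expected transaction duration (seconds) to the engine options in the tree."""
    if duration_s < 1:
        return "synchronous orchestration"
    if duration_s <= 300:
        return "Step Functions Express or Temporal"
    return "Step Functions Standard, Temporal, or Camunda"
```

For example, a three-system linear flow with no event bus resolves to orchestration, and a transaction that waits an hour for human approval lands in the long-running engine group.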

Quick Reference

Saga Transaction Types

| Transaction Type | Description | Can Be Undone? | ERP Example |
|---|---|---|---|
| Compensable | Forward step that can be semantically reversed | Yes | Create sales order in SAP (→ cancel order) |
| Pivot | Point of no return: saga must complete after this | After this: No | Capture payment (→ cannot un-charge, only refund) |
| Retryable | Steps after pivot that must eventually succeed | N/A (must succeed) | Post revenue recognition, update CRM status |
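The three transaction types imply an ordering rule: compensable steps first, then at most one pivot, then retryable steps. A saga definition can be checked against that rule before any code is written. The sketch below is one way to express the check; `StepKind` and `validate_saga` are hypothetical names introduced for illustration.

```python
from enum import Enum
from typing import List


class StepKind(Enum):
    COMPENSABLE = "compensable"
    PIVOT = "pivot"
    RETRYABLE = "retryable"


# Position of each kind in a well-formed saga: compensable*, pivot?, retryable*.
_RANK = {StepKind.COMPENSABLE: 0, StepKind.PIVOT: 1, StepKind.RETRYABLE: 2}


def validate_saga(kinds: List[StepKind]) -> List[str]:
    """Return a list of ordering problems; an empty list means the saga is well-formed."""
    problems = []
    if sum(1 for k in kinds if k is StepKind.PIVOT) > 1:
        problems.append("more than one pivot step")
    for i in range(1, len(kinds)):
        if _RANK[kinds[i]] < _RANK[kinds[i - 1]]:
            problems.append(
                f"step {i + 1} ({kinds[i].value}) cannot follow a {kinds[i - 1].value} step"
            )
    return problems


# An order-to-cash style saga: two compensable steps, the pivot, two retryable steps.
order_to_cash = [
    StepKind.COMPENSABLE, StepKind.COMPENSABLE,
    StepKind.PIVOT,
    StepKind.RETRYABLE, StepKind.RETRYABLE,
]
```

A compensable step placed after the pivot is the mistake this catches most often: once payment is captured, a later failure can no longer be handled by unwinding.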

Orchestration vs Choreography Decision Matrix

| Criterion | Choose Orchestration | Choose Choreography |
|---|---|---|
| Participant count | 4+ systems | 2-3 systems |
| Workflow complexity | Branching, parallel, conditional | Linear, sequential |
| Team structure | Central integration team | Independent service teams |
| Debugging needs | Must trace full transaction | Per-service debugging sufficient |
| Failure recovery | Centralized compensation logic | Distributed compensation logic |
| ERP integrations | Almost always (ERP APIs are complex) | Rarely (ERP complexity demands central control) |

Compensating Transaction Reference for Common ERP Operations

| Forward Transaction | Compensating Transaction | System | Gotcha |
|---|---|---|---|
| Create Sales Order | Cancel Sales Order + release inventory | SAP S/4HANA | Cancellation creates a new document |
| Reserve Inventory | Release Reservation | Any ERP | Check if reservation consumed by fulfillment |
| Authorize Payment | Void Authorization | Stripe/Gateway | Void only works before capture |
| Post Accounting Entry | Post Reversal Entry | Oracle ERP / SAP | Reversal is a new journal entry |
| Create Invoice | Issue Credit Memo | Any ERP | Credit memo must reference original invoice |
| Update Customer Record | Restore Previous Values | Salesforce | Requires storing pre-update snapshot |
| Allocate Budget | Deallocate Budget | Workday / SAP | Check if budget consumed by downstream |

Step-by-Step Integration Guide

1. Design the saga: identify steps, pivot, and compensations

Map every step in the business transaction. For each step, define the forward transaction, the compensating transaction, and whether it is compensable, pivot, or retryable. [src1, src6]

Example: Order-to-Cash Saga

Step 1 (Compensable): Create Sales Order in SAP
  Forward: POST /sap/opu/odata/sap/API_SALES_ORDER_SRV/A_SalesOrder
  Compensate: PATCH with OverallSDProcessStatus = 'C'

Step 2 (Compensable): Reserve Inventory in WMS
  Forward: POST /api/v1/reservations
  Compensate: DELETE /api/v1/reservations/{id}

Step 3 (PIVOT): Capture Payment via Stripe
  Forward: POST /v1/payment_intents/{id}/capture
  No auto-compensation after this point

Step 4 (Retryable): Post Revenue in Oracle ERP
Step 5 (Retryable): Update Opportunity in Salesforce

Verify: Review with business stakeholders — they must agree on what “undo” means for each step.

2. Implement the orchestrator (Temporal TypeScript example)

Choose a workflow engine and implement the saga state machine. [src4]

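// saga-orchestrator.ts: Temporal Workflow
import { proxyActivities } from '@temporalio/workflow';

// Order shape inferred from the fields used in this guide.
interface Order {
  customerId: string; items: unknown[]; paymentIntentId: string;
  amount: number; opportunityId: string;
}

// Steps up to and including the pivot: bounded retries, then fail so the saga can compensate.
const { createSalesOrder, reserveInventory, capturePayment } = proxyActivities({
  startToCloseTimeout: '30s',
  retry: { maximumAttempts: 3, backoffCoefficient: 2 } });

// Compensations and post-pivot steps must eventually succeed: with no
// maximumAttempts, Temporal keeps retrying with backoff.
const { cancelSalesOrder, releaseInventory, postRevenue, updateCrm } = proxyActivities({
  startToCloseTimeout: '30s', retry: { backoffCoefficient: 2 } });

export async function orderToCashSaga(order: Order) {
  const compensations: Array<() => Promise<unknown>> = [];
  let so;
  try {
    so = await createSalesOrder(order);
    compensations.push(() => cancelSalesOrder(so.id));

    const res = await reserveInventory(order.items);
    compensations.push(() => releaseInventory(res.id));

    await capturePayment(order.paymentIntentId); // PIVOT
  } catch (error) {
    // Failed before the pivot completed: undo finished steps in reverse order.
    for (const comp of compensations.reverse()) {
      await comp();
    }
    return { status: 'compensated', error: String(error) };
  }

  // Past the pivot there is no rollback: these steps retry until they succeed.
  await postRevenue(so.id, order.amount);      // Retryable
  await updateCrm(order.opportunityId);         // Retryable

  return { status: 'completed', salesOrderId: so.id };
}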

Verify: Deploy workflow → trigger with test order → confirm all steps execute or compensate.
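The control flow a saga is meant to follow (compensate in reverse order before the pivot, retry after it) can be exercised locally without a workflow engine. The sketch below is an in-process illustration only: it has none of the durability a real orchestrator provides, and `Step`, `run_saga`, and the returned status strings are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    action: Callable[[], None]
    compensate: Optional[Callable[[], None]] = None  # None for pivot and retryable steps
    is_pivot: bool = False


def run_saga(steps: List[Step], max_post_pivot_attempts: int = 5) -> str:
    """Run steps in order; returns "completed", "compensated", or "needs-manual-review"."""
    completed: List[Step] = []
    pivot_passed = False
    for step in steps:
        if pivot_passed:
            # After the pivot: never compensate, only retry.
            for _ in range(max_post_pivot_attempts):
                try:
                    step.action()
                    break
                except Exception:
                    continue
            else:
                # Retries exhausted: the saga is stuck forward and needs a human.
                return "needs-manual-review"
            continue
        try:
            step.action()
        except Exception:
            # Before the pivot: undo completed steps, most recent first.
            for done in reversed(completed):
                if done.compensate is not None:
                    done.compensate()
            return "compensated"
        completed.append(step)
        pivot_passed = step.is_pivot
    return "completed"
```

In a unit test, each `action` and `compensate` can append to a shared list so the assertion checks both the outcome and the order in which compensations ran.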

3. Implement idempotent participants

Each saga participant must be idempotent. Use idempotency keys derived from the workflow execution ID. [src2, src3]

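// Runs as a Temporal activity, so the workflow ID comes from the activity context.
import { Context } from '@temporalio/activity';

// sapClient is an assumed pre-configured OData client. YY1_IdempotencyKey is a
// custom extension field that must exist on the sales order in SAP.
export async function createSalesOrder(order) {
  const { workflowId } = Context.current().info.workflowExecution;
  const idempotencyKey = `saga-${workflowId}-create-so`;
  // Read before write: a retry after a timeout finds the order the first attempt created.
  const existing = await sapClient.get(
    `/A_SalesOrder?$filter=YY1_IdempotencyKey eq '${idempotencyKey}'`);
  if (existing.d.results.length > 0) return existing.d.results[0];

  return (await sapClient.post('/A_SalesOrder', {
    SalesOrderType: 'OR', SoldToParty: order.customerId,
    YY1_IdempotencyKey: idempotencyKey,
    to_Item: order.items.map(i => ({
      Material: i.sku, RequestedQuantity: i.quantity }))
  })).d;
}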

Verify: Call twice with same workflow ID → confirm only one sales order exists.
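The two ideas in this step, a deterministic key and a participant that returns the existing record instead of creating a duplicate, can be demonstrated without any ERP. In the sketch below, `idempotency_key` and `InMemoryOrderSystem` are hypothetical stand-ins. Hashing the key to a fixed length is a design choice of this sketch (custom ERP fields often have length limits), not something the pattern requires.

```python
import hashlib
from typing import Dict


def idempotency_key(workflow_id: str, step_name: str) -> str:
    """Deterministic key: the same saga step maps to the same key on every retry."""
    digest = hashlib.sha256(f"{workflow_id}:{step_name}".encode("utf-8")).hexdigest()
    return f"saga-{digest[:32]}"  # fixed length, safe for short custom fields


class InMemoryOrderSystem:
    """Stand-in for an ERP that stores at most one order per idempotency key."""

    def __init__(self) -> None:
        self._orders_by_key: Dict[str, dict] = {}

    def create_order(self, key: str, payload: dict) -> dict:
        existing = self._orders_by_key.get(key)
        if existing is not None:
            return existing  # retry of a step that already succeeded
        order = {"id": f"SO-{len(self._orders_by_key) + 1}", **payload}
        self._orders_by_key[key] = order
        return order

    def order_count(self) -> int:
        return len(self._orders_by_key)
```

Deriving the key from the workflow ID and step name, rather than generating a random value per attempt, is what makes a retry after a timeout land on the same record.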

4. Implement compensation with pre-update snapshots

For update steps, store previous state for compensation. [src1]

export async function updateCrmStatus(oppId, newStatus) {
  const current = await sfClient.get(`/Opportunity/${oppId}?fields=StageName`);
  const previousStatus = current.StageName;
  await sfClient.patch(`/Opportunity/${oppId}`, { StageName: newStatus });
  return { previousStatus };
}

export async function revertCrmStatus(oppId, previousStatus) {
  await sfClient.patch(`/Opportunity/${oppId}`, { StageName: previousStatus });
}

Verify: Update opportunity → trigger compensation → confirm StageName reverted.

Code Examples

Python: Saga Orchestrator with Temporal

# Input:  Order details (customer_id, items, payment_intent_id)
# Output: Saga result (completed or compensated)

from datetime import timedelta

from temporalio import workflow
from temporalio.common import RetryPolicy

# Activity implementations are assumed to live in a local activities.py module.
with workflow.unsafe.imports_passed_through():
    from activities import (cancel_sales_order, capture_payment, create_sales_order,
                            post_revenue, release_inventory, reserve_inventory)

STEP_TIMEOUT = timedelta(seconds=30)

@workflow.defn
class OrderToCashSaga:
    @workflow.run
    async def run(self, order):
        compensations = []
        try:
            so = await workflow.execute_activity(
                create_sales_order, order, start_to_close_timeout=STEP_TIMEOUT)
            compensations.append(lambda: workflow.execute_activity(
                cancel_sales_order, so["id"], start_to_close_timeout=STEP_TIMEOUT))

            res = await workflow.execute_activity(
                reserve_inventory, order.items, start_to_close_timeout=STEP_TIMEOUT)
            compensations.append(lambda: workflow.execute_activity(
                release_inventory, res["id"], start_to_close_timeout=STEP_TIMEOUT))

            await workflow.execute_activity(capture_payment, order.payment_intent_id,
                start_to_close_timeout=STEP_TIMEOUT)  # PIVOT
        except Exception as e:
            # Failed before the pivot completed: undo finished steps in reverse order.
            for comp in reversed(compensations):
                await comp()
            return {"status": "compensated", "error": str(e)}

        # Past the pivot: no compensation. An activity needs a timeout as well as a retry policy.
        await workflow.execute_activity(post_revenue, so["id"],
            start_to_close_timeout=STEP_TIMEOUT,
            retry_policy=RetryPolicy(maximum_attempts=10))  # Retryable
        return {"status": "completed", "sales_order_id": so["id"]}

JavaScript/Node.js: AWS Step Functions Definition

// Input:  Order event from API Gateway
// Output: Step Functions execution ARN + saga result

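// Shared retry policy for steps that must eventually succeed.
const retry = [{ ErrorEquals: ["States.ALL"], MaxAttempts: 5, BackoffRate: 2 }];
// ResultPath on Catch keeps the original input, so compensations can still read $.salesOrder.
const catchTo = (next) => [{ ErrorEquals: ["States.ALL"], ResultPath: "$.error", Next: next }];

// Lambda names for UpdateCRM and the compensation states are assumed, following the pattern above.
const sagaDefinition = {
  StartAt: "CreateSalesOrder",
  States: {
    CreateSalesOrder: {
      Type: "Task", Resource: "arn:aws:lambda:...:createSalesOrder",
      ResultPath: "$.salesOrder", Next: "ReserveInventory",
      Catch: catchTo("SagaFailed")
    },
    ReserveInventory: {
      Type: "Task", Resource: "arn:aws:lambda:...:reserveInventory",
      ResultPath: "$.reservation", Next: "CapturePayment",
      Catch: catchTo("CancelSalesOrder")
    },
    CapturePayment: { // PIVOT
      Type: "Task", Resource: "arn:aws:lambda:...:capturePayment",
      ResultPath: "$.payment", Next: "PostRevenue",
      Catch: catchTo("ReleaseInventory")
    },
    PostRevenue: { // Retryable
      Type: "Task", Resource: "arn:aws:lambda:...:postRevenue",
      ResultPath: "$.revenue", Retry: retry, Next: "UpdateCRM"
    },
    UpdateCRM: { // Retryable
      Type: "Task", Resource: "arn:aws:lambda:...:updateCrm", Retry: retry, End: true
    },
    // Compensation chain (reverse order); ResultPath: null passes the input through unchanged
    ReleaseInventory: {
      Type: "Task", Resource: "arn:aws:lambda:...:releaseInventory",
      ResultPath: null, Retry: retry, Next: "CancelSalesOrder"
    },
    CancelSalesOrder: {
      Type: "Task", Resource: "arn:aws:lambda:...:cancelSalesOrder",
      ResultPath: null, Retry: retry, Next: "SagaFailed"
    },
    SagaFailed: { Type: "Fail", Error: "SagaCompensated" }
  }
};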

cURL: Testing a Saga Step (SAP Sales Order Creation)

# Input:  SAP OAuth token, order payload
# Output: Sales order ID or error

# Fetch CSRF token (required for SAP write operations) and keep the session cookies it is tied to
CSRF_TOKEN=$(curl -s -X GET "https://my-sap.s4hana.cloud/sap/opu/odata/sap/API_SALES_ORDER_SRV/" \
  -H "Authorization: Bearer $SAP_TOKEN" -H "x-csrf-token: fetch" \
  -c cookies.txt -D - -o /dev/null \
  | awk 'tolower($1) == "x-csrf-token:" { print $2 }' | tr -d '\r')

# Create Sales Order (send the cookies from the token request along with the token)
curl -X POST "https://my-sap.s4hana.cloud/.../A_SalesOrder" \
  -b cookies.txt \
  -H "Authorization: Bearer $SAP_TOKEN" -H "x-csrf-token: $CSRF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"SalesOrderType":"OR","SoldToParty":"CUST001",
       "to_Item":[{"Material":"MAT001","RequestedQuantity":"10"}]}'
# Expected: 201 Created with SalesOrder number

Data Mapping

Saga State Mapping Across Systems

| Saga State | Salesforce (CRM) | SAP S/4HANA | Stripe | Orchestrator |
|---|---|---|---|---|
| Initiated | Opp: Negotiation | No record | PI: created | RUNNING |
| Order Created | Opp: Negotiation | SO: Open | PI: created | Step 1: COMPLETE |
| Inventory Reserved | Opp: Negotiation | SO + Reservation | PI: created | Step 2: COMPLETE |
| Payment Captured | Opp: Closed Won | SO: Open | PI: succeeded | PIVOT PASSED |
| Completed | Opp: Closed Won | SO: Completed | PI: succeeded | COMPLETED |
| Compensating | Opp: reverted | SO: Cancelled | PI: canceled | COMPENSATING |
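A mapping like the one above can double as a reconciliation check: given the orchestrator's saga state, compare what each system reports against what the table expects. The sketch below encodes the table rows verbatim. `EXPECTED_STATES` and `find_drift` are hypothetical names, and how the `observed` labels are derived from real API responses is left as an assumption.

```python
from typing import Dict, Tuple

# Expected per-system state for each saga state, copied from the mapping table.
EXPECTED_STATES: Dict[str, Dict[str, str]] = {
    "Initiated": {"salesforce": "Opp: Negotiation", "sap": "No record", "stripe": "PI: created"},
    "Order Created": {"salesforce": "Opp: Negotiation", "sap": "SO: Open", "stripe": "PI: created"},
    "Inventory Reserved": {"salesforce": "Opp: Negotiation", "sap": "SO + Reservation", "stripe": "PI: created"},
    "Payment Captured": {"salesforce": "Opp: Closed Won", "sap": "SO: Open", "stripe": "PI: succeeded"},
    "Completed": {"salesforce": "Opp: Closed Won", "sap": "SO: Completed", "stripe": "PI: succeeded"},
    "Compensating": {"salesforce": "Opp: reverted", "sap": "SO: Cancelled", "stripe": "PI: canceled"},
}


def find_drift(saga_state: str, observed: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {system: (expected, observed)} for every system that disagrees with the table."""
    expected = EXPECTED_STATES[saga_state]
    return {
        system: (want, observed.get(system, "missing"))
        for system, want in expected.items()
        if observed.get(system) != want
    }
```

An empty result means all three systems agree with the orchestrator. A non-empty result is a candidate for the manual-review queue rather than an automatic fix, since the drift may simply be a step still in flight.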

Data Type Gotchas

Error Handling & Failure Points

Common Error Scenarios

| Scenario | Impact | Detection | Resolution |
|---|---|---|---|
| Orchestrator crashes mid-saga | Saga hangs in partial state | Heartbeat timeout | Durable workflow engine auto-resumes |
| ERP API timeout (no response) | Unknown state | Timeout threshold | Read-before-write: query for idempotency key |
| Compensation fails | Inconsistent state | Compensation error handler | Dead letter queue + manual intervention |
| Rate limit exceeded (429) | Step cannot execute | HTTP 429 response | Exponential backoff, max 5 retries |
| Concurrent saga conflict | Lost update or dirty read | 409/412 response | Semantic lock or reread-and-retry |
| Network partition | Participant may have executed | Timeout + no response | Idempotent retry |
Failure Points in Production

Anti-Patterns

Wrong: Using 2PC across ERP APIs

// BAD — 2PC requires all participants to hold locks simultaneously
// ERP APIs do not support distributed lock coordination
BEGIN DISTRIBUTED TRANSACTION
  Lock row in Salesforce (not possible via REST API)
  Lock row in SAP (not possible via OData API)
COMMIT ALL

Correct: Use Saga with compensating transactions

// GOOD: each step commits independently; compensations handle failure
Step 1: Create order in SAP (committed)
Step 2: Reserve inventory (committed)
  If Step 2 fails: compensate Step 1 (cancel order)
Step 3: Capture payment (PIVOT)
  If Step 3 fails: compensate Step 2 (release inventory), then Step 1 (cancel order)

Wrong: Fire-and-forget without compensation design

// BAD — no compensation, no idempotency, no error handling
async function processOrder(order) {
  await createSapOrder(order);    // succeeds, but...
  await reserveInventory(order);  // succeeds, but...
  await capturePayment(order);    // FAILS — SAP order orphaned
}

Correct: Every forward step has a compensation partner

// GOOD: compensation stack cleans up every step completed before the pivot
async function processOrderSaga(order) {
  const compensations = [];
  let so;
  try {
    so = await createSapOrder(order);
    compensations.push(() => cancelSapOrder(so.id));
    const res = await reserveInventory(order);
    compensations.push(() => releaseInventory(res.id));
    await capturePayment(order); // PIVOT
  } catch (error) {
    for (const comp of compensations.reverse()) await comp();
    throw error;
  }
  // Past the pivot: retry instead of compensating (payment is already captured)
  await postRevenue(so.id);    // Retryable
}

Wrong: Choreography with 6+ ERP participants

// BAD — invisible dependency web across 7 services
OrderService emits OrderCreated
InventoryService emits InventoryReserved
PaymentService emits PaymentCaptured
... 4 more services listening to each other
// Debug: "Why did this fail?" = trace 7 event streams

Correct: Orchestration for complex multi-ERP flows

// GOOD — single orchestrator, all state in one place
Orchestrator: createOrder(SAP) -> reserveInventory(WMS) ->
  capturePayment(Stripe) -> postRevenue(Oracle) -> updateCRM(SF)
// Debug: check one orchestrator execution log

Common Pitfalls

Diagnostic Commands

# Check Temporal workflow status
tctl workflow describe --workflow_id "order-saga-12345"

# List running sagas in Temporal
tctl workflow list --query "WorkflowType='OrderToCashSaga' AND ExecutionStatus='Running'"

# Check AWS Step Functions execution
aws stepfunctions describe-execution \
  --execution-arn "arn:aws:states:...:execution:OrderSaga:exec-001"

# List the 50 most recent failed sagas in Step Functions (list-executions has no time-window filter)
aws stepfunctions list-executions \
  --state-machine-arn "arn:aws:states:...:stateMachine:OrderSaga" \
  --status-filter "FAILED" --max-results 50

# Check Salesforce API usage
curl -s "https://yourorg.my.salesforce.com/services/data/v62.0/limits" \
  -H "Authorization: Bearer $SF_TOKEN" | jq '.DailyApiRequests'

# Monitor Stripe payment intent (verify saga pivot)
curl -s "https://api.stripe.com/v1/payment_intents/pi_xxx" \
  -u "$STRIPE_SECRET_KEY:" | jq '{status, amount, currency}'

Version History & Compatibility

| Pattern / Platform | Version | Date | Status | Key Changes |
|---|---|---|---|---|
| Saga pattern (original paper) | Garcia-Molina & Salem | 1987 | Foundational | Original long-lived transaction concept |
| Microservices adaptation | Chris Richardson | 2018 | Current reference | Orchestration/choreography for microservices |
| AWS Step Functions | Standard & Express | 2024 | GA | Map state for parallel saga steps |
| Temporal | 1.x | 2024-2025 | GA | Durable execution, auto-retry, versioning |
| Azure Durable Functions | v4 | 2024 | GA | Sub-orchestrations for nested sagas |
| Camunda 8 | 8.5+ | 2024 | GA | Improved BPMN compensation events |

When to Use / When Not to Use

| Use When | Don't Use When | Use Instead |
|---|---|---|
| Transaction spans 3+ ERP systems | Transaction fits within single ERP/database | Native ERP transaction |
| Eventual consistency acceptable | Strong ACID required across all systems | Two-phase commit (rare for ERP APIs) |
| Steps take seconds to minutes | Sub-millisecond in-memory operations | Local transaction + outbox pattern |
| Human approval steps involved | All steps instantaneous and reversible | Simple retry with DLQ |
| Multiple teams own participants | Single team owns all systems | Direct orchestration without saga |

Cross-System Comparison

| Capability | Saga (Orchestration) | Saga (Choreography) | Two-Phase Commit | Outbox + Eventual |
|---|---|---|---|---|
| Consistency | Eventual | Eventual | Strong (ACID) | Eventual |
| Coupling | To orchestrator | To event schema | To coordinator | To message broker |
| Scalability | High | High | Low (locks) | High |
| ERP API compatible | Yes | Yes | No | Yes (single-system) |
| Multi-ERP | Primary use case | Possible but hard | Not viable | No |
| Failure recovery | Auto compensation | Distributed compensation | Auto rollback | Retry + idempotency |
| Debugging | Good | Poor | Good | Medium |
| Latency overhead | Medium | Low | High | Low |
| Tooling | Temporal, Step Functions | Kafka, EventBridge | Database-native | Debezium, Outbox libs |

Important Caveats

Related Units