Circuit Breaker Pattern for ERP API Integrations

Type: ERP Integration | System: Cross-ERP (Pattern-level) | Confidence: 0.88 | Sources: 8 | Verified: 2026-03-07 | Freshness: evolving

System Profile

This card covers the circuit breaker pattern as applied to ERP API integrations. It is platform-agnostic, with concrete examples for the four main integration languages (Python, Java, C#, Node.js) and notes on iPaaS support (MuleSoft, Boomi, Workato, SAP Integration Suite). The pattern itself is the same whether you call SAP S/4HANA OData, Salesforce REST, Oracle ERP Cloud REST, NetSuite SuiteTalk, Dynamics 365 Web API, or Workday. Only the thresholds and trip conditions differ per API surface.

| Property | Value |
|---|---|
| Pattern | Circuit Breaker (client-side resilience) |
| Applies To | All ERP REST/SOAP/OData API calls |
| Granularity | One breaker per ERP endpoint or API surface |
| States | Closed (normal) → Open (failing fast) → Half-Open (probing recovery) |
| Implementations | Polly 8.x (.NET), Resilience4j 2.x (Java), Opossum 8.x (Node.js), custom (Python) |
| iPaaS | MuleSoft (gateway policy), Boomi (custom scripting), Workato (custom connector) |
| Reference | Azure Architecture Center |

API Surfaces & Capabilities

Circuit breakers protect calls to ERP API surfaces. Different API surfaces exhibit different failure modes and require different breaker configurations. [src1]

| ERP API Surface | Typical Failure Mode | Recommended Breaker Config | Recovery Time |
|---|---|---|---|
| SAP S/4HANA OData | 503 during planned downtime, timeouts on complex queries | 5 failures / 30s window, 120s break | 5-30 min (planned), 1-4h (incident) |
| Salesforce REST API | 429 rate limit, REQUEST_LIMIT_EXCEEDED, 503 | 3 consecutive 429s, 60s break | 60s (rate limit), 5-15 min (incident) |
| Oracle ERP Cloud REST | 500/503 during patching, FBDI timeouts | 5 failures / 60s window, 180s break | 15-60 min (patching), 1-2h (incident) |
| NetSuite SuiteTalk/REST | SSS_REQUEST_LIMIT_EXCEEDED, concurrency cap | 3 failures / 20s, 30s break | 30-60s (concurrency), 10-30 min (incident) |
| Dynamics 365 OData | 429 with Retry-After header, 503 during updates | Honor Retry-After header, 5 failures / 30s | Per Retry-After value, 5-30 min (updates) |
| Workday REST/SOAP | 503 during tenant maintenance, auth token expiry | 5 failures / 60s, 120s break | 30-120 min (maintenance) |
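The table can be encoded as data so that per-surface settings do not drift. The sketch below is illustrative only. `BreakerProfile`, `PROFILES`, the surface keys, and the 60-second default are hypothetical. Where the table gives no sampling window (Salesforce counts consecutive 429s), the window is `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BreakerProfile:
    failures: int                    # failures that open the circuit
    window_s: Optional[float]        # sampling window; None = consecutive failures
    break_s: Optional[float]         # open duration; None = derive from Retry-After
    honor_retry_after: bool = False  # prefer the server's Retry-After value


# Values transcribed from the table above; the surface keys are hypothetical.
PROFILES = {
    "sap-s4hana-odata": BreakerProfile(5, 30, 120),
    "salesforce-rest": BreakerProfile(3, None, 60),    # 3 consecutive 429s
    "oracle-erp-rest": BreakerProfile(5, 60, 180),
    "netsuite": BreakerProfile(3, 20, 30),
    "dynamics365-odata": BreakerProfile(5, 30, None, honor_retry_after=True),
    "workday": BreakerProfile(5, 60, 120),
}


def break_duration(profile: BreakerProfile,
                   retry_after_s: Optional[float] = None,
                   default_s: float = 60.0) -> float:
    """Seconds to keep the circuit open for this surface.

    Uses Retry-After when the profile honors it and the header was present;
    otherwise the fixed break, falling back to `default_s` (an assumed value,
    not taken from the table).
    """
    if profile.honor_retry_after and retry_after_s is not None:
        return retry_after_s
    return profile.break_s if profile.break_s is not None else default_s
```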

Rate Limits & Quotas

Circuit Breaker Configuration Parameters

| Parameter | Description | Recommended Default | Notes |
|---|---|---|---|
| Failure threshold | Percentage or count of failures that trips the breaker | 50% failure rate OR 5 consecutive failures | Percentage-based is more robust than count-based |
| Sampling window | Time period over which failures are counted | 10-30 seconds | Too short and noise opens the circuit; too long and detection is slow |
| Minimum throughput | Minimum requests in the window before the threshold is evaluated | 8-10 requests | Prevents tripping on 1 failure out of 2 requests |
| Break duration | How long the circuit stays open before the half-open probe | 30-120 seconds | Match to the ERP's typical recovery time |
| Half-open probe count | Number of test requests allowed in the half-open state | 1-3 requests | Too many probes can re-overload a recovering service |
| Success threshold | Consecutive successes in half-open needed to close the circuit | 3-5 successes | Ensures recovery is stable |
| Timeout | Per-request timeout that counts as a failure | 30-60 seconds for ERP APIs | ERP APIs are slower than typical microservices |
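The failure threshold, sampling window, and minimum throughput interact in a way that is easiest to see in code. This is a minimal sketch of a time-based window, assuming a percentage threshold expressed as a 0.0-1.0 ratio. The class name and the injectable `clock` are illustrative and not taken from any library; the clock lets the behavior be tested without sleeping.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class FailureRateWindow:
    """Time-based sampling window that decides when a breaker should open.

    Opens only if at least `min_throughput` calls landed in the last
    `window_s` seconds AND the failure ratio is >= `threshold` (0.0-1.0).
    """

    def __init__(self, threshold: float = 0.5, window_s: float = 30.0,
                 min_throughput: int = 10,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.window_s = window_s
        self.min_throughput = min_throughput
        self.clock = clock
        self._calls: Deque[Tuple[float, bool]] = deque()  # (timestamp, failed)

    def record(self, failed: bool) -> None:
        self._calls.append((self.clock(), failed))

    def should_open(self) -> bool:
        cutoff = self.clock() - self.window_s
        while self._calls and self._calls[0][0] <= cutoff:
            self._calls.popleft()           # drop calls older than the window
        total = len(self._calls)
        if total < self.min_throughput:
            return False                    # too little traffic to judge
        failures = sum(1 for _, failed in self._calls if failed)
        return failures / total >= self.threshold
```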

Per-ERP Error Codes That Should Trip the Breaker

| ERP System | Trip On (Open Circuit) | Do NOT Trip On (handle per request) | Notes |
|---|---|---|---|
| Salesforce | 503, REQUEST_LIMIT_EXCEEDED, SERVER_UNAVAILABLE | 400, INVALID_FIELD, DUPLICATE_VALUE | 429: trip after 3 consecutive, not on the first |
| SAP S/4HANA | 503, 504, CX_SY_RESOURCE_EXHAUSTION | 400, /IWBEP/CM_MGW_RT (OData validation) | 504 indicates SAP app server overload |
| Oracle ERP Cloud | 503, 500 (repeated), FBDI import timeout | 400, ORA-00001, validation errors | Distinguish transient 500s from persistent logic errors |
| NetSuite | SSS_REQUEST_LIMIT_EXCEEDED, SSS_CONCURRENT_LIMIT, 503 | USER_ERROR, INVALID_FLD_VALUE | Governance errors are transient; validation errors are permanent |
| Dynamics 365 | 429 (with Retry-After), 503, 502 | 400, 403, 404, -2147204784 | Always honor the Retry-After header |
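One way to apply the table is a small classifier that runs before a result is reported to the breaker. This sketch assumes the ERP-specific error code has already been parsed out of the response body, since each ERP uses a different error payload. The code lists are a subset copied from the table. The function and its three outcomes are hypothetical.

```python
from typing import Optional

# Codes transcribed from the table above; the structure is hypothetical.
TRIP_CODES = {
    "salesforce": {"REQUEST_LIMIT_EXCEEDED", "SERVER_UNAVAILABLE"},
    "sap": {"CX_SY_RESOURCE_EXHAUSTION"},
    "netsuite": {"SSS_REQUEST_LIMIT_EXCEEDED", "SSS_CONCURRENT_LIMIT"},
}
PERMANENT_CODES = {"INVALID_FIELD", "DUPLICATE_VALUE", "ORA-00001",
                   "USER_ERROR", "INVALID_FLD_VALUE"}
TRIP_STATUSES = {502, 503, 504}
STATUS_OVERRIDES = {"dynamics365": {429}}   # D365 429 carries Retry-After
NEVER_TRIP_STATUSES = {400, 401, 403, 404}


def classify(erp: str, status: int, error_code: Optional[str] = None) -> str:
    """Return 'trip' (open now), 'count' (trip only if repeated) or 'ignore'."""
    if error_code in PERMANENT_CODES:
        return "ignore"          # validation/logic error: fix the request instead
    if error_code in TRIP_CODES.get(erp, set()):
        return "trip"            # governance or capacity signal from the ERP
    if status in NEVER_TRIP_STATUSES:
        return "ignore"
    if status in TRIP_STATUSES or status in STATUS_OVERRIDES.get(erp, set()):
        return "trip"
    if status in (429, 500):
        return "count"           # e.g. Salesforce 429 trips after 3 in a row
    return "ignore"
```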

Authentication

Authentication failures need separate handling. An expired token is a client-side condition, not an ERP outage, so it should NOT trip the circuit. [src1]

| Scenario | Should Trip Breaker? | Correct Handling |
|---|---|---|
| OAuth token expired (401) | No | Refresh token, retry once, then trip if the refresh fails |
| API key invalid (403) | No | Alert immediately; configuration error, not transient |
| Auth server unreachable | Yes | Trip a separate breaker on the auth endpoint |
| MFA challenge required | No | Alert; MFA cannot be automated, so the wrong auth flow is configured |
| Rate limit on auth endpoint | Yes | Trip the breaker; queue data requests until auth recovers |
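The first row of the table (refresh, retry once, then escalate) can be sketched as a wrapper that sits inside the breaker, so a routine 401 never counts as a failure. `send`, `get_token`, `refresh_token`, and `AuthRefreshFailed` are hypothetical stand-ins for your HTTP and OAuth clients. Whether `AuthRefreshFailed` trips the auth-endpoint breaker or raises an alert is left to the caller.

```python
from typing import Any, Callable, Tuple

Response = Tuple[int, Any]   # (HTTP status, body); a simplified stand-in


class AuthRefreshFailed(Exception):
    """The only auth outcome that should be reported upward to a breaker."""


def call_with_token_refresh(send: Callable[[str], Response],
                            get_token: Callable[[], str],
                            refresh_token: Callable[[], str]) -> Response:
    """Send once; on 401 refresh the token and retry exactly once.

    A plain 401 is never recorded as a breaker failure. If the refresh call
    itself fails (e.g. auth server unreachable) or the retried request is
    still 401, AuthRefreshFailed is raised.
    """
    status, body = send(get_token())
    if status != 401:
        return status, body
    try:
        new_token = refresh_token()
    except Exception as exc:                 # auth endpoint down, bad secret, ...
        raise AuthRefreshFailed("token refresh failed") from exc
    status, body = send(new_token)
    if status == 401:
        raise AuthRefreshFailed("still unauthorized after refresh")
    return status, body
```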

Constraints

Integration Pattern Decision Tree

START — Should I use a circuit breaker for this ERP integration?
|
+-- Is the integration synchronous (real-time API call)?
|   +-- YES --> Circuit breaker is strongly recommended
|   |   +-- Is the ERP API call idempotent?
|   |   |   +-- YES --> Standard circuit breaker + retry
|   |   |   +-- NO --> Circuit breaker + idempotency key + DLQ
|   |   +-- Are you calling multiple ERP endpoints?
|   |       +-- YES --> Separate breaker per endpoint
|   |       +-- NO --> Single breaker sufficient
|   +-- NO (async / message-based)
|       +-- Is there a synchronous ERP API call within the async flow?
|           +-- YES --> Circuit breaker on the API call, not the queue consumer
|           +-- NO --> Circuit breaker adds no value; use DLQ + retry instead
|
+-- Which resilience pattern do I need?
    +-- Transient errors (network blip, 1-2 failures) --> Retry with backoff
    +-- Sustained outage (ERP down for minutes) --> Circuit breaker
    +-- Protecting shared resources (thread pools) --> Bulkhead
    +-- Staying under ERP API quotas (avoiding server-side throttling) --> Client-side rate limiter
    +-- Combining them (outermost first) --> Retry -> Circuit Breaker -> Bulkhead -> Timeout
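In the combined ordering, retry sits outside the circuit breaker. The sketch below shows that outer layer under two assumptions. First, the inner breaker raises a hypothetical `CircuitOpenError` when it is failing fast. Second, transport failures surface as Python's built-in `ConnectionError` or `TimeoutError`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised by the inner (hypothetical) breaker while it fails fast."""


def retry_outside_breaker(attempt: Callable[[], T], max_attempts: int = 3,
                          base_delay_s: float = 0.5,
                          sleep: Callable[[float], None] = time.sleep) -> T:
    """Outer retry layer for Retry -> Circuit Breaker -> ... -> Timeout.

    `attempt` is the breaker-protected call (with its own per-attempt timeout).
    Transient errors are retried with exponential backoff and full jitter;
    an open circuit is re-raised at once.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for n in range(1, max_attempts + 1):
        try:
            return attempt()
        except CircuitOpenError:
            raise                       # the breaker already decided to fail fast
        except (ConnectionError, TimeoutError):
            if n == max_attempts:
                raise
            sleep(random.uniform(0, base_delay_s * 2 ** (n - 1)))
    raise RuntimeError("unreachable")
```

An open circuit short-circuits the retry loop by design. Otherwise each retry would wait out its backoff only for the breaker to reject it again.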

Quick Reference

| Pattern | Purpose | When to Use | Library |
|---|---|---|---|
| Circuit Breaker | Stop calling a failing service, fail fast, auto-recover | ERP down or degraded for >30s | Polly, Resilience4j, Opossum |
| Retry | Retry transient failures with backoff | Network blip, single 500, rate limit | Polly, Resilience4j, tenacity |
| Bulkhead | Isolate resource pools per dependency | Multiple ERP integrations sharing a thread pool | Polly, Resilience4j |
| Rate Limiter | Client-side request throttling | Stay under ERP API quota | Polly, Resilience4j, bottleneck |
| Timeout | Fail slow calls before thread exhaustion | ERP queries that can hang >60s | Built into all HTTP clients |
| Fallback | Return cached/default value when the circuit is open | Non-critical reads (product catalog) | Application logic |
| Dead Letter Queue | Capture failed writes for later replay | Write operations that must not be lost | Kafka DLQ, SQS DLQ, Azure Service Bus |

Circuit Breaker States

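| State | Behavior | Transitions To | Trigger |
|---|---|---|---|
| Closed | All requests pass through; failures are counted | Open | Failure rate exceeds threshold in sampling window |
| Open | All requests fail immediately (e.g., BrokenCircuitException in Polly, CallNotPermittedException in Resilience4j) | Half-Open | Break duration timer expires |
| Half-Open | Limited probe requests allowed | Closed or Open | Probes succeed (→ Closed) or fail (→ Open) |
| Isolated / Forced Open | Manually held open via API (Polly: Isolated; Resilience4j: FORCED_OPEN) | Closed | Manual reset (Polly: ManualControl.CloseAsync(); Resilience4j: transitionToClosedState()) |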
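The state table can also be read as a transition function. This sketch encodes exactly the transitions listed above. `State`, `Event`, and `next_state` are illustrative names. `PROBE_SUCCEEDED` stands for "enough half-open probes succeeded", which each library defines differently.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"
    ISOLATED = "isolated"      # manually held open


class Event(Enum):
    THRESHOLD_EXCEEDED = "threshold_exceeded"
    BREAK_EXPIRED = "break_expired"
    PROBE_SUCCEEDED = "probe_succeeded"
    PROBE_FAILED = "probe_failed"
    ISOLATE = "isolate"
    MANUAL_CLOSE = "manual_close"


# (current state, event) -> next state, transcribed from the table above.
TRANSITIONS = {
    (State.CLOSED, Event.THRESHOLD_EXCEEDED): State.OPEN,
    (State.OPEN, Event.BREAK_EXPIRED): State.HALF_OPEN,
    (State.HALF_OPEN, Event.PROBE_SUCCEEDED): State.CLOSED,
    (State.HALF_OPEN, Event.PROBE_FAILED): State.OPEN,
    (State.ISOLATED, Event.MANUAL_CLOSE): State.CLOSED,
}


def next_state(state: State, event: Event) -> State:
    if event is Event.ISOLATE:
        return State.ISOLATED                      # operators can isolate from any state
    return TRANSITIONS.get((state, event), state)  # other events leave state unchanged


def allows_request(state: State) -> bool:
    """Closed passes everything, half-open passes limited probes, open/isolated reject."""
    return state in (State.CLOSED, State.HALF_OPEN)
```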

Step-by-Step Integration Guide

1. Determine the correct granularity for your circuit breakers

Map your ERP API calls to circuit breaker instances. Each distinct endpoint or API surface gets its own breaker. [src1, src3]

ERP Integration Architecture:
  +-- Salesforce
  |   +-- CB: sf-rest      (REST API)
  |   +-- CB: sf-bulk       (Bulk API 2.0)
  |   +-- CB: sf-streaming  (Platform Events)
  +-- SAP S/4HANA
  |   +-- CB: sap-odata     (OData v4)
  |   +-- CB: sap-soap      (SOAP — legacy BAPIs)
  +-- Oracle ERP Cloud
      +-- CB: oracle-rest   (REST API)
      +-- CB: oracle-fbdi   (FBDI — file imports)

Verify: Each breaker should have independent state. Tripping sf-bulk should NOT affect sf-rest.
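Independent state per surface usually comes from a registry keyed by surface name; Resilience4j's `CircuitBreakerRegistry` plays this role in Java. Below is a library-agnostic Python sketch. `BreakerRegistry` and its `factory` callback are hypothetical.

```python
import threading
from typing import Callable, Dict, Generic, List, TypeVar

B = TypeVar("B")


class BreakerRegistry(Generic[B]):
    """One breaker instance per API surface name, created on first use.

    Every lookup for the same name returns the same object, and different
    names never share state, so tripping "sf-bulk" cannot block "sf-rest".
    """

    def __init__(self, factory: Callable[[str], B]):
        self._factory = factory
        self._breakers: Dict[str, B] = {}
        self._lock = threading.Lock()   # safe lazy creation across threads

    def get(self, name: str) -> B:
        with self._lock:
            if name not in self._breakers:
                self._breakers[name] = self._factory(name)
            return self._breakers[name]

    def names(self) -> List[str]:
        with self._lock:
            return sorted(self._breakers)
```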

2. Configure thresholds based on ERP behavior

Start with conservative defaults, then tune them using production telemetry. [src1, src2]

| Parameter | Conservative Start | Tuned After 30 Days |
|---|---|---|
| Failure rate threshold | 50% | Adjust based on baseline error rate |
| Sampling window | 30 seconds | Match to ERP API response time P99 |
| Minimum throughput | 10 | Match to actual request volume |
| Break duration | 60 seconds | Match to ERP typical recovery time |
| Half-open probes | 3 | Increase if recovery is gradual |

Verify: Monitor circuit state transitions for 7 days. If a breaker trips more than 5 times a day against a healthy ERP, its thresholds are too sensitive.
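The "more than 5 trips per day" check can be automated over exported state transitions. This sketch assumes your telemetry can produce `(timestamp, breaker_name, new_state)` records. That record shape and the function name are assumptions.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple

Transition = Tuple[datetime, str, str]   # (timestamp, breaker name, new state)


def too_sensitive(transitions: Iterable[Transition],
                  max_opens_per_day: int = 5) -> List[Tuple[str, str]]:
    """Return (ISO date, breaker) pairs where the breaker opened more than
    `max_opens_per_day` times. Feed it only days when the ERP was healthy;
    opens during a real incident are expected."""
    opens = Counter(
        (ts.date().isoformat(), name)
        for ts, name, new_state in transitions
        if new_state.upper() == "OPEN"
    )
    return sorted(key for key, count in opens.items() if count > max_opens_per_day)
```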

3. Implement the circuit breaker in your language

See the Code Examples section below for implementations in Python, Java, C#, and Node.js. [src2, src4, src5, src8]

4. Wire the fallback strategy for open circuit

When the circuit is open, the application must do something useful instead of throwing an exception. [src1]

Circuit Open — What to do with the request:
  +-- Is the operation a READ?
  |   +-- Return cached data (if fresh enough)
  |   +-- Return degraded response ("ERP data temporarily unavailable")
  |   +-- Route to secondary ERP instance (if available)
  +-- Is the operation a WRITE?
  |   +-- Queue to dead letter queue for later replay
  |   +-- Write to local staging table + reconcile later
  |   +-- Return 503 + Retry-After header to upstream caller
  +-- ALWAYS:
      +-- Log circuit state change
      +-- Fire alert if circuit open >5 minutes
      +-- Increment circuit-open counter in metrics
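The read and write branches of the tree can be sketched as a small fallback handler. Several details are assumptions for illustration:

- the 300-second cache freshness limit;
- the 60-second Retry-After;
- the 202 Accepted response for queued writes;
- the `enqueue` callable, which stands in for your DLQ producer.

Logging, alerting, and metrics from the ALWAYS branch are left out.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class OpenCircuitFallback:
    """Handles requests that arrive while the circuit is open.

    Reads: serve cached data if it is fresh enough, else a degraded 503.
    Writes: hand the payload to a replay queue and answer 202 Accepted.
    """
    enqueue: Callable[[Dict[str, Any]], None]
    max_cache_age_s: float = 300.0
    retry_after_s: int = 60
    cache: Dict[str, Tuple[float, Any]] = field(default_factory=dict)  # key -> (stored_at, value)
    clock: Callable[[], float] = time.time

    def read(self, key: str) -> Dict[str, Any]:
        hit = self.cache.get(key)
        if hit is not None and self.clock() - hit[0] <= self.max_cache_age_s:
            return {"status": 200, "data": hit[1], "from_cache": True}
        return {"status": 503, "error": "ERP data temporarily unavailable",
                "retry_after": self.retry_after_s}

    def write(self, endpoint: str, payload: Dict[str, Any],
              idempotency_key: str) -> Dict[str, Any]:
        self.enqueue({"endpoint": endpoint, "payload": payload,
                      "idempotency_key": idempotency_key})
        return {"status": 202, "queued": True}
```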

Code Examples

Python: Circuit Breaker for ERP API Calls

# Input:  ERP API endpoint URL, authentication headers
# Output: API response or fallback value when circuit is open
# Requires: requests>=2.31.0

import time, threading, requests
from enum import Enum
from dataclasses import dataclass, field

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

@dataclass
class CircuitBreaker:
    failure_threshold: int = 5
    recovery_timeout: float = 60.0
    half_open_max_calls: int = 3
    success_threshold: int = 3
    erp_timeout: float = 30.0
    TRIP_STATUS_CODES = {429, 500, 502, 503, 504}

    # ... (state tracking and the call() method are omitted here)

# Usage: one breaker per ERP endpoint
sf_breaker = CircuitBreaker(failure_threshold=3, recovery_timeout=60)
sap_breaker = CircuitBreaker(failure_threshold=5, recovery_timeout=120)

response = sf_breaker.call(
    "GET", "https://myorg.my.salesforce.com/services/data/v62.0/query",
    params={"q": "SELECT Id, Name FROM Account LIMIT 10"},
    headers={"Authorization": "Bearer <token>"},
    fallback=lambda: {"records": [], "note": "Salesforce unavailable"}
)
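The card does not reproduce the full `CircuitBreaker` implementation. To illustrate how a `call()` method like the one used above might work, here is a compact sketch under a different, hypothetical name (`HttpCircuitBreaker`). It makes these choices:

- It accepts any object with a requests-style `request(method, url, timeout=..., **kwargs)` method.
- It treats the same status codes as `TRIP_STATUS_CODES` as failures.
- It counts consecutive failures.
- For brevity, it does not cap concurrent half-open probes.

```python
import threading
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when the circuit is open and no fallback was supplied."""


class HttpCircuitBreaker:
    TRIP_STATUS_CODES = {429, 500, 502, 503, 504}

    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 60.0,
                 success_threshold: int = 3, erp_timeout: float = 30.0,
                 session: Any = None, clock: Callable[[], float] = time.monotonic):
        if session is None:
            import requests               # only needed when no session is injected
            session = requests.Session()
        self.session, self.clock = session, clock
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.success_threshold = success_threshold
        self.erp_timeout = erp_timeout
        self.state, self.failures, self.successes, self.opened_at = "closed", 0, 0, 0.0
        self._lock = threading.Lock()

    def _allow(self) -> bool:
        with self._lock:
            if self.state == "open" and self.clock() - self.opened_at >= self.recovery_timeout:
                self.state, self.successes = "half_open", 0   # start probing
            return self.state != "open"

    def _record(self, ok: bool) -> None:
        with self._lock:
            if ok:
                self.failures = 0
                if self.state == "half_open":
                    self.successes += 1
                    if self.successes >= self.success_threshold:
                        self.state = "closed"
            else:
                self.failures += 1
                if self.state == "half_open" or self.failures >= self.failure_threshold:
                    self.state, self.opened_at, self.failures = "open", self.clock(), 0

    def call(self, method: str, url: str,
             fallback: Optional[Callable[[], Any]] = None, **kwargs: Any) -> Any:
        if not self._allow():
            if fallback is None:
                raise CircuitOpenError(url)
            return fallback()
        try:
            resp = self.session.request(method, url, timeout=self.erp_timeout, **kwargs)
        except OSError:                   # requests' ConnectionError/Timeout subclass OSError
            self._record(False)
            if fallback is None:
                raise
            return fallback()
        self._record(resp.status_code not in self.TRIP_STATUS_CODES)
        return resp
```

Here, 4xx responses other than 429 count as successes, in line with the guidance that client errors should not trip the breaker. Catching `OSError` covers requests' connection and timeout errors, because `requests.RequestException` derives from `IOError`.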

Java: Resilience4j Circuit Breaker for ERP APIs

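// Input:  ERP API endpoint, authentication config
// Output: API response or fallback when circuit is open
// Requires: io.github.resilience4j:resilience4j-circuitbreaker:2.2.0

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig.SlidingWindowType;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import java.net.ConnectException;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

CircuitBreakerConfig sapConfig = CircuitBreakerConfig.custom()
    .failureRateThreshold(50)                          // percent
    .slidingWindowType(SlidingWindowType.TIME_BASED)
    .slidingWindowSize(30)                             // seconds, because TIME_BASED
    .minimumNumberOfCalls(8)
    .waitDurationInOpenState(Duration.ofSeconds(120))
    .permittedNumberOfCallsInHalfOpenState(3)
    // Only these exceptions count as failures; all others count as successes.
    // A 503 response is not an exception, so throw one for 429/5xx responses
    // and add that exception class here as well.
    .recordExceptions(ConnectException.class, HttpTimeoutException.class)
    .build();

CircuitBreaker sapBreaker = CircuitBreakerRegistry.of(sapConfig)
    .circuitBreaker("sap-odata", sapConfig);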

C#: Polly 8.x Circuit Breaker for ERP APIs

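// Requires: Microsoft.Extensions.Http.Resilience 8.x, Polly 8.x

using System;
using System.Net;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;
using Polly;
using Polly.CircuitBreaker;
using Polly.Timeout;

services.AddHttpClient("SapOData", client =>
{
    client.BaseAddress = new Uri("https://my-sap.s4hana.cloud/sap/opu/odata4/");
    client.Timeout = TimeSpan.FromSeconds(60);   // overall ceiling for the whole pipeline
})
.AddResilienceHandler("sap-pipeline", builder =>
{
    // Strategies added first are outermost: the breaker wraps the per-attempt timeout.
    builder.AddCircuitBreaker(new CircuitBreakerStrategyOptions<HttpResponseMessage>
    {
        FailureRatio = 0.5,
        SamplingDuration = TimeSpan.FromSeconds(30),
        MinimumThroughput = 8,
        BreakDuration = TimeSpan.FromSeconds(120),
        ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
            .HandleResult(r => r.StatusCode == HttpStatusCode.TooManyRequests
                            || r.StatusCode >= HttpStatusCode.InternalServerError)
            .Handle<HttpRequestException>()
            .Handle<TimeoutRejectedException>()   // per-attempt timeouts count as failures
    });
    builder.AddTimeout(TimeSpan.FromSeconds(30));
});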

Node.js: Opossum Circuit Breaker for ERP APIs

// Requires: opossum 8.x; Node.js 18+ (global fetch and AbortSignal.timeout)

import CircuitBreaker from "opossum";

async function callSalesforceApi(endpoint, options) {
  const response = await fetch(
    `https://myorg.my.salesforce.com/services/data/v62.0/${endpoint}`,
    { ...options, signal: AbortSignal.timeout(30000) }
  );
  if (response.status === 429 || response.status >= 500) {
    throw new Error(`Salesforce API error: ${response.status}`);
  }
  return response.json();
}

const sfBreaker = new CircuitBreaker(callSalesforceApi, {
  timeout: 30000,
  errorThresholdPercentage: 50,
  resetTimeout: 60000,
  volumeThreshold: 5,
});

sfBreaker.fallback(() => ({ records: [], _circuitOpen: true }));
sfBreaker.on("open", () => console.warn("[CIRCUIT OPEN] Salesforce"));
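sfBreaker.on("close", () => console.info("[CIRCUIT CLOSED] Salesforce"));

// Usage: arguments passed to fire() are forwarded to callSalesforceApi
const result = await sfBreaker.fire("query?q=SELECT+Id,Name+FROM+Account+LIMIT+10", {
  headers: { Authorization: "Bearer <token>" },
});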

Data Mapping

Configuration Mapping Across Libraries

| Parameter | Polly 8.x (C#) | Resilience4j 2.x (Java) | Opossum 8.x (Node.js) | Python (custom) |
|---|---|---|---|---|
| Failure rate | FailureRatio (0.0-1.0) | failureRateThreshold (0-100%) | errorThresholdPercentage (0-100) | failure_threshold (count) |
| Sampling window | SamplingDuration (TimeSpan) | slidingWindowSize (int: seconds if TIME_BASED, calls if COUNT_BASED) | rollingCountTimeout (ms) | Custom (time.time()) |
| Minimum throughput | MinimumThroughput (int) | minimumNumberOfCalls (int) | volumeThreshold (int) | failure_threshold (implicit) |
| Break duration | BreakDuration (TimeSpan) | waitDurationInOpenState (Duration) | resetTimeout (ms) | recovery_timeout (float, seconds) |
| Manual control | ManualControl.IsolateAsync() | transitionToForcedOpenState() | breaker.open() | Direct state mutation |
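The libraries use different units (ratio vs. percent, TimeSpan vs. milliseconds), so it can help to keep one neutral config and translate it. This sketch uses the parameter names from the table. `NeutralConfig` and `to_library_options` are hypothetical. The output is plain data, and no library is imported. The Resilience4j entry assumes a TIME_BASED window.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class NeutralConfig:
    failure_rate: float      # 0.0-1.0
    window_s: int            # sampling window, seconds
    min_throughput: int
    break_s: int             # open-state duration, seconds


def to_library_options(cfg: NeutralConfig) -> Dict[str, Dict[str, Any]]:
    """Translate one neutral config into each library's names and units."""
    return {
        "polly": {"FailureRatio": cfg.failure_rate,
                  "SamplingDuration": f"TimeSpan.FromSeconds({cfg.window_s})",
                  "MinimumThroughput": cfg.min_throughput,
                  "BreakDuration": f"TimeSpan.FromSeconds({cfg.break_s})"},
        "resilience4j": {"failureRateThreshold": cfg.failure_rate * 100,
                         "slidingWindowType": "TIME_BASED",
                         "slidingWindowSize": cfg.window_s,
                         "minimumNumberOfCalls": cfg.min_throughput,
                         "waitDurationInOpenState": f"{cfg.break_s}s"},
        "opossum": {"errorThresholdPercentage": cfg.failure_rate * 100,
                    "rollingCountTimeout": cfg.window_s * 1000,   # ms
                    "volumeThreshold": cfg.min_throughput,
                    "resetTimeout": cfg.break_s * 1000},          # ms
    }
```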

Error Handling & Failure Points

Common Error Codes

| Code | Meaning | Should Trip Breaker? | Resolution |
|---|---|---|---|
| 429 | Rate limit exceeded | Yes (after 3 consecutive) | Back off, respect Retry-After, align break duration to the rate-limit window |
| 500 | Internal Server Error | Yes (if repeated) | Trip after 3-5 in the window; a single 500 can be transient |
| 502 | Bad Gateway | Yes | Proxy/load-balancer failure; ERP app server likely down |
| 503 | Service Unavailable | Yes (immediately) | ERP is explicitly signaling to stop; trip immediately |
| 504 | Gateway Timeout | Yes | ERP app server overloaded |
| 400 | Bad Request | No | Fix the request payload; a code bug, not an outage |
| 401 | Unauthorized | No | Refresh the auth token; trip only if the refresh also fails |
| 403 | Forbidden | No | Permission issue; alert, don't trip |
| 404 | Not Found | No | Wrong endpoint; fix the code |
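The per-status rules in the table can be sketched as a small stateful policy. The thresholds (3 consecutive 429s, three 500s within 30 s) come from the table's ranges. Treating 502 and 504 as immediate trips, like 503, is a stricter assumption than the table states. `StatusTripPolicy` is a hypothetical name, and Retry-After handling is omitted.

```python
import time
from collections import deque
from typing import Callable, Deque


class StatusTripPolicy:
    """Decides, per HTTP status, whether the breaker should open now."""

    def __init__(self, consecutive_429: int = 3, max_500: int = 3,
                 window_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.consecutive_429 = consecutive_429
        self.max_500 = max_500
        self.window_s = window_s
        self.clock = clock
        self._429_streak = 0
        self._500_times: Deque[float] = deque()

    def should_trip(self, status: int) -> bool:
        if status != 429:
            self._429_streak = 0           # any other response breaks a 429 streak
        if status in (502, 503, 504):
            return True                    # gateway/availability failures
        if status == 429:
            self._429_streak += 1
            return self._429_streak >= self.consecutive_429
        if status == 500:
            now = self.clock()
            self._500_times.append(now)
            while now - self._500_times[0] > self.window_s:
                self._500_times.popleft()  # forget 500s outside the window
            return len(self._500_times) >= self.max_500
        return False                       # 2xx and 4xx client errors never trip
```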

Anti-Patterns

Wrong: Circuit breaker on writes without DLQ

# BAD — writes are silently dropped when circuit opens
try:
    breaker.call("POST", f"{erp_url}/invoices", json=invoice_data)
except CircuitOpenError:
    logger.warning("Circuit open — invoice not created")
    # Invoice is LOST. No retry. No queue. Gone forever.

Correct: Circuit breaker with dead letter queue for writes

# GOOD — failed writes are queued for later replay
try:
    breaker.call("POST", f"{erp_url}/invoices", json=invoice_data)
except CircuitOpenError:
    dlq.send({
        "endpoint": f"{erp_url}/invoices",
        "payload": invoice_data,
        "idempotency_key": invoice_data["externalId"],
    })
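Queued writes eventually need replaying. This is a minimal sketch of one replay pass. It assumes each idempotency key is recorded in a durable store once its write succeeds. `replay_dlq`, `post`, and `already_applied` are hypothetical. In practice `post` would itself go through the circuit breaker, and the ERP-side external ID remains the final guard against duplicates.

```python
from typing import Any, Callable, Dict, Iterable, List, Set

Message = Dict[str, Any]   # {"endpoint", "payload", "idempotency_key"}


def replay_dlq(messages: Iterable[Message],
               post: Callable[[str, Dict[str, Any], str], int],
               already_applied: Set[str]) -> List[Message]:
    """Replay queued writes after the circuit has closed again.

    Skips messages whose idempotency key was already applied, records keys
    that succeed, and returns the messages that still failed so they can be
    re-queued.
    """
    still_failed: List[Message] = []
    for msg in messages:
        key = msg["idempotency_key"]
        if key in already_applied:
            continue                       # duplicate delivery: don't create twice
        status = post(msg["endpoint"], msg["payload"], key)
        if 200 <= status < 300:
            already_applied.add(key)
        else:
            still_failed.append(msg)       # keep for the next replay pass
    return still_failed
```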

Wrong: Single circuit breaker for all ERP endpoints

// BAD — Bulk API timeout trips the breaker for REST API too
const erpBreaker = new CircuitBreaker(callAnyErpApi, { timeout: 30000 });
await erpBreaker.fire("bulk/import", bulkPayload);  // exceeds the 30s timeout, trips
await erpBreaker.fire("query/accounts", {});         // BLOCKED!

Correct: Separate circuit breaker per API surface

// GOOD — each API surface has its own breaker
const sfRestBreaker = new CircuitBreaker(callSfRest, { timeout: 15000 });
const sfBulkBreaker = new CircuitBreaker(callSfBulk, { timeout: 300000 });
// Bulk timeout does NOT affect REST operations

Wrong: Too-sensitive threshold on high-latency ERP APIs

# BAD — ERP APIs are NOT microservices
resilience4j.circuitbreaker.instances.sap-odata:
  failureRateThreshold: 20    # too sensitive
  slidingWindowSize: 5         # too short
  minimumNumberOfCalls: 2      # too few
  waitDurationInOpenState: 5s  # too short

Correct: Thresholds calibrated for ERP latency profiles

# GOOD — tuned for real ERP behavior
resilience4j.circuitbreaker.instances.sap-odata:
  failureRateThreshold: 50
  slidingWindowType: TIME_BASED   # default is COUNT_BASED (size = number of calls)
  slidingWindowSize: 30
  minimumNumberOfCalls: 8
  waitDurationInOpenState: 120s
  permittedNumberOfCallsInHalfOpenState: 3
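A quick lint can catch the "too sensitive" settings from the BAD example before deployment. This sketch assumes the instance block has already been loaded into a dict, for example with a YAML parser. The floors come from the lower end of the recommended ranges in the configuration-parameter table. They are review triggers, not hard rules, and the names are hypothetical.

```python
import re
from typing import Dict, List, Union

# Lower ends of the recommended ranges; slidingWindowSize assumes TIME_BASED.
FLOORS = {
    "failureRateThreshold": 50.0,     # percent
    "slidingWindowSize": 10.0,        # seconds
    "minimumNumberOfCalls": 8.0,
    "waitDurationInOpenState": 30.0,  # seconds
}
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}


def to_seconds(value: Union[str, int, float]) -> float:
    """Parse simple Spring-style durations such as '5s', '120s', '2m'."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s|m)", str(value).strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return float(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def lint_breaker(cfg: Dict[str, Union[str, int, float]]) -> List[str]:
    """List settings in a resilience4j instance block that fall below the floors."""
    warnings = []
    for key, floor in FLOORS.items():
        if key not in cfg:
            continue
        raw = cfg[key]
        value = to_seconds(raw) if key == "waitDurationInOpenState" else float(raw)
        if value < floor:
            warnings.append(f"{key}: {raw} is below the suggested floor {floor:g}")
    return warnings
```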

Diagnostic Commands

# Check Resilience4j circuit breaker state (Spring Boot Actuator)
curl -s http://localhost:8080/actuator/circuitbreakers | jq '.circuitBreakers'
# Expected (abridged): {"sap-odata":{"state":"CLOSED","failureRate":-1.0}}

# Check Resilience4j circuit breaker events
curl -s http://localhost:8080/actuator/circuitbreakerevents | jq '.circuitBreakerEvents[-5:]'

# Check Polly circuit state via custom health endpoint
curl -s http://localhost:5000/health/circuits | jq
# Expected: {"sap-odata":"Closed","sf-rest":"Closed","oracle-rest":"HalfOpen"}

# Test ERP API health directly (bypassing circuit breaker)
# Salesforce
curl -s -o /dev/null -w "%{http_code}" \
  -H "Authorization: Bearer $SF_TOKEN" \
  "https://myorg.my.salesforce.com/services/data/v62.0/limits"

# SAP S/4HANA
curl -s -o /dev/null -w "%{http_code}" \
  -H "Authorization: Bearer $SAP_TOKEN" \
  "https://my-sap.s4hana.cloud/sap/opu/odata4/sap/api_business_partner/\$metadata"

# Check if ERP is in maintenance (manual probe)
curl -s -w "\n%{http_code} %{time_total}s" \
  -H "Authorization: Bearer $TOKEN" \
  "$ERP_API_URL/health" 2>&1
# 503 = maintenance; 200 with >10s = degraded; 200 with <2s = healthy
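The rule of thumb in the last comment can be scripted for repeated checks. This sketch uses only the standard library. Three things are assumptions: the names `probe` and `classify_probe`, the "slow" label for the 2-10 s band (which the rule does not cover), and the existence of a `/health`-style endpoint on your ERP.

```python
import time
import urllib.error
import urllib.request
from typing import Optional, Tuple


def classify_probe(status: Optional[int], seconds: float) -> str:
    """Apply the 503 / >10 s / <2 s rule of thumb from the probe comment."""
    if status == 503:
        return "maintenance"
    if status == 200:
        if seconds > 10:
            return "degraded"
        if seconds < 2:
            return "healthy"
        return "slow"      # 2-10 s: not covered by the rule (assumption)
    return "unknown"       # other codes or no response: investigate manually


def probe(url: str, token: str, timeout_s: float = 30.0) -> Tuple[Optional[int], float]:
    """Time one authenticated GET; status is None if no HTTP response arrived."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            status = response.status
    except urllib.error.HTTPError as err:   # urllib raises on 4xx/5xx
        status = err.code
    except OSError:                         # DNS failure, refused connection, timeout
        status = None
    return status, time.monotonic() - start
```

For example: `classify_probe(*probe(os.environ["ERP_API_URL"] + "/health", os.environ["TOKEN"]))`.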

Version History & Compatibility

| Library | Version | Release | Breaking Changes | Notes |
|---|---|---|---|---|
| Polly | 8.x | 2023-07 | Complete API rewrite: Policy replaced by ResiliencePipeline | Legacy v7 Policy API still ships in the Polly package for incremental migration |
| Polly | 7.x | 2019-06 | Legacy; maintenance only | Still widely used; plan migration |
| Resilience4j | 2.2.0 | 2024-03 | Minor: TIME_BASED sliding window improvements | Recommended for new Java projects |
| Resilience4j | 1.x | 2020-01 | EOL | Migrate to 2.x |
| Opossum | 8.1.3 | 2025-01 | Minor: improved TypeScript types | Stable; primary Node.js option |

iPaaS Circuit Breaker Support

| Platform | Built-in? | Config Level | Notes |
|---|---|---|---|
| MuleSoft | Yes (gateway policy) | API Gateway (Envoy) | maxConnections, maxPendingRequests, maxRequests (Envoy-style caps that limit load rather than count failures) |
| Boomi | No (custom scripting) | Process level | Implement in a Data Process shape with Groovy |
| Workato | No (custom connector) | Connector SDK | Build in custom connector actions |
| SAP Integration Suite | Partial (retry + timeout) | iFlow level | No native breaker; use Groovy + JCache |

When to Use / When Not to Use

| Use When | Don't Use When | Use Instead |
|---|---|---|
| Synchronous ERP API calls that can cascade failures | Async message-based integration (Kafka, SQS) | Dead letter queue + retry policy |
| ERP has known maintenance windows causing extended downtime | A single transient error that resolves on retry | Simple retry with exponential backoff |
| Multiple downstream ERP endpoints with independent failure modes | All calls go through a single gateway that handles resilience | Gateway-level circuit breaker (MuleSoft policy) |
| Long-lived service processing continuous requests | One-time batch job that runs once and exits | Retry + error log |
| Saga pattern with multiple ERP steps needing protection | Simple two-system point-to-point integration | Retry + DLQ (circuit breaker is overkill) |

Cross-System Comparison

Library Comparison for Custom Implementations

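| Capability | Polly 8.x | Resilience4j 2.x | Opossum 8.x | pybreaker 1.x |
|---|---|---|---|---|
| Language | C# / .NET | Java / Kotlin | Node.js / TypeScript | Python |
| Circuit states | 3 + Isolated | 3 + DISABLED, FORCED_OPEN, METRICS_ONLY | 3 | 3 |
| Sliding window | Time-based | Time- or count-based | Time-based (rolling) | Consecutive-failure count (fail_max) |
| Dynamic break | BreakDurationGenerator | waitIntervalFunctionInOpenState | Not built-in | Custom (extend) |
| HttpClient integration | Native (IHttpClientFactory) | Spring WebClient | Manual wrap | Manual wrap |
| Monitoring | Built-in telemetry | Actuator + Micrometer | Events + Prometheus | Custom events |
| Bulkhead | Yes (same pipeline) | Yes (separate module) | Partial (capacity option limits concurrent calls) | No |
| Rate limiter | Yes (same pipeline) | Yes (separate module) | No | No |
| Maturity | Very high | Very high | High | Moderate |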
