Circuit Breaker Pattern for ERP API Integrations

How do you implement the circuit breaker pattern for ERP API integrations?

TL;DR

Bottom line: Wrap every outbound ERP API call in a circuit breaker that trips after a configurable failure threshold, fails fast while the ERP recovers, and probes with limited half-open requests before restoring full traffic.
Key limit: Circuit breaker is per-process state by default — horizontal scaling requires shared state (Redis) or per-instance breakers with coordinated thresholds.
Watch out for: Setting thresholds too sensitive (trips on 2 failures) causes false opens during normal ERP latency spikes; too tolerant (50 failures) defeats the purpose. Start at 50% failure rate over 10-second windows with minimum 8 requests sampled.
Best for: Real-time ERP API integrations where downstream unavailability (SAP maintenance, Salesforce governor limits, Oracle Cloud outages) would cascade into thread exhaustion, connection pool starvation, or saga timeout failures.
Libraries: Python (custom or pybreaker), Java (Resilience4j 2.x), C# (Polly 8.x), Node.js (Opossum 8.x), iPaaS (MuleSoft gateway policy, Boomi custom scripting).

System Profile

This card covers the circuit breaker pattern as applied to ERP API integrations across all major ERP systems. It is platform-agnostic but provides concrete implementations for the four major integration languages (Python, Java, C#, Node.js) and two leading iPaaS platforms (MuleSoft, Boomi). The pattern applies identically whether calling SAP S/4HANA OData, Salesforce REST, Oracle ERP Cloud REST, NetSuite SuiteTalk, or Dynamics 365 Web API.

Property	Value
Pattern	Circuit Breaker (client-side resilience)
Applies To	All ERP REST/SOAP/OData API calls
Granularity	One breaker per ERP endpoint or API surface
States	Closed (normal) → Open (failing fast) → Half-Open (probing recovery)
Implementations	Polly 8.x (.NET), Resilience4j 2.x (Java), Opossum 8.x (Node.js), Custom (Python)
iPaaS	MuleSoft (gateway policy), Boomi (custom scripting), Workato (custom connector)
Reference	Azure Architecture Center

API Surfaces & Capabilities

Circuit breakers protect calls to ERP API surfaces. Different API surfaces exhibit different failure modes and require different breaker configurations. [src1]

ERP API Surface	Typical Failure Mode	Recommended Breaker Config	Recovery Time
SAP S/4HANA OData	503 during planned downtime, timeouts on complex queries	5 failures / 30s window, 120s break	5-30 min (planned), 1-4h (incident)
Salesforce REST API	429 rate limit, REQUEST_LIMIT_EXCEEDED, 503	3 consecutive 429s, 60s break	60s (rate limit), 5-15 min (incident)
Oracle ERP Cloud REST	500/503 during patching, FBDI timeouts	5 failures / 60s window, 180s break	15-60 min (patching), 1-2h (incident)
NetSuite SuiteTalk/REST	SSS_REQUEST_LIMIT_EXCEEDED, concurrency cap	3 failures / 20s, 30s break	30-60s (concurrency), 10-30 min (incident)
Dynamics 365 OData	429 with Retry-After header, 503 during updates	Honor Retry-After header, 5 failures / 30s	Per Retry-After value, 5-30 min (updates)
Workday REST/SOAP	503 during tenant maintenance, auth token expiry	5 failures / 60s, 120s break	30-120 min (maintenance)

Rate Limits & Quotas

Circuit Breaker Configuration Parameters

Parameter	Description	Recommended Default	Notes
Failure threshold	Percentage or count of failures that trips the breaker	50% failure rate OR 5 consecutive failures	Percentage-based is more robust than count-based
Sampling window	Time period over which failures are counted	10-30 seconds	Too short = noise triggers opens; too long = slow detection
Minimum throughput	Minimum requests in window before threshold is evaluated	8-10 requests	Prevents tripping on 1 failure out of 2 requests
Break duration	How long the circuit stays open before half-open probe	30-120 seconds	Match to ERP typical recovery time
Half-open probe count	Number of test requests allowed in half-open state	1-3 requests	Too many probes can re-overload a recovering service
Success threshold	Consecutive successes in half-open to close circuit	3-5 successes	Ensures recovery is stable
Timeout	Per-request timeout that counts as failure	30-60 seconds for ERP APIs	ERP APIs are slower than typical microservices
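
The parameters above interact: the failure rate is only evaluated once the sampling window holds at least the minimum throughput. A minimal Python sketch of that evaluation follows. The `RollingWindow` class and its method names are hypothetical (they do not come from any library named in this card), and timestamps are supplied by the caller so the logic is easy to test.

```python
from collections import deque


class RollingWindow:
    """Time-based sampling window; hypothetical helper for illustration."""

    def __init__(self, window_s=10.0, failure_rate=0.5, min_throughput=8):
        self.window_s = window_s
        self.failure_rate = failure_rate
        self.min_throughput = min_throughput
        self._calls = deque()  # (timestamp, succeeded) pairs, oldest first

    def record(self, now, succeeded):
        self._calls.append((now, succeeded))

    def should_trip(self, now):
        # Drop samples that have aged out of the sampling window.
        while self._calls and now - self._calls[0][0] > self.window_s:
            self._calls.popleft()
        total = len(self._calls)
        if total < self.min_throughput:
            return False  # too little traffic to judge
        failures = sum(1 for _, ok in self._calls if not ok)
        return failures / total >= self.failure_rate
```

With the defaults shown, one failure out of two requests never trips the breaker, which is the case the minimum-throughput row is meant to prevent.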

Per-ERP Error Codes That Should Trip the Breaker

ERP System	Trip On (Open Circuit)	Do NOT Trip On (Retry Instead)	Notes
Salesforce	503, REQUEST_LIMIT_EXCEEDED, SERVER_UNAVAILABLE	400, INVALID_FIELD, DUPLICATE_VALUE	429 — trip after 3 consecutive, not on first
SAP S/4HANA	503, 504, CX_SY_RESOURCE_EXHAUSTION	400, /IWBEP/CM_MGW_RT (OData validation)	504 indicates SAP app server overload
Oracle ERP Cloud	503, 500 (repeated), FBDI import timeout	400, ORA-00001, validation errors	Distinguish transient 500 from persistent logic errors
NetSuite	SSS_REQUEST_LIMIT_EXCEEDED, SSS_CONCURRENT_LIMIT, 503	USER_ERROR, INVALID_FLD_VALUE	Governance errors are transient; validation permanent
Dynamics 365	429 (with Retry-After), 503, 502	400, 403, 404, -2147204784	Always honor Retry-After header
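
The table above can be reduced to a small classifier that decides whether a response counts as a failure sample for the breaker. The sketch below is an assumption-laden illustration: the function name and the grouping into sets are hypothetical, and 429 and 500 are treated as countable failures so that the breaker threshold, not this function, decides when "repeated" becomes a trip.

```python
# Codes taken from the table above; the grouping is illustrative only.
COUNTED_HTTP = {429, 500, 502, 503, 504}
COUNTED_ERP_CODES = {
    "REQUEST_LIMIT_EXCEEDED", "SERVER_UNAVAILABLE",        # Salesforce
    "SSS_REQUEST_LIMIT_EXCEEDED", "SSS_CONCURRENT_LIMIT",  # NetSuite
    "CX_SY_RESOURCE_EXHAUSTION",                           # SAP S/4HANA
}
IGNORED_HTTP = {400, 403, 404}  # request or permission problems, not outages


def counts_as_breaker_failure(status, erp_code=None):
    """Return True if the response should be recorded as a breaker failure."""
    if erp_code in COUNTED_ERP_CODES:
        return True  # governance errors count even when the HTTP status is 4xx
    if status in IGNORED_HTTP:
        return False
    return status in COUNTED_HTTP
```

Checking the ERP error code before the HTTP status matters because some governance errors may arrive with a 4xx status.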

Authentication

Authentication failures interact with circuit breakers in specific ways. Token expiry should NOT trip the circuit. [src1]

Scenario	Should Trip Breaker?	Correct Handling
OAuth token expired (401)	No	Refresh token, retry once, then trip if refresh fails
API key invalid (403)	No	Alert immediately — config error, not transient
Auth server unreachable	Yes	Trip breaker on auth endpoint separately
MFA challenge required	No	Alert — cannot be automated; wrong auth flow
Rate limit on auth endpoint	Yes	Trip breaker; queue data requests until auth recovers
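
A sketch of the 401 row, assuming hypothetical `send`, `get_token`, and `refresh` callables that stand in for real HTTP and OAuth code. A plain 401 is handled with one refresh and one retry and is never reported to the breaker; only a failed refresh surfaces as an error the caller can count.

```python
class AuthUnavailable(Exception):
    """Raised when the token refresh itself fails (hypothetical name)."""


def call_with_refresh(send, get_token, refresh):
    """send(token) -> (status, body). Refresh once on 401, retry once."""
    status, body = send(get_token())
    if status != 401:
        return status, body
    try:
        new_token = refresh()  # single refresh attempt
    except Exception as exc:
        # The auth server is failing: let the caller count this as a failure.
        raise AuthUnavailable("token refresh failed") from exc
    return send(new_token)  # retry exactly once with the new token
```

The caller would record `AuthUnavailable` against the auth endpoint's breaker, in line with the "Auth server unreachable" row.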

Constraints

Circuit breaker does NOT replace retry — Use retry for transient errors (first 2-3 attempts), then circuit breaker trips to prevent retry storms. Retry inside breaker, not breaker inside retry.
Per-endpoint granularity — One breaker per ERP API surface. A single breaker for all Salesforce APIs means a Bulk API timeout opens the circuit for REST operations.
State isolation — Circuit breaker state is in-memory by default. In Kubernetes with 10 pods, each pod has its own breaker. Externalize to Redis for shared state if needed.
Idempotency required for half-open probes — Half-open probe requests may be duplicates of previously failed requests. Without idempotency keys, you risk duplicate records.
Break duration must match ERP recovery — SAP maintenance: 30-120 min. Salesforce rate limit: 60s. A 5-second break is useless for a 30-minute maintenance window.
Cannot circuit-break fire-and-forget — Async message queues act as natural buffers. Use circuit breaker between queue consumer and ERP API, not on the queue itself.
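
One reading of "retry inside breaker" is that the breaker records a single outcome per logical call, after retries are exhausted, rather than one per attempt. The sketch below illustrates that reading. The function name, the `on_success`/`on_failure` callbacks (standing in for a breaker's recording hooks), and the backoff values are all assumptions.

```python
import time


def call_with_retry(op, on_success, on_failure,
                    attempts=3, base_delay_s=0.5, sleep=time.sleep):
    """Run op with exponential backoff; report one outcome to the breaker."""
    for attempt in range(attempts):
        try:
            result = op()
        except Exception:
            if attempt == attempts - 1:
                on_failure()  # retries exhausted: one failure sample
                raise
            sleep(base_delay_s * 2 ** attempt)  # 0.5s, then 1s, ...
        else:
            on_success()
            return result
```

A transient blip that clears on the second or third attempt is recorded as a success, so only sustained failures move the breaker toward open.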

Integration Pattern Decision Tree

START — Should I use a circuit breaker for this ERP integration?
|
+-- Is the integration synchronous (real-time API call)?
|   +-- YES --> Circuit breaker is strongly recommended
|   |   +-- Is the ERP API call idempotent?
|   |   |   +-- YES --> Standard circuit breaker + retry
|   |   |   +-- NO --> Circuit breaker + idempotency key + DLQ
|   |   +-- Are you calling multiple ERP endpoints?
|   |       +-- YES --> Separate breaker per endpoint
|   |       +-- NO --> Single breaker sufficient
|   +-- NO (async / message-based)
|       +-- Is there a synchronous ERP API call within the async flow?
|           +-- YES --> Circuit breaker on the API call, not the queue consumer
|           +-- NO --> Circuit breaker adds no value; use DLQ + retry instead
|
+-- Which resilience pattern do I need?
    +-- Transient errors (network blip, 1-2 failures) --> Retry with backoff
    +-- Sustained outage (ERP down for minutes) --> Circuit breaker
    +-- Protecting shared resources (thread pools) --> Bulkhead
    +-- Server-side request throttling --> Rate limiter
    +-- Combining all four --> Retry -> Circuit Breaker -> Bulkhead -> Timeout

Quick Reference

Pattern	Purpose	When to Use	Library
Circuit Breaker	Stop calling failing service, fail fast, auto-recover	ERP down or degraded for >30s	Polly, Resilience4j, Opossum
Retry	Retry transient failures with backoff	Network blip, single 500, rate limit	Polly, Resilience4j, tenacity
Bulkhead	Isolate resource pools per dependency	Multiple ERP integrations sharing thread pool	Polly, Resilience4j
Rate Limiter	Client-side request throttling	Stay under ERP API quota	Polly, Resilience4j, bottleneck
Timeout	Fail slow calls before thread exhaustion	ERP queries that can hang >60s	Built into all HTTP clients
Fallback	Return cached/default value when circuit open	Non-critical reads (product catalog)	Application logic
Dead Letter Queue	Capture failed writes for later replay	Write operations that must not be lost	Kafka DLQ, SQS DLQ, Azure Service Bus

Circuit Breaker States

State	Behavior	Transitions To	Trigger
Closed	All requests pass through; failures counted	Open	Failure rate exceeds threshold in sampling window
Open	All requests fail immediately (e.g. BrokenCircuitException in Polly, CallNotPermittedException in Resilience4j)	Half-Open	Break duration timer expires
Half-Open	Limited probe requests allowed	Closed OR Open	Probe succeeds (→ Closed) or fails (→ Open)
Isolated	Manually held open via API (Polly only)	Closed	Manual reset via ManualControl.CloseAsync()
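
The three core transitions in the table can be sketched as a small state machine. This is a simplified illustration, not any library's implementation: it counts consecutive failures instead of a failure rate, is not thread-safe, does not cap concurrent half-open probes, and omits Polly's Isolated state. `SimpleBreaker`, `CircuitOpenError`, and the injectable `clock` are hypothetical names.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    pass


class SimpleBreaker:
    def __init__(self, failure_threshold=5, break_s=60.0,
                 success_threshold=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.break_s = break_s
        self.success_threshold = success_threshold
        self._clock = clock
        self.state = State.CLOSED
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    def call(self, op):
        if self.state is State.OPEN:
            if self._clock() - self._opened_at < self.break_s:
                raise CircuitOpenError("failing fast")
            self.state = State.HALF_OPEN  # break elapsed: allow probes
            self._successes = 0
        try:
            result = op()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self._failures += 1
        # Any failed probe, or enough consecutive failures, opens the circuit.
        if self.state is State.HALF_OPEN or self._failures >= self.failure_threshold:
            self.state = State.OPEN
            self._opened_at = self._clock()
            self._failures = 0

    def _on_success(self):
        self._failures = 0
        if self.state is State.HALF_OPEN:
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.state = State.CLOSED
```

Passing the clock in makes the Open to Half-Open transition testable without sleeping.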

Step-by-Step Integration Guide

1. Determine the correct granularity for your circuit breakers

Map your ERP API calls to circuit breaker instances. Each distinct endpoint or API surface gets its own breaker. [src1, src3]

ERP Integration Architecture:
  +-- Salesforce
  |   +-- CB: sf-rest      (REST API)
  |   +-- CB: sf-bulk       (Bulk API 2.0)
  |   +-- CB: sf-streaming  (Platform Events)
  +-- SAP S/4HANA
  |   +-- CB: sap-odata     (OData v4)
  |   +-- CB: sap-soap      (SOAP — legacy BAPIs)
  +-- Oracle ERP Cloud
      +-- CB: oracle-rest   (REST API)
      +-- CB: oracle-fbdi   (FBDI — file imports)

Verify: Each breaker should have independent state. Tripping sf-bulk should NOT affect sf-rest.
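
The independence check can be illustrated with a small registry keyed by breaker name. `BreakerState` and `BreakerRegistry` are hypothetical names used only for this sketch; a real registry would hold full breaker objects rather than a flag.

```python
from dataclasses import dataclass


@dataclass
class BreakerState:
    name: str
    failure_threshold: int
    break_s: float
    is_open: bool = False


class BreakerRegistry:
    """Returns the same breaker for the same name, a new one otherwise."""

    def __init__(self):
        self._breakers = {}

    def get(self, name, failure_threshold=5, break_s=60.0):
        if name not in self._breakers:
            self._breakers[name] = BreakerState(name, failure_threshold, break_s)
        return self._breakers[name]


registry = BreakerRegistry()
sf_rest = registry.get("sf-rest", failure_threshold=3, break_s=60.0)
sf_bulk = registry.get("sf-bulk", failure_threshold=3, break_s=60.0)
sf_bulk.is_open = True  # simulate the Bulk API breaker tripping
```

After the last line, `sf_rest.is_open` is still `False`, because each name maps to its own state object.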

2. Configure thresholds based on ERP behavior

Start with conservative defaults, tune based on production telemetry. [src1, src2]

Parameter	Conservative Start	Tuned After 30 Days
Failure rate threshold	50%	Adjust based on baseline error rate
Sampling window	30 seconds	Match to ERP API response time P99
Minimum throughput	10	Match to actual request volume
Break duration	60 seconds	Match to ERP typical recovery time
Half-open probes	3	Increase if recovery is gradual

Verify: Monitor circuit state transitions for 7 days. If breaker trips >5x/day on a healthy ERP, thresholds are too sensitive.

3. Implement the circuit breaker in your language

See Code Examples section below for complete implementations in Python, Java, C#, and Node.js. [src2, src4, src5, src8]

4. Wire the fallback strategy for open circuit

When the circuit is open, the application must do something useful instead of throwing an exception. [src1]

Circuit Open — What to do with the request:
  +-- Is the operation a READ?
  |   +-- Return cached data (if fresh enough)
  |   +-- Return degraded response ("ERP data temporarily unavailable")
  |   +-- Route to secondary ERP instance (if available)
  +-- Is the operation a WRITE?
  |   +-- Queue to dead letter queue for later replay
  |   +-- Write to local staging table + reconcile later
  |   +-- Return 503 + Retry-After header to upstream caller
  +-- ALWAYS:
      +-- Log circuit state change
      +-- Fire alert if circuit open >5 minutes
      +-- Increment circuit-open counter in metrics
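
The read and write branches above can be sketched as a single dispatch function. Everything here is an assumption for illustration: the function name, the dict-based `cache`, the list standing in for a dead letter queue, the 300-second freshness limit, and the choice to answer queued writes with 202.

```python
import time


def handle_open_circuit(method, key, payload, cache, dlq,
                        max_age_s=300.0, now=time.time):
    """Decide what to return while the circuit is open."""
    if method.upper() == "GET":
        entry = cache.get(key)
        if entry and now() - entry["stored_at"] <= max_age_s:
            # Fresh enough: serve cached data, flagged as stale.
            return {"status": 200, "body": entry["body"], "stale": True}
        return {"status": 503, "body": "ERP data temporarily unavailable"}
    # Writes are never dropped: queue them for later replay.
    dlq.append({"key": key, "payload": payload})
    return {"status": 202, "queued": True}
```

Logging, alerting, and metrics from the ALWAYS branch are left out to keep the sketch short.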

Code Examples

Python: Circuit Breaker for ERP API Calls

# Input:  ERP API endpoint URL, authentication headers
# Output: API response or fallback value when circuit is open
# Requires: requests>=2.31.0

import time, threading, requests
from enum import Enum
from dataclasses import dataclass, field

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

@dataclass
class CircuitBreaker:
    failure_threshold: int = 5
    recovery_timeout: float = 60.0
    half_open_max_calls: int = 3
    success_threshold: int = 3
    erp_timeout: float = 30.0
    TRIP_STATUS_CODES = {429, 500, 502, 503, 504}

    # ... (see .md for full implementation)

# Usage: one breaker per ERP endpoint
sf_breaker = CircuitBreaker(failure_threshold=3, recovery_timeout=60)
sap_breaker = CircuitBreaker(failure_threshold=5, recovery_timeout=120)

response = sf_breaker.call(
    "GET", "https://myorg.my.salesforce.com/services/data/v62.0/query",
    params={"q": "SELECT Id, Name FROM Account LIMIT 10"},
    headers={"Authorization": "Bearer <token>"},
    fallback=lambda: {"records": [], "note": "Salesforce unavailable"}
)

Java: Resilience4j Circuit Breaker for ERP APIs

// Input:  ERP API endpoint, authentication config
// Output: API response or fallback when circuit is open
// Requires: io.github.resilience4j:resilience4j-circuitbreaker:2.2.0

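import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig.SlidingWindowType;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import java.net.ConnectException;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

CircuitBreakerConfig sapConfig = CircuitBreakerConfig.custom()
    .failureRateThreshold(50)                          // percent, not a ratio
    .slidingWindowType(SlidingWindowType.TIME_BASED)
    .slidingWindowSize(30)                             // seconds, because TIME_BASED
    .minimumNumberOfCalls(8)
    .waitDurationInOpenState(Duration.ofSeconds(120))
    .permittedNumberOfCallsInHalfOpenState(3)
    // Only these exception types count as failures; HTTP 5xx/429 responses
    // must surface as one of them to be counted.
    .recordExceptions(ConnectException.class, HttpTimeoutException.class)
    .build();

CircuitBreaker sapBreaker = CircuitBreakerRegistry.of(sapConfig)
    .circuitBreaker("sap-odata", sapConfig);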

C#: Polly 8.x Circuit Breaker for ERP APIs

// Requires: Microsoft.Extensions.Http.Resilience 8.x, Polly 8.x

services.AddHttpClient("SapOData", client =>
{
    client.BaseAddress = new Uri("https://my-sap.s4hana.cloud/sap/opu/odata4/");
    client.Timeout = TimeSpan.FromSeconds(60);
})
.AddResilienceHandler("sap-pipeline", builder =>
{
    builder.AddCircuitBreaker(new CircuitBreakerStrategyOptions<HttpResponseMessage>
    {
        FailureRatio = 0.5,
        SamplingDuration = TimeSpan.FromSeconds(30),
        MinimumThroughput = 8,
        BreakDuration = TimeSpan.FromSeconds(120),
        ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
            .HandleResult(r => r.StatusCode == HttpStatusCode.TooManyRequests
                            || r.StatusCode >= HttpStatusCode.InternalServerError)
            .Handle<HttpRequestException>()
    });
});

Node.js: Opossum Circuit Breaker for ERP APIs

// Requires: opossum 8.x, Node.js 18+ (global fetch and AbortSignal.timeout)

import CircuitBreaker from "opossum";

async function callSalesforceApi(endpoint, options) {
  const response = await fetch(
    `https://myorg.my.salesforce.com/services/data/v62.0/${endpoint}`,
    { ...options, signal: AbortSignal.timeout(30000) }
  );
  if (response.status === 429 || response.status >= 500) {
    throw new Error(`Salesforce API error: ${response.status}`);
  }
  return response.json();
}

const sfBreaker = new CircuitBreaker(callSalesforceApi, {
  timeout: 30000,
  errorThresholdPercentage: 50,
  resetTimeout: 60000,
  volumeThreshold: 5,
});

sfBreaker.fallback(() => ({ records: [], _circuitOpen: true }));
sfBreaker.on("open", () => console.warn("[CIRCUIT OPEN] Salesforce"));
sfBreaker.on("close", () => console.info("[CIRCUIT CLOSED] Salesforce"));

Data Mapping

Configuration Mapping Across Libraries

Parameter	Polly 8.x (C#)	Resilience4j 2.x (Java)	Opossum 8.x (Node.js)	Python (Custom)
Failure rate	`FailureRatio` (0.0-1.0)	`failureRateThreshold` (0-100%)	`errorThresholdPercentage` (0-100)	`failure_threshold` (count)
Sampling window	`SamplingDuration` (TimeSpan)	`slidingWindowSize` (int seconds)	`rollingCountTimeout` (ms)	Custom (time.time())
Minimum throughput	`MinimumThroughput` (int)	`minimumNumberOfCalls` (int)	`volumeThreshold` (int)	`failure_threshold` (implicit)
Break duration	`BreakDuration` (TimeSpan)	`waitDurationInOpenState` (Duration)	`resetTimeout` (ms)	`recovery_timeout` (float s)
Manual control	`ManualControl.IsolateAsync()`	`transitionToForcedOpenState()`	`breaker.open()`	Direct state mutation

Data Type Gotchas

Polly uses a ratio (0.0-1.0), Resilience4j uses a percentage (0-100): a FailureRatio of 0.5 in Polly equals failureRateThreshold(50) in Resilience4j. Porting 0.5 unchanged into failureRateThreshold sets a 0.5% threshold, 100x more sensitive than intended. [src2, src8]
Opossum rollingCountTimeout is in milliseconds, Resilience4j slidingWindowSize is in seconds — 10000 in Opossum equals 10 in Resilience4j. Off by 1000x if you port config. [src5, src8]
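Polly 8.x BreakDuration replaces the Polly 7.x durationOfBreak argument, while SamplingDuration keeps the role of the 7.x samplingDuration argument. The entire config model changed in the migration, so port each value by meaning, not by position. [src2]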
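
One way to avoid these unit mistakes is to keep a single neutral configuration (a fraction and seconds) and derive each library's values from it. The sketch below is illustrative: `to_library_units` is a hypothetical helper, the dictionary keys reuse the parameter names from the mapping table, and the Polly and Resilience4j durations are left in seconds on the assumption that they are passed to `TimeSpan.FromSeconds` and `Duration.ofSeconds`.

```python
def to_library_units(failure_ratio, window_s, min_calls, break_s):
    """Convert a neutral config into each library's expected units."""
    if not 0.0 <= failure_ratio <= 1.0:
        raise ValueError("failure_ratio must be a fraction between 0 and 1")
    return {
        "polly": {
            "FailureRatio": failure_ratio,           # fraction 0.0-1.0
            "SamplingDuration": window_s,            # seconds
            "MinimumThroughput": min_calls,
            "BreakDuration": break_s,                # seconds
        },
        "resilience4j": {
            "failureRateThreshold": failure_ratio * 100,  # percent
            "slidingWindowSize": int(window_s),           # seconds (TIME_BASED)
            "minimumNumberOfCalls": min_calls,
            "waitDurationInOpenState": break_s,           # seconds
        },
        "opossum": {
            "errorThresholdPercentage": failure_ratio * 100,  # percent
            "rollingCountTimeout": int(window_s * 1000),      # milliseconds
            "volumeThreshold": min_calls,
            "resetTimeout": int(break_s * 1000),              # milliseconds
        },
    }
```

Calling `to_library_units(0.5, 30, 8, 120)` yields 0.5 for Polly, 50.0 for Resilience4j, and a 30000 ms rolling window for Opossum.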

Error Handling & Failure Points

Common Error Codes

Code	Meaning	Should Trip Breaker?	Resolution
429	Rate limit exceeded	Yes (after 3 consecutive)	Backoff, respect Retry-After, align break to rate limit window
500	Internal Server Error	Yes (if repeated)	Trip after 3-5 in window; single 500 could be transient
502	Bad Gateway	Yes	Proxy/LB failure; ERP app server likely down
503	Service Unavailable	Yes (immediately)	ERP explicitly saying stop; trip immediately
504	Gateway Timeout	Yes	ERP app server overloaded
400	Bad Request	No	Fix the request payload — code bug, not outage
401	Unauthorized	No	Refresh auth token; trip only if refresh also fails
403	Forbidden	No	Permission issue; alert, don't trip
404	Not Found	No	Wrong endpoint; fix code

Failure Points in Production

Alert noise during ERP maintenance windows: SAP has scheduled downtime (for example 02:00-04:00 UTC). The circuit trips as designed, but alerts fire for an expected outage. Fix: Implement maintenance window suppression, skipping alerts during scheduled windows. [src1]
Token expiry cascade: OAuth token expires, all requests get 401, breaker trips. Fix: Exclude 401 from circuit breaker; implement separate token refresh circuit. [src1, src3]
Half-open probe creates duplicate record: Probe retries write without idempotency key. Fix: Every ERP write must include idempotency key (externalId in NetSuite, External_ID__c in Salesforce). [src3]
Break duration too short for Oracle patching: Patching takes 15-60 min, so a fixed 30s break cycles between open and half-open up to 120 times. Fix: Implement exponential break duration: 30s, 60s, 120s, 240s, max 600s. [src1]
Thread exhaustion before circuit trips: 30s timeout x 5 failures = 150s of blocked threads. Fix: Set aggressive per-request timeout (10-15s) + bulkhead to limit concurrent ERP calls. [src7]
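
The exponential break duration fix can be sketched as a pure function of how many times the circuit has opened in a row. Both function names are hypothetical, and the cycle count ignores the time the probes themselves take.

```python
def break_duration(consecutive_opens, base_s=30.0, cap_s=600.0):
    """Break length in seconds: doubles per consecutive open, capped."""
    exponent = max(consecutive_opens - 1, 0)
    return min(base_s * 2 ** exponent, cap_s)


def cycles_during(outage_s, base_s=30.0, cap_s=600.0):
    """Count open/half-open cycles that fit into an outage of outage_s."""
    elapsed, cycles = 0.0, 0
    while elapsed < outage_s:
        cycles += 1
        elapsed += break_duration(cycles, base_s, cap_s)
    return cycles
```

For a 60-minute outage, `cycles_during(3600)` gives 10 cycles, compared with 120 for a fixed 30-second break (`cycles_during(3600, cap_s=30.0)`).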

Anti-Patterns

Wrong: Circuit breaker on writes without DLQ

# BAD — writes are silently dropped when circuit opens
try:
    breaker.call("POST", f"{erp_url}/invoices", json=invoice_data)
except CircuitOpenError:
    logger.warning("Circuit open — invoice not created")
    # Invoice is LOST. No retry. No queue. Gone forever.

Correct: Circuit breaker with dead letter queue for writes

# GOOD — failed writes are queued for later replay
try:
    breaker.call("POST", f"{erp_url}/invoices", json=invoice_data)
except CircuitOpenError:
    dlq.send({
        "endpoint": f"{erp_url}/invoices",
        "payload": invoice_data,
        "idempotency_key": invoice_data["externalId"],
    })

Wrong: Single circuit breaker for all ERP endpoints

// BAD — Bulk API timeout trips the breaker for REST API too
const erpBreaker = new CircuitBreaker(callAnyErpApi, { timeout: 30000 });
await erpBreaker.fire("bulk/import", bulkPayload);  // hangs 60s, trips
await erpBreaker.fire("query/accounts", {});         // BLOCKED!

Correct: Separate circuit breaker per API surface

// GOOD — each API surface has its own breaker
const sfRestBreaker = new CircuitBreaker(callSfRest, { timeout: 15000 });
const sfBulkBreaker = new CircuitBreaker(callSfBulk, { timeout: 300000 });
// Bulk timeout does NOT affect REST operations

Wrong: Too-sensitive threshold on high-latency ERP APIs

# BAD — ERP APIs are NOT microservices
resilience4j.circuitbreaker.instances.sap-odata:
  failureRateThreshold: 20    # too sensitive
  slidingWindowSize: 5         # too short
  minimumNumberOfCalls: 2      # too few
  waitDurationInOpenState: 5s  # too short

Correct: Thresholds calibrated for ERP latency profiles

# GOOD — tuned for real ERP behavior
resilience4j.circuitbreaker.instances.sap-odata:
  failureRateThreshold: 50
  slidingWindowType: TIME_BASED  # default is COUNT_BASED, where the size counts calls, not seconds
  slidingWindowSize: 30
  minimumNumberOfCalls: 8
  waitDurationInOpenState: 120s
  permittedNumberOfCallsInHalfOpenState: 3

Common Pitfalls

Treating 401 as a circuit-tripping failure: Token expiry causes 401, breaker trips, all ERP calls blocked. Fix: Exclude 401 from breaker; handle auth separately with refresh logic. [src1]
Using microservice-scale timeouts for ERP APIs: A 3-second timeout causes false timeouts on ERP APIs that routinely take 5-15s. Fix: Set ERP-specific timeouts: 30s for REST, 60s for bulk, 120s for file imports. [src1, src7]
Not implementing exponential break duration: Fixed 30s break during 2h SAP outage = 240 unnecessary probes. Fix: Exponential backoff on break duration: 30s, 60s, 120s, 240s, capped at 600s. [src1]
Circuit breaker without monitoring: Breaker trips and nobody knows. Fix: Log every state transition, expose as health check, alert for circuits open >5 min. [src1, src7]
Applying circuit breaker to queue consumers: Consumer stops but messages keep arriving, filling the partition. Fix: Apply breaker between consumer and ERP API, not on consumer itself. [src3]
Sharing state across instances without coordination: One instance's network blip trips shared Redis-backed circuit for all. Fix: Use local breakers with shared metrics; trip global circuit via feature flag if >50% report failures. [src1]
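
The last fix, tripping a global circuit only when more than half of the instances report failures, can be sketched as a quorum check over per-instance reports. The function name and the shape of `instance_reports` are assumptions; how the reports are collected (shared metrics, feature flag service) is outside this sketch.

```python
def should_trip_global(instance_reports, quorum=0.5):
    """instance_reports maps instance id -> True if its local breaker is open."""
    if not instance_reports:
        return False  # no data: never trip globally
    open_count = sum(1 for is_open in instance_reports.values() if is_open)
    return open_count / len(instance_reports) > quorum
```

A single pod with a network blip cannot open the circuit for everyone, because one report out of ten stays well below the quorum.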

Diagnostic Commands

# Check Resilience4j circuit breaker state (Spring Boot Actuator)
curl -s http://localhost:8080/actuator/circuitbreakers | jq '.circuitBreakers'
# Expected: {"sap-odata":{"state":"CLOSED","failureRate":-1.0}}

# Check Resilience4j circuit breaker events
curl -s http://localhost:8080/actuator/circuitbreakerevents | jq '.circuitBreakerEvents[-5:]'

# Check Polly circuit state via custom health endpoint
curl -s http://localhost:5000/health/circuits | jq
# Expected: {"sap-odata":"Closed","sf-rest":"Closed","oracle-rest":"HalfOpen"}

# Test ERP API health directly (bypassing circuit breaker)
# Salesforce
curl -s -o /dev/null -w "%{http_code}" \
  -H "Authorization: Bearer $SF_TOKEN" \
  "https://myorg.my.salesforce.com/services/data/v62.0/limits"

# SAP S/4HANA
curl -s -o /dev/null -w "%{http_code}" \
  -H "Authorization: Bearer $SAP_TOKEN" \
  "https://my-sap.s4hana.cloud/sap/opu/odata4/sap/api_business_partner/\$metadata"

# Check if ERP is in maintenance (manual probe)
curl -s -w "\n%{http_code} %{time_total}s" \
  -H "Authorization: Bearer $TOKEN" \
  "$ERP_API_URL/health" 2>&1
# 503 = maintenance; 200 with >10s = degraded; 200 with <2s = healthy
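
The interpretation rule in the last comment can be written down as a small classifier for use in scripted probes. The function name is hypothetical, and the "slow" and "unknown" labels are assumptions for cases the rule does not name (a 200 between 2s and 10s, and any other status).

```python
def classify_probe(status, seconds):
    """Map a health probe's HTTP status and total time to a label."""
    if status == 503:
        return "maintenance"
    if status == 200:
        if seconds > 10:
            return "degraded"
        if seconds < 2:
            return "healthy"
        return "slow"  # assumption: the 2-10s band is not named in the rule
    return "unknown"  # assumption: other statuses need manual inspection
```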

Version History & Compatibility

Library	Version	Release	Breaking Changes	Notes
Polly	8.x	2023-07	Complete API rewrite — Policy replaced by ResiliencePipeline	Cannot mix v7 and v8
Polly	7.x	2019-06	Legacy — maintenance only	Still widely used; plan migration
Resilience4j	2.2.0	2024-03	Minor — TIME_BASED sliding window improvements	Recommended for new Java projects
Resilience4j	1.x	2020-01	EOL	Migrate to 2.x
Opossum	8.1.3	2025-01	Minor — improved TypeScript types	Stable; primary Node.js option

iPaaS Circuit Breaker Support

Platform	Built-in?	Config Level	Notes
MuleSoft	Yes — gateway policy	API Gateway (Envoy)	maxConnections, maxPendingRequests, maxRequests
Boomi	No — custom scripting	Process level	Implement in Data Process shape with Groovy
Workato	No — custom connector	Connector SDK	Build in custom connector actions
SAP Integration Suite	Partial — retry + timeout	iFlow level	No native breaker; use Groovy + JCache

When to Use / When Not to Use

Use When	Don't Use When	Use Instead
Synchronous ERP API calls that can cascade failures	Async message-based integration (Kafka, SQS)	Dead letter queue + retry policy
ERP has known maintenance windows causing extended downtime	Single transient error that resolves on retry	Simple retry with exponential backoff
Multiple downstream ERP endpoints with independent failure modes	All calls go through a single gateway handling resilience	Gateway-level circuit breaker (MuleSoft policy)
Long-lived service processing continuous requests	One-time batch job that runs once and exits	Retry + error log
Saga pattern with multiple ERP steps needing protection	Simple two-system point-to-point integration	Retry + DLQ (circuit breaker is overkill)

Cross-System Comparison

Library Comparison for Custom Implementations

Capability	Polly 8.x	Resilience4j 2.x	Opossum 8.x	pybreaker 1.x
Language	C# / .NET	Java / Kotlin	Node.js / TypeScript	Python
Circuit states	4 (incl. Isolated)	3	3	3
Sliding window	Time-based	Time or Count	Time-based (rolling)	Count-based
Dynamic break	BreakDurationGenerator	Custom (extend)	Not built-in	Custom (extend)
HttpClient integration	Native (IHttpClientFactory)	Spring WebClient	Manual wrap	Manual wrap
Monitoring	Built-in telemetry	Actuator + Micrometer	Events + Prometheus	Custom events
Bulkhead	Yes (same pipeline)	Yes (separate)	No	No
Rate limiter	Yes (same pipeline)	Yes (separate)	No	No
Maturity	Very high	Very high	High	Moderate

Important Caveats

ERP APIs have fundamentally different latency profiles than microservices (5-15s P95 vs 100ms P95) — do not use microservice-default configurations
Circuit breaker state is lost on process restart — if your service restarts while the ERP is still down, the circuit starts Closed and must re-learn the failure state
Different ERP editions have different rate limits (Salesforce Enterprise: 100K/24h vs Developer: 15K/24h) — breaker thresholds should account for edition-specific limits
In multi-tenant iPaaS deployments, circuit breaker state for one tenant's ERP should not affect another tenant's circuit — tenant isolation is critical
Library versions change configuration APIs significantly (Polly 7 vs 8 is a complete rewrite) — pin library versions and test after upgrades
This card covers implementations as of March 2026. ERP API error codes, rate limits, and maintenance schedules are subject to change with each ERP release