Circuit Breaker Pattern for ERP API Integrations
Type: ERP Integration
System: Cross-ERP (Pattern-level)
Confidence: 0.88
Sources: 8
Verified: 2026-03-07
Freshness: evolving
TL;DR
- Bottom line: Wrap every outbound ERP API call in a circuit breaker that trips after a configurable failure threshold, fails fast while the ERP recovers, and probes with limited half-open requests before restoring full traffic.
- Key limit: Circuit breaker is per-process state by default — horizontal scaling requires shared state (Redis) or per-instance breakers with coordinated thresholds.
- Watch out for: Thresholds that are too sensitive (tripping on 2 failures) cause false opens during normal ERP latency spikes; thresholds that are too tolerant (50 failures) defeat the purpose. Start at a 50% failure rate over 10-second windows with a minimum of 8 requests sampled.
- Best for: Real-time ERP API integrations where downstream unavailability (SAP maintenance, Salesforce governor limits, Oracle Cloud outages) would cascade into thread exhaustion, connection pool starvation, or saga timeout failures.
- Libraries: Python (custom or pybreaker), Java (Resilience4j 2.x), C# (Polly 8.x), Node.js (Opossum 8.x), iPaaS (MuleSoft gateway policy, Boomi custom scripting).
System Profile
This card covers the circuit breaker pattern as applied to ERP API integrations across all major ERP systems. It is platform-agnostic but provides concrete implementations for the four major integration languages (Python, Java, C#, Node.js) and two leading iPaaS platforms (MuleSoft, Boomi). The pattern applies identically whether calling SAP S/4HANA OData, Salesforce REST, Oracle ERP Cloud REST, NetSuite SuiteTalk, or Dynamics 365 Web API.
| Property | Value |
| Pattern | Circuit Breaker (client-side resilience) |
| Applies To | All ERP REST/SOAP/OData API calls |
| Granularity | One breaker per ERP endpoint or API surface |
| States | Closed (normal) → Open (failing fast) → Half-Open (probing recovery) |
| Implementations | Polly 8.x (.NET), Resilience4j 2.x (Java), Opossum 8.x (Node.js), Custom (Python) |
| iPaaS | MuleSoft (gateway policy), Boomi (custom scripting), Workato (custom connector) |
| Reference | Azure Architecture Center |
API Surfaces & Capabilities
Circuit breakers protect calls to ERP API surfaces. Different API surfaces exhibit different failure modes and require different breaker configurations. [src1]
| ERP API Surface | Typical Failure Mode | Recommended Breaker Config | Recovery Time |
| SAP S/4HANA OData | 503 during planned downtime, timeouts on complex queries | 5 failures / 30s window, 120s break | 5-30 min (planned), 1-4h (incident) |
| Salesforce REST API | 429 rate limit, REQUEST_LIMIT_EXCEEDED, 503 | 3 consecutive 429s, 60s break | 60s (rate limit), 5-15 min (incident) |
| Oracle ERP Cloud REST | 500/503 during patching, FBDI timeouts | 5 failures / 60s window, 180s break | 15-60 min (patching), 1-2h (incident) |
| NetSuite SuiteTalk/REST | SSS_REQUEST_LIMIT_EXCEEDED, concurrency cap | 3 failures / 20s, 30s break | 30-60s (concurrency), 10-30 min (incident) |
| Dynamics 365 OData | 429 with Retry-After header, 503 during updates | Honor Retry-After header, 5 failures / 30s | Per Retry-After value, 5-30 min (updates) |
| Workday REST/SOAP | 503 during tenant maintenance, auth token expiry | 5 failures / 60s, 120s break | 30-120 min (maintenance) |
Rate Limits & Quotas
Circuit Breaker Configuration Parameters
| Parameter | Description | Recommended Default | Notes |
| Failure threshold | Percentage or count of failures that trips the breaker | 50% failure rate OR 5 consecutive failures | Percentage-based is more robust than count-based |
| Sampling window | Time period over which failures are counted | 10-30 seconds | Too short = noise triggers opens; too long = slow detection |
| Minimum throughput | Minimum requests in window before threshold is evaluated | 8-10 requests | Prevents tripping on 1 failure out of 2 requests |
| Break duration | How long the circuit stays open before half-open probe | 30-120 seconds | Match to ERP typical recovery time |
| Half-open probe count | Number of test requests allowed in half-open state | 1-3 requests | Too many probes can re-overload a recovering service |
| Success threshold | Consecutive successes in half-open to close circuit | 3-5 successes | Ensures recovery is stable |
| Timeout | Per-request timeout that counts as failure | 30-60 seconds for ERP APIs | ERP APIs are slower than typical microservices |
Per-ERP Error Codes That Should Trip the Breaker
| ERP System | Trip On (Open Circuit) | Do NOT Trip On (Retry Instead) | Notes |
| Salesforce | 503, REQUEST_LIMIT_EXCEEDED, SERVER_UNAVAILABLE | 400, INVALID_FIELD, DUPLICATE_VALUE | 429 — trip after 3 consecutive, not on first |
| SAP S/4HANA | 503, 504, CX_SY_RESOURCE_EXHAUSTION | 400, /IWBEP/CM_MGW_RT (OData validation) | 504 indicates SAP app server overload |
| Oracle ERP Cloud | 503, 500 (repeated), FBDI import timeout | 400, ORA-00001, validation errors | Distinguish transient 500 from persistent logic errors |
| NetSuite | SSS_REQUEST_LIMIT_EXCEEDED, SSS_CONCURRENT_LIMIT, 503 | USER_ERROR, INVALID_FLD_VALUE | Governance errors are transient; validation permanent |
| Dynamics 365 | 429 (with Retry-After), 503, 502 | 400, 403, 404, -2147204784 | Always honor Retry-After header |
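The trip/retry distinction in the table above can be sketched as a small classifier. This is a minimal illustration, not library code — the `should_trip` name and the exact status-code groupings are assumptions drawn from the table:

```python
# Classify an ERP HTTP status: should it count toward tripping the breaker?
TRIP_CODES = {502, 503, 504}          # infrastructure failures: count immediately
TRIP_AFTER_CONSECUTIVE = {429, 500}   # possibly transient: count only if repeated
PERMANENT_CODES = {400, 403, 404}     # caller bugs / config errors: never trip

def should_trip(status: int, consecutive_failures: int, threshold: int = 3) -> bool:
    """Return True if this response should count as a breaker failure."""
    if status in TRIP_CODES:
        return True
    if status in TRIP_AFTER_CONSECUTIVE:
        # e.g. Salesforce 429: trip only after `threshold` consecutive hits
        return consecutive_failures >= threshold
    return False  # 2xx, validation errors, and auth errors never trip
```

Per-ERP error strings (SSS_REQUEST_LIMIT_EXCEEDED, REQUEST_LIMIT_EXCEEDED) would slot into the same sets after being mapped from the response body.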
Authentication
Authentication failures interact with circuit breakers in specific ways. Token expiry should NOT trip the circuit. [src1]
| Scenario | Should Trip Breaker? | Correct Handling |
| OAuth token expired (401) | No | Refresh token, retry once, then trip if refresh fails |
| API key invalid (403) | No | Alert immediately — config error, not transient |
| Auth server unreachable | Yes | Trip breaker on auth endpoint separately |
| MFA challenge required | No | Alert — cannot be automated; wrong auth flow |
| Rate limit on auth endpoint | Yes | Trip breaker; queue data requests until auth recovers |
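The 401 row above — refresh, retry once, and only count a failure if the refresh also fails — can be sketched as a wrapper. `AuthAwareCaller`, `do_request`, and `refresh` are illustrative names, and the integer-status interface is a simplification:

```python
class AuthAwareCaller:
    """Wraps an ERP call so a 401 triggers a token refresh, not a breaker trip."""

    def __init__(self, do_request, refresh):
        self.do_request = do_request   # returns an HTTP status code (illustrative)
        self.refresh = refresh         # refreshes the OAuth token
        self.breaker_failures = 0      # stand-in for real breaker state

    def call(self) -> int:
        status = self.do_request()
        if status == 401:
            self.refresh()             # token expiry: refresh and retry exactly once
            status = self.do_request()
            if status == 401:
                # Refresh itself failed -- only now does it count against the breaker
                self.breaker_failures += 1
        return status
```

In a real integration, the refresh call would go through its own separate breaker (per the "Auth server unreachable" row), so a dead identity provider fails fast too.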
Constraints
- Circuit breaker does NOT replace retry — Use retry for transient errors (first 2-3 attempts), then circuit breaker trips to prevent retry storms. Retry inside breaker, not breaker inside retry.
- Per-endpoint granularity — One breaker per ERP API surface. A single breaker for all Salesforce APIs means a Bulk API timeout opens the circuit for REST operations.
- State isolation — Circuit breaker state is in-memory by default. In Kubernetes with 10 pods, each pod has its own breaker. Externalize to Redis for shared state if needed.
- Idempotency required for half-open probes — Half-open probe requests may be duplicates of previously failed requests. Without idempotency keys, you risk duplicate records.
- Break duration must match ERP recovery — SAP maintenance: 30-120 min. Salesforce rate limit: 60s. A 5-second break is useless for a 30-minute maintenance window.
- Cannot circuit-break fire-and-forget — Async message queues act as natural buffers. Use circuit breaker between queue consumer and ERP API, not on the queue itself.
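The first constraint's ordering rule — retry inside the breaker, never the breaker inside retry — can be sketched as nested wrappers. The names `with_retry` and `TransientError` are illustrative, not from any library:

```python
import time

class TransientError(Exception):
    """Stand-in for a retryable failure (network blip, single 500)."""

def with_retry(fn, attempts=3, base_delay=0.5):
    """Retry transient failures with exponential backoff, then re-raise."""
    def wrapped(*args, **kwargs):
        for i in range(attempts):
            try:
                return fn(*args, **kwargs)
            except TransientError:
                if i == attempts - 1:
                    raise               # exhausted: the breaker counts ONE failure
                time.sleep(base_delay * 2 ** i)
    return wrapped

# Correct: the breaker wraps the retried call, so it sees one failure per
# exhausted retry sequence rather than one per attempt:
#     protected = breaker.wrap(with_retry(call_erp))
# Wrong: with_retry(breaker.wrap(call_erp)) keeps hammering an open circuit.
```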
Integration Pattern Decision Tree
START — Should I use a circuit breaker for this ERP integration?
|
+-- Is the integration synchronous (real-time API call)?
| +-- YES --> Circuit breaker is strongly recommended
| | +-- Is the ERP API call idempotent?
| | | +-- YES --> Standard circuit breaker + retry
| | | +-- NO --> Circuit breaker + idempotency key + DLQ
| | +-- Are you calling multiple ERP endpoints?
| | +-- YES --> Separate breaker per endpoint
| | +-- NO --> Single breaker sufficient
| +-- NO (async / message-based)
|
+-- Is there a synchronous ERP API call within the async flow?
| +-- YES --> Circuit breaker on the API call, not the queue consumer
| +-- NO --> Circuit breaker adds no value; use DLQ + retry instead
|
+-- Which resilience pattern do I need?
+-- Transient errors (network blip, 1-2 failures) --> Retry with backoff
+-- Sustained outage (ERP down for minutes) --> Circuit breaker
+-- Protecting shared resources (thread pools) --> Bulkhead
+-- Server-side request throttling --> Rate limiter
+-- Combining all four --> Retry -> Circuit Breaker -> Bulkhead -> Timeout
Quick Reference
| Pattern | Purpose | When to Use | Library |
| Circuit Breaker | Stop calling failing service, fail fast, auto-recover | ERP down or degraded for >30s | Polly, Resilience4j, Opossum |
| Retry | Retry transient failures with backoff | Network blip, single 500, rate limit | Polly, Resilience4j, tenacity |
| Bulkhead | Isolate resource pools per dependency | Multiple ERP integrations sharing thread pool | Polly, Resilience4j |
| Rate Limiter | Client-side request throttling | Stay under ERP API quota | Polly, Resilience4j, bottleneck |
| Timeout | Fail slow calls before thread exhaustion | ERP queries that can hang >60s | Built into all HTTP clients |
| Fallback | Return cached/default value when circuit open | Non-critical reads (product catalog) | Application logic |
| Dead Letter Queue | Capture failed writes for later replay | Write operations that must not be lost | Kafka DLQ, SQS DLQ, Azure Service Bus |
Circuit Breaker States
| State | Behavior | Transitions To | Trigger |
| Closed | All requests pass through; failures counted | Open | Failure rate exceeds threshold in sampling window |
| Open | All requests fail immediately (BrokenCircuitException) | Half-Open | Break duration timer expires |
| Half-Open | Limited probe requests allowed | Closed OR Open | Probe succeeds (→ Closed) or fails (→ Open) |
| Isolated | Manually held open via API (Polly only) | Closed | Manual reset via ManualControl.CloseAsync() |
Step-by-Step Integration Guide
1. Determine the correct granularity for your circuit breakers
Map your ERP API calls to circuit breaker instances. Each distinct endpoint or API surface gets its own breaker. [src1, src3]
ERP Integration Architecture:
+-- Salesforce
| +-- CB: sf-rest (REST API)
| +-- CB: sf-bulk (Bulk API 2.0)
| +-- CB: sf-streaming (Platform Events)
+-- SAP S/4HANA
| +-- CB: sap-odata (OData v4)
| +-- CB: sap-soap (SOAP — legacy BAPIs)
+-- Oracle ERP Cloud
+-- CB: oracle-rest (REST API)
+-- CB: oracle-fbdi (FBDI — file imports)
Verify: Each breaker should have independent state. Tripping sf-bulk should NOT affect sf-rest.
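One way to guarantee independent per-surface state is a small registry keyed by surface name — each key gets its own breaker instance. This `BreakerRegistry` is a sketch, not part of any listed library:

```python
class BreakerRegistry:
    """Lazily creates one independent breaker per ERP API surface."""

    def __init__(self, factory):
        self._factory = factory   # callable producing a fresh breaker instance
        self._breakers = {}

    def get(self, surface: str):
        # "sf-rest", "sf-bulk", "sap-odata", ... each get isolated state,
        # so tripping sf-bulk can never open sf-rest.
        if surface not in self._breakers:
            self._breakers[surface] = self._factory()
        return self._breakers[surface]
```

Resilience4j's `CircuitBreakerRegistry` and Polly's keyed resilience pipelines provide the same per-name isolation out of the box.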
2. Configure thresholds based on ERP behavior
Start with conservative defaults, tune based on production telemetry. [src1, src2]
| Parameter | Conservative Start | Tuned After 30 Days |
| Failure rate threshold | 50% | Adjust based on baseline error rate |
| Sampling window | 30 seconds | Match to ERP API response time P99 |
| Minimum throughput | 10 | Match to actual request volume |
| Break duration | 60 seconds | Match to ERP typical recovery time |
| Half-open probes | 3 | Increase if recovery is gradual |
Verify: Monitor circuit state transitions for 7 days. If breaker trips >5x/day on a healthy ERP, thresholds are too sensitive.
3. Implement the circuit breaker in your language
See Code Examples section below for complete implementations in Python, Java, C#, and Node.js. [src2, src4, src5, src8]
4. Wire the fallback strategy for open circuit
When the circuit is open, the application must do something useful instead of throwing an exception. [src1]
Circuit Open — What to do with the request:
+-- Is the operation a READ?
| +-- Return cached data (if fresh enough)
| +-- Return degraded response ("ERP data temporarily unavailable")
| +-- Route to secondary ERP instance (if available)
+-- Is the operation a WRITE?
| +-- Queue to dead letter queue for later replay
| +-- Write to local staging table + reconcile later
| +-- Return 503 + Retry-After header to upstream caller
+-- ALWAYS:
+-- Log circuit state change
+-- Fire alert if circuit open >5 minutes
+-- Increment circuit-open counter in metrics
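The read/write branch of the tree above can be sketched as a dispatch function. `on_circuit_open`, the `cache` dict, and the `dlq` list are illustrative stand-ins for a real cache and dead letter queue:

```python
def on_circuit_open(operation: str, payload: dict, cache: dict, dlq: list):
    """Route a request while the circuit is open: reads degrade, writes queue."""
    if operation == "READ":
        key = payload.get("resource")
        if key in cache:
            return {"data": cache[key], "stale": True}   # cached, marked stale
        return {"error": "ERP data temporarily unavailable"}
    # WRITE: never drop -- queue for later replay with an idempotency key
    dlq.append({"payload": payload, "idempotency_key": payload["externalId"]})
    return {"queued": True}
```

The logging, alerting, and metrics steps under ALWAYS would fire unconditionally around this dispatch.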
Code Examples
Python: Circuit Breaker for ERP API Calls
# Input: ERP API endpoint URL, authentication headers
# Output: API response or fallback value when circuit is open
# Requires: requests>=2.31.0
import time, threading, requests
from enum import Enum
from dataclasses import dataclass, field
class CircuitState(Enum):
CLOSED = "closed"
OPEN = "open"
HALF_OPEN = "half_open"
@dataclass
class CircuitBreaker:
failure_threshold: int = 5
recovery_timeout: float = 60.0
half_open_max_calls: int = 3
success_threshold: int = 3
erp_timeout: float = 30.0
TRIP_STATUS_CODES = {429, 500, 502, 503, 504}
# ... (see .md for full implementation)
# Usage: one breaker per ERP endpoint
sf_breaker = CircuitBreaker(failure_threshold=3, recovery_timeout=60)
sap_breaker = CircuitBreaker(failure_threshold=5, recovery_timeout=120)
response = sf_breaker.call(
"GET", "https://myorg.my.salesforce.com/services/data/v62.0/query",
params={"q": "SELECT Id, Name FROM Account LIMIT 10"},
headers={"Authorization": "Bearer <token>"},
fallback=lambda: {"records": [], "note": "Salesforce unavailable"}
)
Java: Resilience4j Circuit Breaker for ERP APIs
// Input: ERP API endpoint, authentication config
// Output: API response or fallback when circuit is open
// Requires: io.github.resilience4j:resilience4j-circuitbreaker:2.2.0
CircuitBreakerConfig sapConfig = CircuitBreakerConfig.custom()
.failureRateThreshold(50)
.slidingWindowType(SlidingWindowType.TIME_BASED)
.slidingWindowSize(30)
.minimumNumberOfCalls(8)
.waitDurationInOpenState(Duration.ofSeconds(120))
.permittedNumberOfCallsInHalfOpenState(3)
.recordExceptions(ConnectException.class, HttpTimeoutException.class)
.build();
CircuitBreakerRegistry registry = CircuitBreakerRegistry.of(sapConfig);
CircuitBreaker sapBreaker = registry.circuitBreaker("sap-odata");
C#: Polly 8.x Circuit Breaker for ERP APIs
// Requires: Microsoft.Extensions.Http.Resilience 8.x, Polly 8.x
services.AddHttpClient("SapOData", client =>
{
client.BaseAddress = new Uri("https://my-sap.s4hana.cloud/sap/opu/odata4/");
client.Timeout = TimeSpan.FromSeconds(60);
})
.AddResilienceHandler("sap-pipeline", builder =>
{
builder.AddCircuitBreaker(new CircuitBreakerStrategyOptions<HttpResponseMessage>
{
FailureRatio = 0.5,
SamplingDuration = TimeSpan.FromSeconds(30),
MinimumThroughput = 8,
BreakDuration = TimeSpan.FromSeconds(120),
ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
.HandleResult(r => r.StatusCode == HttpStatusCode.TooManyRequests
|| r.StatusCode >= HttpStatusCode.InternalServerError)
.Handle<HttpRequestException>()
});
});
Node.js: Opossum Circuit Breaker for ERP APIs
// Requires: opossum 8.x; Node.js 18+ (native fetch)
import CircuitBreaker from "opossum";
async function callSalesforceApi(endpoint, options) {
const response = await fetch(
`https://myorg.my.salesforce.com/services/data/v62.0/${endpoint}`,
{ ...options, signal: AbortSignal.timeout(30000) }
);
if (response.status === 429 || response.status >= 500) {
throw new Error(`Salesforce API error: ${response.status}`);
}
return response.json();
}
const sfBreaker = new CircuitBreaker(callSalesforceApi, {
timeout: 30000,
errorThresholdPercentage: 50,
resetTimeout: 60000,
volumeThreshold: 5,
});
sfBreaker.fallback(() => ({ records: [], _circuitOpen: true }));
sfBreaker.on("open", () => console.warn("[CIRCUIT OPEN] Salesforce"));
sfBreaker.on("close", () => console.info("[CIRCUIT CLOSED] Salesforce"));
Data Mapping
Configuration Mapping Across Libraries
| Parameter | Polly 8.x (C#) | Resilience4j 2.x (Java) | Opossum 8.x (Node.js) | Python (Custom) |
| Failure rate | FailureRatio (0.0-1.0) | failureRateThreshold (0-100%) | errorThresholdPercentage (0-100) | failure_threshold (count) |
| Sampling window | SamplingDuration (TimeSpan) | slidingWindowSize (int seconds) | rollingCountTimeout (ms) | Custom (time.time()) |
| Minimum throughput | MinimumThroughput (int) | minimumNumberOfCalls (int) | volumeThreshold (int) | failure_threshold (implicit) |
| Break duration | BreakDuration (TimeSpan) | waitDurationInOpenState (Duration) | resetTimeout (ms) | recovery_timeout (float s) |
| Manual control | ManualControl.IsolateAsync() | transitionToForcedOpenState() | breaker.open() | Direct state mutation |
Data Type Gotchas
- Polly uses a ratio (0.0-1.0), Resilience4j uses a percentage (0-100) — a FailureRatio of 0.5 in Polly equals failureRateThreshold(50) in Resilience4j. Porting 0.5 into Resilience4j unchanged yields a 0.5% threshold, 100x more sensitive than intended. [src2, src8]
- Opossum rollingCountTimeout is in milliseconds, Resilience4j slidingWindowSize is in seconds — 10000 in Opossum equals 10 in Resilience4j. Porting config values verbatim leaves you off by 1000x. [src5, src8]
- Polly 8.x is a complete rewrite of the Polly 7.x configuration model — v7's durationOfBreak corresponds to v8's BreakDuration, and the v7 advanced breaker's samplingDuration to SamplingDuration. Do not assume parameter names carry over in the migration. [src2]
Error Handling & Failure Points
Common Error Codes
| Code | Meaning | Should Trip Breaker? | Resolution |
| 429 | Rate limit exceeded | Yes (after 3 consecutive) | Backoff, respect Retry-After, align break to rate limit window |
| 500 | Internal Server Error | Yes (if repeated) | Trip after 3-5 in window; single 500 could be transient |
| 502 | Bad Gateway | Yes | Proxy/LB failure; ERP app server likely down |
| 503 | Service Unavailable | Yes (immediately) | ERP explicitly saying stop; trip immediately |
| 504 | Gateway Timeout | Yes | ERP app server overloaded |
| 400 | Bad Request | No | Fix the request payload — code bug, not outage |
| 401 | Unauthorized | No | Refresh auth token; trip only if refresh also fails |
| 403 | Forbidden | No | Permission issue; alert, don't trip |
| 404 | Not Found | No | Wrong endpoint; fix code |
Failure Points in Production
- False opens during ERP maintenance windows: SAP has scheduled downtime (02:00-04:00 UTC). Circuit trips, alerts fire. Fix:
Implement maintenance window suppression — skip alerting during scheduled windows. [src1]
- Token expiry cascade: OAuth token expires, all requests get 401, breaker trips. Fix:
Exclude 401 from circuit breaker; implement separate token refresh circuit. [src1, src3]
- Half-open probe creates duplicate record: Probe retries write without idempotency key. Fix:
Every ERP write must include idempotency key (externalId in NetSuite, External_ID__c in Salesforce). [src3]
- Break duration too short for Oracle patching: 15-60 min patching, 30s break cycles 120 times. Fix:
Implement exponential break duration — 30s, 60s, 120s, 240s, max 600s. [src1]
- Thread exhaustion before circuit trips: 30s timeout x 5 failures = 150s of blocked threads. Fix:
Set aggressive per-request timeout (10-15s) + bulkhead to limit concurrent ERP calls. [src7]
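The exponential break duration recommended for the Oracle-patching case can be sketched as a one-liner. The `break_duration` name is illustrative (Polly 8 offers this natively via BreakDurationGenerator; other libraries need a custom extension):

```python
def break_duration(consecutive_opens: int, base: float = 30.0, cap: float = 600.0) -> float:
    """Exponential break duration: 30s, 60s, 120s, 240s, ... capped at 600s."""
    return min(base * (2 ** consecutive_opens), cap)
```

Each time the half-open probe fails and the circuit re-opens, `consecutive_opens` increments; a successful close resets it to zero.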
Anti-Patterns
Wrong: Circuit breaker on writes without DLQ
# BAD — writes are silently dropped when circuit opens
try:
breaker.call("POST", f"{erp_url}/invoices", json=invoice_data)
except CircuitOpenError:
logger.warning("Circuit open — invoice not created")
# Invoice is LOST. No retry. No queue. Gone forever.
Correct: Circuit breaker with dead letter queue for writes
# GOOD — failed writes are queued for later replay
try:
breaker.call("POST", f"{erp_url}/invoices", json=invoice_data)
except CircuitOpenError:
dlq.send({
"endpoint": f"{erp_url}/invoices",
"payload": invoice_data,
"idempotency_key": invoice_data["externalId"],
})
Wrong: Single circuit breaker for all ERP endpoints
// BAD — Bulk API timeout trips the breaker for REST API too
const erpBreaker = new CircuitBreaker(callAnyErpApi, { timeout: 30000 });
await erpBreaker.fire("bulk/import", bulkPayload); // hangs until the 30s timeout, trips
await erpBreaker.fire("query/accounts", {}); // BLOCKED!
Correct: Separate circuit breaker per API surface
// GOOD — each API surface has its own breaker
const sfRestBreaker = new CircuitBreaker(callSfRest, { timeout: 15000 });
const sfBulkBreaker = new CircuitBreaker(callSfBulk, { timeout: 300000 });
// Bulk timeout does NOT affect REST operations
Wrong: Too-sensitive threshold on high-latency ERP APIs
# BAD — ERP APIs are NOT microservices
resilience4j.circuitbreaker.instances.sap-odata:
failureRateThreshold: 20 # too sensitive
slidingWindowSize: 5 # too short
minimumNumberOfCalls: 2 # too few
waitDurationInOpenState: 5s # too short
Correct: Thresholds calibrated for ERP latency profiles
# GOOD — tuned for real ERP behavior
resilience4j.circuitbreaker.instances.sap-odata:
failureRateThreshold: 50
slidingWindowSize: 30
minimumNumberOfCalls: 8
waitDurationInOpenState: 120s
permittedNumberOfCallsInHalfOpenState: 3
Common Pitfalls
- Treating 401 as a circuit-tripping failure: Token expiry causes 401, breaker trips, all ERP calls blocked. Fix:
Exclude 401 from breaker; handle auth separately with refresh logic. [src1]
- Using microservice-scale timeouts for ERP APIs: A 3-second timeout causes false timeouts on ERP APIs that routinely take 5-15s. Fix:
Set ERP-specific timeouts: 30s for REST, 60s for bulk, 120s for file imports. [src1, src7]
- Not implementing exponential break duration: Fixed 30s break during 2h SAP outage = 240 unnecessary probes. Fix:
Exponential backoff on break duration: 30s, 60s, 120s, 240s, capped at 600s. [src1]
- Circuit breaker without monitoring: Breaker trips and nobody knows. Fix:
Log every state transition, expose as health check, alert for circuits open >5 min. [src1, src7]
- Applying circuit breaker to queue consumers: Consumer stops but messages keep arriving, filling the partition. Fix:
Apply breaker between consumer and ERP API, not on consumer itself. [src3]
- Sharing state across instances without coordination: One instance's network blip trips shared Redis-backed circuit for all. Fix:
Use local breakers with shared metrics; trip global circuit via feature flag if >50% report failures. [src1]
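The maintenance-window suppression fix from the first pitfall can be sketched as an alert gate. The window list and `should_alert` name are illustrative — real schedules come from the ERP vendor's maintenance calendar:

```python
from datetime import datetime, timezone

# Scheduled ERP maintenance windows as (start_hour, end_hour) in UTC,
# e.g. the SAP 02:00-04:00 UTC window mentioned above (illustrative).
MAINTENANCE_WINDOWS = [(2, 4)]

def should_alert(now: datetime) -> bool:
    """Suppress circuit-open alerts during scheduled maintenance windows."""
    hour = now.astimezone(timezone.utc).hour
    return not any(start <= hour < end for start, end in MAINTENANCE_WINDOWS)
```

Note this suppresses only the alert — the circuit itself should still trip so callers fail fast during the window.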
Diagnostic Commands
# Check Resilience4j circuit breaker state (Spring Boot Actuator)
curl -s http://localhost:8080/actuator/circuitbreakers | jq '.circuitBreakers'
# Expected: {"sap-odata":{"state":"CLOSED","failureRate":-1.0}}
# Check Resilience4j circuit breaker events
curl -s http://localhost:8080/actuator/circuitbreakerevents | jq '.circuitBreakerEvents[-5:]'
# Check Polly circuit state via custom health endpoint
curl -s http://localhost:5000/health/circuits | jq
# Expected: {"sap-odata":"Closed","sf-rest":"Closed","oracle-rest":"HalfOpen"}
# Test ERP API health directly (bypassing circuit breaker)
# Salesforce
curl -s -o /dev/null -w "%{http_code}" \
-H "Authorization: Bearer $SF_TOKEN" \
"https://myorg.my.salesforce.com/services/data/v62.0/limits"
# SAP S/4HANA
curl -s -o /dev/null -w "%{http_code}" \
-H "Authorization: Bearer $SAP_TOKEN" \
"https://my-sap.s4hana.cloud/sap/opu/odata4/sap/api_business_partner/\$metadata"
# Check if ERP is in maintenance (manual probe)
curl -s -w "\n%{http_code} %{time_total}s" \
-H "Authorization: Bearer $TOKEN" \
"$ERP_API_URL/health" 2>&1
# 503 = maintenance; 200 with >10s = degraded; 200 with <2s = healthy
Version History & Compatibility
| Library | Version | Release | Breaking Changes | Notes |
| Polly | 8.x | 2023-07 | Complete API rewrite — Policy replaced by ResiliencePipeline | Cannot mix v7 and v8 |
| Polly | 7.x | 2019-06 | Legacy — maintenance only | Still widely used; plan migration |
| Resilience4j | 2.2.0 | 2024-03 | Minor — TIME_BASED sliding window improvements | Recommended for new Java projects |
| Resilience4j | 1.x | 2020-01 | EOL | Migrate to 2.x |
| Opossum | 8.1.3 | 2025-01 | Minor — improved TypeScript types | Stable; primary Node.js option |
iPaaS Circuit Breaker Support
| Platform | Built-in? | Config Level | Notes |
| MuleSoft | Yes — gateway policy | API Gateway (Envoy) | maxConnections, maxPendingRequests, maxRequests |
| Boomi | No — custom scripting | Process level | Implement in Data Process shape with Groovy |
| Workato | No — custom connector | Connector SDK | Build in custom connector actions |
| SAP Integration Suite | Partial — retry + timeout | iFlow level | No native breaker; use Groovy + JCache |
When to Use / When Not to Use
| Use When | Don't Use When | Use Instead |
| Synchronous ERP API calls that can cascade failures | Async message-based integration (Kafka, SQS) | Dead letter queue + retry policy |
| ERP has known maintenance windows causing extended downtime | Single transient error that resolves on retry | Simple retry with exponential backoff |
| Multiple downstream ERP endpoints with independent failure modes | All calls go through a single gateway handling resilience | Gateway-level circuit breaker (MuleSoft policy) |
| Long-lived service processing continuous requests | One-time batch job that runs once and exits | Retry + error log |
| Saga pattern with multiple ERP steps needing protection | Simple two-system point-to-point integration | Retry + DLQ (circuit breaker is overkill) |
Cross-System Comparison
Library Comparison for Custom Implementations
| Capability | Polly 8.x | Resilience4j 2.x | Opossum 8.x | pybreaker 1.x |
| Language | C# / .NET | Java / Kotlin | Node.js / TypeScript | Python |
| Circuit states | 4 (incl. Isolated) | 3 | 3 | 3 |
| Sliding window | Time-based | Time or Count | Time-based (rolling) | Count-based |
| Dynamic break | BreakDurationGenerator | Custom (extend) | Not built-in | Custom (extend) |
| HttpClient integration | Native (IHttpClientFactory) | Spring WebClient | Manual wrap | Manual wrap |
| Monitoring | Built-in telemetry | Actuator + Micrometer | Events + Prometheus | Custom events |
| Bulkhead | Yes (same pipeline) | Yes (separate) | No | No |
| Rate limiter | Yes (same pipeline) | Yes (separate) | No | No |
| Maturity | Very high | Very high | High | Moderate |
Important Caveats
- ERP APIs have fundamentally different latency profiles than microservices (5-15s P95 vs 100ms P95) — do not use microservice-default configurations
- Circuit breaker state is lost on process restart — if your service restarts while the ERP is still down, the circuit starts Closed and must re-learn the failure state
- Different ERP editions have different rate limits (Salesforce Enterprise: 100K/24h vs Developer: 15K/24h) — breaker thresholds should account for edition-specific limits
- In multi-tenant iPaaS deployments, circuit breaker state for one tenant's ERP should not affect another tenant's circuit — tenant isolation is critical
- Library versions change configuration APIs significantly (Polly 7 vs 8 is a complete rewrite) — pin library versions and test after upgrades
- This card covers implementations as of March 2026. ERP API error codes, rate limits, and maintenance schedules are subject to change with each ERP release
Related Units