This card covers poison message handling as a cross-platform architecture pattern for ERP integrations. It focuses specifically on what happens after a message exhausts its retry budget and lands in a dead letter queue — detection, classification, triage, remediation, and replay. For retry strategies that determine when a message becomes a poison message (exponential backoff, circuit breakers), see the companion card on error handling and DLQ fundamentals.
The patterns apply across all major message brokers (AWS SQS, Azure Service Bus, Apache Kafka, RabbitMQ) and iPaaS platforms (MuleSoft Anypoint MQ, Boomi Atom Queue, Workato, Celigo). The specific ERP system at either end (Salesforce, SAP, Oracle, NetSuite, Dynamics 365, Workday) does not change the poison message handling approach — it changes the error codes and data mapping fixes needed during remediation.
| System | Role | API Surface | Direction |
|---|---|---|---|
| Source ERP (e.g., Salesforce) | Event producer — generates change events or outbound messages | REST, Platform Events, CDC | Outbound |
| Message Broker (e.g., AWS SQS, Kafka) | Message transport + DLQ infrastructure | SQS API, Kafka Protocol | Transport |
| iPaaS (e.g., MuleSoft, Boomi) | Integration orchestrator — message transformation and routing | Anypoint MQ, Atom Queue | Orchestrator |
| Target ERP (e.g., SAP S/4HANA) | Message consumer — processes inbound records | OData, BAPI, IDoc | Inbound |
Poison message handling capabilities vary significantly across platforms. The key differentiators are automatic DLQ routing, DLQ inspection APIs, and native replay/redrive support: [src3, src4, src5]
| Platform | DLQ Type | Auto-Route | Max Delivery Count | Inspection API | Native Replay | DLQ Retention |
|---|---|---|---|---|---|---|
| AWS SQS | Separate queue | Yes (redrive policy) | Configurable (1-1000) | ReceiveMessage on DLQ | Yes (DLQ Redrive API) | Same as source (max 14 days) |
| Azure Service Bus | Sub-queue ($deadletterqueue) | Yes (MaxDeliveryCount) | Default 10, configurable | Peek/receive on sub-queue | Manual (receive + re-send) | Unlimited (Premium) |
| Apache Kafka | Separate topic (DLT) | Application-level | Application-level | Consumer on DLT topic | Application-level | Topic retention config |
| RabbitMQ | Separate queue (x-dead-letter-exchange) | Yes (x-delivery-limit) | Configurable via quorum queues | AMQP consume on DLQ | Manual (consume + re-publish) | Queue TTL config |
| MuleSoft Anypoint MQ | Separate queue | Yes (max delivery attempts) | Configurable | Anypoint MQ API | Yes (REM) | 7 days default |
| Boomi Atom Queue | Built-in DLQ | Yes (after 7 attempts) | 7 (6 retries + original) | Queue Management panel | Yes (resend dead letters) | Atom storage lifecycle |
| Platform | Replay Rate Limit | Concurrent Replays | Max DLQ Size | Notes |
|---|---|---|---|---|
| AWS SQS | System-optimized or custom max velocity | 1 active redrive task per source queue | No hard limit (cost-based) | Redrive task max duration: 36 hours; max 100 active tasks per account [src4] |
| Azure Service Bus | No built-in rate limit on replay | N/A (manual process) | Entity size limit (Premium: 80 GB) | No automatic cleanup — messages persist until explicitly completed [src3] |
| Apache Kafka | Consumer throughput | Consumer group parallelism | Topic retention (size or time) | No native redrive — must implement consumer that reads DLT and produces to main topic [src5] |
| MuleSoft Anypoint MQ | API rate limits apply | Per-queue basis | 120,000 in-flight messages | REM feature provides managed replay with visibility [src6] |
| Boomi | Queue throughput | Per-atom basis | Atom storage capacity | Dead letters visible in Queue Management panel; batch resend available |
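For brokers without native velocity control (Kafka, Azure Service Bus, RabbitMQ), an application-level replay throttle can be sketched as below. `throttled_replay` and `replay_fn` are illustrative names, not platform APIs:

```python
import time

def throttled_replay(messages, replay_fn, max_per_second=50):
    """Replay messages at a bounded velocity (app-level throttle).

    replay_fn is assumed to send one message and raise on failure.
    """
    interval = 1.0 / max_per_second
    replayed = 0
    for msg in messages:
        start = time.monotonic()
        replay_fn(msg)
        replayed += 1
        # Sleep off the remainder of this message's time slot.
        elapsed = time.monotonic() - start
        if elapsed < interval:
            time.sleep(interval - elapsed)
    return replayed
```

Starting at 50 msg/s and ramping up only after a clean batch keeps a replay storm from re-overloading the target ERP.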
| Metric | Target | Alert When |
|---|---|---|
| DLQ ingestion rate | < 1% of incoming throughput | Sustained > 1% for 15 minutes [src1] |
| DLQ backlog (depth) | < 1,000 messages | Growing for > 1 hour without triage [src1] |
| Oldest message age in DLQ | < 24 hours for critical streams | Any message > 24 hours untriaged [src1] |
| Replay success rate | > 95% | Below 90% on any replay batch [src1] |
| Poison ratio (DLQ / total) | < 5% | Above 5% sustained [src1] |
| Time to first triage | < 4h (critical), < 24h (standard) | Exceeding SLA threshold [src1] |
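The alert thresholds above can be encoded directly in a health check that feeds your alerting pipeline. A minimal sketch (function and argument names are illustrative):

```python
def evaluate_dlq_health(dlq_ingress, total_ingress, dlq_depth,
                        oldest_age_hours, replay_success_rate):
    """Check current readings against the alert thresholds in the table."""
    alerts = []
    ratio = dlq_ingress / total_ingress if total_ingress else 0.0
    if ratio > 0.01:
        alerts.append("dlq_ingestion_rate")   # sustained > 1%
    if dlq_depth > 1000:
        alerts.append("dlq_backlog")
    if oldest_age_hours > 24:
        alerts.append("oldest_message_age")
    if replay_success_rate < 0.90:
        alerts.append("replay_success_rate")
    if ratio > 0.05:
        alerts.append("poison_ratio")
    return alerts
```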
N/A — pattern-level card. Authentication is handled at the broker/iPaaS layer:
| Platform | Auth Method | Notes |
|---|---|---|
| AWS SQS | IAM roles / policies | DLQ access requires sqs:ReceiveMessage + sqs:DeleteMessage + sqs:SendMessage on both source and DLQ |
| Azure Service Bus | SAS or Azure AD (RBAC) | DLQ is a sub-queue — same connection string, append /$deadletterqueue [src3] |
| Apache Kafka | SASL/SCRAM, mTLS, or ACLs | DLT is a regular topic — requires separate ACL for consumer group [src5] |
| MuleSoft | Anypoint Platform credentials | DLQ management requires Manage Queues permission [src6] |
START — Message has failed processing and landed in DLQ
├── Step 1: Classify the failure
│ ├── Transient error? (timeout, 429, 503, network error)
│ │ ├── YES → Should NOT be in DLQ — investigate why retries exhausted
│ │ │ ├── maxDeliveryCount too low? → Increase to 3-5
│ │ │ ├── Backoff delay too short? → Increase max backoff
│ │ │ └── Upstream system down for extended period? → Expected; replay now
│ │ └── Action: REPLAY IMMEDIATELY (system has recovered)
│ ├── Data quality error? (schema violation, missing field, invalid reference)
│ │ ├── Can the message be fixed automatically?
│ │ │ ├── YES → Auto-remediate → REPLAY WITH IDEMPOTENCY CHECK
│ │ │ └── NO → Route to manual review queue
│ │ └── Action: FIX DATA → REPLAY WITH IDEMPOTENCY CHECK
│ ├── Permanent error? (invalid endpoint, auth failure, business rule violation)
│ │ ├── Code/config bug? → Fix, deploy → REPLAY ENTIRE BATCH
│ │ └── Business rule rejection? → Fix target state or DISCARD + ALERT
│ └── Unknown error? → QUARANTINE → MANUAL TRIAGE
├── Step 2: Remediate
│ ├── Automated fix possible? → Apply transform → validate → replay
│ └── Manual fix needed? → Alert ops → ticket → SLA clock starts
├── Step 3: Replay
│ ├── Verify idempotency key present
│ ├── Verify ordering (parent before child)
│ ├── Replay to original queue (NOT directly to consumer)
│ ├── Monitor replay success rate
│ └── If fails again → QUARANTINE (no infinite loop)
└── Step 4: Post-mortem
├── New failure category? → Add classifier rule
├── Recurring pattern? → Fix upstream validation
└── Update monitoring thresholds
| Scenario | Action | Replay? | Idempotency? | Alert Level |
|---|---|---|---|---|
| Schema violation (missing field) | Fix data, validate, replay | Yes | Yes | Warning |
| Invalid foreign key reference | Create parent first, then replay | Yes (ordered) | Yes | Warning |
| Rate limit exhaustion (429) | Should not be in DLQ — increase retry budget | Yes (immediate) | Yes | Info |
| Authentication failure (401/403) | Fix credentials, replay batch | Yes | Yes | Critical |
| Business rule violation | Fix target ERP state or discard | Conditional | Yes | Warning |
| Malformed payload (unparseable) | Discard — cannot be fixed | No | N/A | Error |
| Target system decommissioned | Discard + archive for audit | No | N/A | Critical |
| Duplicate record conflict (409) | Already processed — safe to discard | No | N/A | Info |
| Cascading failure (parent failed) | Fix parent first, replay children in order | Yes (ordered) | Yes | Warning |
| Unknown/unclassified error | Quarantine for investigation | Pending triage | Yes | Error |
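The playbook can be encoded as a lookup so triage automation and on-call humans follow the same rules. Scenario keys and field names here are illustrative, not a standard:

```python
# Hypothetical lookup encoding the triage playbook table above.
TRIAGE_PLAYBOOK = {
    "schema_violation":   {"replay": True,  "ordered": False, "alert": "warning"},
    "invalid_reference":  {"replay": True,  "ordered": True,  "alert": "warning"},
    "rate_limit":         {"replay": True,  "ordered": False, "alert": "info"},
    "auth_failure":       {"replay": True,  "ordered": False, "alert": "critical"},
    "business_rule":      {"replay": None,  "ordered": False, "alert": "warning"},
    "malformed_payload":  {"replay": False, "ordered": False, "alert": "error"},
    "duplicate_conflict": {"replay": False, "ordered": False, "alert": "info"},
    "cascading_failure":  {"replay": True,  "ordered": True,  "alert": "warning"},
}

def playbook_action(scenario):
    """Return the playbook entry; unknown scenarios default to quarantine."""
    return TRIAGE_PLAYBOOK.get(
        scenario, {"replay": None, "ordered": False, "alert": "error"})
```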
Before a message reaches the DLQ, classify the error type in your consumer. This metadata travels with the message and determines the triage path. [src1, src7]
from datetime import datetime

# ValidationError, SchemaError, and AuthenticationError are assumed to be
# exceptions defined by your validation and auth layers.
def classify_error(exception, message):
    """Classify processing errors to determine DLQ triage path."""
error_info = {
"error_class": type(exception).__name__,
"error_message": str(exception)[:500],
"timestamp": datetime.utcnow().isoformat(),
"message_id": message.get("message_id"),
"attempt_count": message.get("approximate_receive_count", 0),
}
if isinstance(exception, (TimeoutError, ConnectionError)):
error_info["category"] = "transient"
error_info["retry_eligible"] = True
elif isinstance(exception, (ValidationError, SchemaError)):
error_info["category"] = "data_quality"
error_info["retry_eligible"] = False
elif isinstance(exception, (AuthenticationError, PermissionError)):
error_info["category"] = "permanent"
error_info["retry_eligible"] = False
else:
error_info["category"] = "unknown"
error_info["retry_eligible"] = False
return error_info
Verify: Check DLQ messages have category attribute set → confirms classification is running.
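When the consumer routes a failed message to an application-managed DLQ (brokers with automatic redrive attach their own metadata instead), the classification can travel as SQS message attributes. A sketch — `route_to_dlq` matches the helper name used later in this card, but this body is an assumption:

```python
def build_dlq_attributes(error_info):
    """Convert classify_error() output into SQS MessageAttributes so the
    classification travels with the message into the DLQ."""
    return {
        "error_category": {"DataType": "String",
                           "StringValue": error_info["category"]},
        "error_class": {"DataType": "String",
                        "StringValue": error_info["error_class"]},
        "retry_eligible": {"DataType": "String",
                           "StringValue": str(error_info["retry_eligible"]).lower()},
    }

def route_to_dlq(sqs, dlq_url, body, error_info):
    """Send the failed message to the DLQ with its classification attached."""
    sqs.send_message(QueueUrl=dlq_url, MessageBody=body,
                     MessageAttributes=build_dlq_attributes(error_info))
```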
Set up automatic dead-letter routing with appropriate delivery count thresholds. [src3, src4]
# AWS SQS — Create DLQ and attach redrive policy
aws sqs create-queue --queue-name erp-orders-dlq \
--attributes '{"MessageRetentionPeriod":"1209600"}'
aws sqs set-queue-attributes \
--queue-url https://sqs.us-east-1.amazonaws.com/123456789/erp-orders \
--attributes '{
"RedrivePolicy": "{\"deadLetterTargetArn\":\"arn:aws:sqs:us-east-1:123456789:erp-orders-dlq\",\"maxReceiveCount\":\"5\"}"
}'
# Azure Service Bus — Set MaxDeliveryCount (recommend 5 for ERP)
az servicebus queue update \
--resource-group erp-integration \
--namespace-name erp-bus \
--name erp-orders \
--max-delivery-count 5
Verify: Send a message that always fails → confirm it appears in DLQ after 5 attempts.
Create a dedicated consumer that reads from the DLQ, classifies messages, and routes them through the triage workflow. [src1, src7]
import json, boto3
from datetime import datetime
sqs = boto3.client("sqs")
DLQ_URL = "https://sqs.us-east-1.amazonaws.com/123456789/erp-orders-dlq"
SOURCE_URL = "https://sqs.us-east-1.amazonaws.com/123456789/erp-orders"
def triage_dlq_messages(max_messages=10):
"""Read DLQ, classify, and route for remediation or replay."""
response = sqs.receive_message(
QueueUrl=DLQ_URL,
MaxNumberOfMessages=max_messages,
MessageAttributeNames=["All"],
AttributeNames=["All"],
)
for msg in response.get("Messages", []):
error_category = msg.get("MessageAttributes", {}).get(
"error_category", {}).get("StringValue", "unknown")
receive_count = int(msg["Attributes"].get("ApproximateReceiveCount", 0))
if receive_count > 3: # Prevent infinite triage loops
quarantine_message(msg, reason="triage_loop_detected")
continue
if error_category == "transient":
replay_message(msg, json.loads(msg["Body"]), SOURCE_URL)
elif error_category == "data_quality":
attempt_auto_fix(msg, json.loads(msg["Body"]))
elif error_category == "permanent":
route_to_manual_review(msg, json.loads(msg["Body"]))
else:
quarantine_message(msg, reason="unclassified")
Verify: aws sqs get-queue-attributes --queue-url $DLQ_URL --attribute-names ApproximateNumberOfMessages → count decreasing as triage runs.
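`quarantine_message` is referenced above but not shown; a minimal sketch, assuming a dedicated quarantine queue (the URL is a placeholder). It preserves the full message context so nothing is lost before manual review:

```python
import json

QUARANTINE_URL = "https://sqs.us-east-1.amazonaws.com/123456789/erp-orders-quarantine"  # placeholder

def quarantine_message(msg, reason, sqs=None, dlq_url=None,
                       quarantine_url=QUARANTINE_URL):
    """Move a message out of the triage loop, preserving context for review."""
    envelope = {
        "quarantine_reason": reason,
        "original_message_id": msg.get("MessageId"),
        "original_body": msg.get("Body"),
        "message_attributes": msg.get("MessageAttributes", {}),
    }
    if sqs is not None and dlq_url:
        # Persist to the quarantine queue, then delete from the DLQ so the
        # triage consumer does not pick it up again.
        sqs.send_message(QueueUrl=quarantine_url,
                         MessageBody=json.dumps(envelope))
        sqs.delete_message(QueueUrl=dlq_url,
                           ReceiptHandle=msg["ReceiptHandle"])
    return envelope
```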
Replay messages back to the source queue with idempotency verification. [src1, src4]
def replay_message(dlq_msg, body, target_queue_url):
"""Replay a DLQ message with idempotency safety."""
idempotency_key = body.get("idempotency_key")
if not idempotency_key:
quarantine_message(dlq_msg, reason="missing_idempotency_key")
return
if is_already_processed(idempotency_key):
sqs.delete_message(QueueUrl=DLQ_URL, ReceiptHandle=dlq_msg["ReceiptHandle"])
return # already handled
body["_replay"] = {
"replayed_at": datetime.utcnow().isoformat(),
"replay_attempt": body.get("_replay", {}).get("replay_attempt", 0) + 1,
}
if body["_replay"]["replay_attempt"] > 3:
quarantine_message(dlq_msg, reason="max_replay_attempts_exceeded")
return
sqs.send_message(
QueueUrl=target_queue_url,
MessageBody=json.dumps(body),
MessageAttributes={
"idempotency_key": {"DataType": "String", "StringValue": idempotency_key},
"is_replay": {"DataType": "String", "StringValue": "true"},
},
)
sqs.delete_message(QueueUrl=DLQ_URL, ReceiptHandle=dlq_msg["ReceiptHandle"])
Verify: Replay a known-good message → confirm no duplicate in target ERP.
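`is_already_processed` / `mark_as_processed` need a durable store in production (e.g. DynamoDB or Redis with TTL); this in-memory sketch only shows the contract the replay code assumes:

```python
import time

_PROCESSED = {}  # key -> processed_at; production: DynamoDB/Redis with TTL

def is_already_processed(idempotency_key, ttl_seconds=14 * 86400):
    """Idempotency check used by replay_message(). TTL mirrors the 14-day
    SQS retention so entries outlive any possible replay."""
    entry = _PROCESSED.get(idempotency_key)
    return entry is not None and time.time() - entry < ttl_seconds

def mark_as_processed(idempotency_key):
    """Record a key after the target ERP confirms the write."""
    _PROCESSED[idempotency_key] = time.time()
```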
# Input: DLQ queue name, alert threshold, SNS topic ARN
# Output: CloudWatch alarms for DLQ depth and message age
import boto3
cloudwatch = boto3.client("cloudwatch")
def create_dlq_depth_alarm(queue_name, threshold=100, sns_topic_arn=None):
cloudwatch.put_metric_alarm(
AlarmName=f"dlq-depth-{queue_name}",
AlarmDescription=f"DLQ {queue_name} has > {threshold} messages",
Namespace="AWS/SQS",
MetricName="ApproximateNumberOfMessagesVisible",
Dimensions=[{"Name": "QueueName", "Value": queue_name}],
Statistic="Maximum",
Period=300, EvaluationPeriods=2,
Threshold=threshold,
ComparisonOperator="GreaterThanThreshold",
AlarmActions=[sns_topic_arn] if sns_topic_arn else [],
)
def create_dlq_age_alarm(queue_name, max_age_seconds=86400, sns_topic_arn=None):
cloudwatch.put_metric_alarm(
AlarmName=f"dlq-age-{queue_name}",
AlarmDescription=f"DLQ {queue_name} has messages older than {max_age_seconds}s",
Namespace="AWS/SQS",
MetricName="ApproximateAgeOfOldestMessage",
Dimensions=[{"Name": "QueueName", "Value": queue_name}],
Statistic="Maximum",
Period=300, EvaluationPeriods=1,
Threshold=max_age_seconds,
ComparisonOperator="GreaterThanThreshold",
AlarmActions=[sns_topic_arn] if sns_topic_arn else [],
)
// Input: Kafka connection config, DLT topic name
// Output: Triage consumer that classifies and routes failed messages
const { Kafka } = require("kafkajs");
const kafka = new Kafka({ brokers: ["broker:9092"] });
const consumer = kafka.consumer({ groupId: "dlq-triage" });
const producer = kafka.producer({ idempotent: true });
async function runDLTTriageConsumer(dltTopic, mainTopic) {
await consumer.connect();
await producer.connect();
await consumer.subscribe({ topic: dltTopic, fromBeginning: false });
await consumer.run({
eachMessage: async ({ message }) => {
const headers = message.headers || {};
const errorType = headers["error-type"]?.toString() || "unknown";
const retryCount = parseInt(headers["retry-count"]?.toString() || "0");
const idempotencyKey = headers["idempotency-key"]?.toString();
if (!idempotencyKey) {
await logToQuarantine(message, "missing_idempotency_key");
return;
}
if (retryCount > 3) {
await logToQuarantine(message, "max_retries_exceeded");
return;
}
switch (errorType) {
case "transient":
await producer.send({ topic: mainTopic, messages: [{
key: message.key, value: message.value,
headers: { ...headers, "is-replay": "true",
"retry-count": String(retryCount + 1) },
}] });
break;
case "data_quality":
await routeToRemediationTopic(message);
break;
default:
await logToQuarantine(message, errorType);
await alertOpsTeam(message, errorType);
}
},
});
}
# Input: Service Bus namespace, queue name, SAS token
# Output: Peek at dead-lettered messages for triage
SAS_TOKEN="SharedAccessSignature sr=..."
# Peek messages in DLQ (non-destructive)
curl -X POST \
"https://erp-bus.servicebus.windows.net/erp-orders/\$deadletterqueue/messages/head?timeout=30" \
-H "Authorization: $SAS_TOKEN"
# Complete (delete) a DLQ message after successful triage
curl -X DELETE \
"https://erp-bus.servicebus.windows.net/erp-orders/\$deadletterqueue/messages/{messageId}/{lockToken}" \
-H "Authorization: $SAS_TOKEN"
When a message moves to the DLQ, critical context must be preserved for effective triage and replay:
| Field | Purpose | Required for Replay? | Notes |
|---|---|---|---|
| original_message_id | Trace back to original message | Yes | Idempotency dedup and audit trail |
| idempotency_key | Prevent duplicate processing | Yes | Without this, replay creates duplicates |
| error_category | Triage classification | Yes | Determines triage path |
| error_message | Root cause description | No (helpful) | Truncate to 500 chars |
| source_queue | Original queue/topic | Yes | Required for replay routing |
| original_timestamp | When first produced | Yes | Detect aging and retention deadline |
| attempt_count | Delivery attempt count | Yes | Helps tune retry budget |
| correlation_id | Links related messages | Conditional | Required for ordered replay |
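A sketch of assembling that envelope at dead-letter time (field names follow the table; the helper name is illustrative):

```python
from datetime import datetime, timezone

def build_dlq_envelope(body, message_id, error_category, error_message,
                       source_queue, attempt_count, correlation_id=None):
    """Collect the replay-critical metadata from the table above into a
    single envelope carried alongside the failed payload."""
    return {
        "original_message_id": message_id,
        "idempotency_key": body.get("idempotency_key") or message_id,
        "error_category": error_category,
        "error_message": (error_message or "")[:500],  # truncate per guidance
        "source_queue": source_queue,
        "original_timestamp": body.get("original_timestamp")
            or datetime.now(timezone.utc).isoformat(),
        "attempt_count": attempt_count,
        "correlation_id": correlation_id,
    }
```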
| Platform | Auto-Captured Metadata | Custom Metadata | Access Pattern |
|---|---|---|---|
| AWS SQS | ApproximateReceiveCount, SentTimestamp | MessageAttributes (up to 10) | ReceiveMessage with AttributeNames=All [src4] |
| Azure Service Bus | DeliveryCount, EnqueuedTimeUtc, DeadLetterReason | Custom properties (unlimited) | Peek/receive on $deadletterqueue [src3] |
| Apache Kafka | Offset, partition, timestamp | Headers (key-value byte arrays) | Consumer on DLT topic [src5] |
| MuleSoft Anypoint MQ | deliveryCount, destination | Custom properties | Anypoint MQ API or REM console [src6] |
| Code | Meaning | Source System | Triage Action |
|---|---|---|---|
| 400 | Payload validation failure | Target ERP API | Data quality fix → replay |
| 404 | Referenced record does not exist | Target ERP API | Create parent → replay children |
| 409 | Duplicate record — already exists | Target ERP API | Safe to discard |
| 422 | Business rule violation | Target ERP API | Fix target state → replay |
| INVALID_FIELD | Field not writable | Salesforce API | Update field mapping → replay |
| UNABLE_TO_LOCK_ROW | Record locked | Salesforce API | Transient — increase retry budget |
| GOVERNANCE_LIMIT | SuiteScript governance exhausted | NetSuite | Reduce batch size, replay |
| -ERR_PARSE | Malformed XML/JSON | Any consumer | Permanent — discard + log |
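These codes can seed the automated classifier. The mapping below is an illustrative starting point derived from the table, not an exhaustive rule set:

```python
# Hypothetical mapping from the error-code table above to triage categories.
ERROR_CODE_CATEGORY = {
    "400": "data_quality",
    "404": "data_quality",        # missing parent -> create, then replay
    "409": "duplicate",           # safe to discard
    "422": "business_rule",
    "INVALID_FIELD": "data_quality",
    "UNABLE_TO_LOCK_ROW": "transient",
    "GOVERNANCE_LIMIT": "transient",
    "-ERR_PARSE": "permanent",
}

def classify_by_code(code):
    """Map a platform error code to a triage category; unknown -> 'unknown'."""
    return ERROR_CODE_CATEGORY.get(str(code), "unknown")
```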
Set DLQ retention to maximum (14 days); monitor ApproximateAgeOfOldestMessage. [src4]
Implement idempotency checks using message_id or a business key — upsert, not insert. [src1]
Sort replay batches by correlation_id + sequence_number; replay parents first. [src1]
Never assign a DLQ to a DLQ consumer. Log errors, alert, halt processing. [src1, src7]
Use velocity-controlled replay — AWS SQS custom redrive velocity or an app-level throttle at 50-100 msg/s. [src4]
Validate schema compatibility before replay; transform stale messages to the current schema. [src7]
# BAD — schema violation retries forever, blocks queue, burns compute
def process_message(message):
while True:
try:
call_erp_api(message)
return
except Exception:
time.sleep(5) # never gives up
# GOOD — classify error, retry transient only, DLQ permanent failures
import random, time

def process_message(message, max_retries=5):
for attempt in range(max_retries):
try:
call_erp_api(message)
return
except TransientError:
delay = min(2 ** attempt + random.uniform(0, 1), 60)
time.sleep(delay)
except (ValidationError, SchemaError) as e:
route_to_dlq(message, category="data_quality", error=str(e))
return
except Exception as e:
route_to_dlq(message, category="permanent", error=str(e))
return
route_to_dlq(message, category="transient_exhausted", error="max retries")
# BAD — failed messages logged and forgotten. Data is lost forever.
def process_message(message):
try:
call_erp_api(message)
except Exception as e:
logger.error(f"Failed: {e}")
acknowledge(message) # message deleted, data lost
# GOOD — failed messages preserved with diagnostic context
def process_message(message):
try:
call_erp_api(message)
except Exception as e:
error_context = classify_error(e, message)
route_to_dlq(message, category=error_context["category"],
error=str(e), correlation_id=message.get("correlation_id"))
acknowledge(message) # now safely in DLQ
# BAD — replay sends to ERP without checking if already processed
def replay_from_dlq(dlq_messages):
for msg in dlq_messages:
call_erp_api(msg) # may create duplicate invoice/order
delete_from_dlq(msg)
# GOOD — check if already processed before replay
def replay_from_dlq(dlq_messages):
for msg in dlq_messages:
idempotency_key = msg.get("idempotency_key")
if is_already_processed(idempotency_key):
delete_from_dlq(msg)
continue
try:
call_erp_api_with_upsert(msg) # upsert, not insert
mark_as_processed(idempotency_key)
delete_from_dlq(msg)
except Exception as e:
quarantine(msg, reason=str(e)) # no infinite loop
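Ordered replay (parents before children) prevents cascading 404s during batch redrive. A sketch using the correlation_id + sequence_number fields from the metadata section:

```python
def order_for_replay(dlq_messages):
    """Sort a replay batch so parents precede children: group by
    correlation_id, then ascending sequence_number within each group.
    A sketch; assumes both fields are present on each message dict."""
    return sorted(
        dlq_messages,
        key=lambda m: (m.get("correlation_id", ""),
                       int(m.get("sequence_number", 0))))
```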
SLA-based triage — critical: < 4h, standard: < 24h. Monitor DLQ depth and age as production metrics. [src1]
Use peek/browse operations (Azure peek-lock, SQS visibility timeout, Kafka manual offset management). [src3, src4]
Velocity-controlled replay — start at 10% of normal throughput. AWS SQS Redrive supports a custom max velocity. [src4]
Tag with correlation_id + sequence_number. Sort replay by correlation_id, then sequence ascending. [src1]
Automated classification at DLQ ingestion. Auto-replay transient errors. Route data quality issues to auto-fix. Escalate novel errors only. [src1, src7]
Mirror DLQ config in all environments. Include poison message scenarios in integration tests. [src7]
# === AWS SQS DLQ Diagnostics ===
# Check DLQ message count
aws sqs get-queue-attributes \
--queue-url https://sqs.us-east-1.amazonaws.com/123456789/erp-orders-dlq \
--attribute-names ApproximateNumberOfMessages ApproximateNumberOfMessagesNotVisible
# Check oldest message age (seconds)
aws sqs get-queue-attributes \
--queue-url https://sqs.us-east-1.amazonaws.com/123456789/erp-orders-dlq \
--attribute-names ApproximateAgeOfOldestMessage
# Initiate DLQ redrive to source queue
aws sqs start-message-move-task \
--source-arn arn:aws:sqs:us-east-1:123456789:erp-orders-dlq \
--destination-arn arn:aws:sqs:us-east-1:123456789:erp-orders \
--max-number-of-messages-per-second 50
# Check redrive task status
aws sqs list-message-move-tasks \
--source-arn arn:aws:sqs:us-east-1:123456789:erp-orders-dlq
# === Azure Service Bus DLQ Diagnostics ===
# Check DLQ message count
az servicebus queue show \
--resource-group erp-integration \
--namespace-name erp-bus \
--name erp-orders \
--query "countDetails.deadLetterMessageCount"
# === Apache Kafka DLT Diagnostics ===
# Check DLT topic consumer lag
kafka-consumer-groups.sh --bootstrap-server broker:9092 \
--describe --group dlq-triage
# === MuleSoft Anypoint MQ Diagnostics ===
curl -X GET "https://anypoint.mulesoft.com/mq/admin/api/v1/organizations/{orgId}/environments/{envId}/regions/{region}/destinations/erp-orders-dlq/stats" \
-H "Authorization: Bearer $ANYPOINT_TOKEN"
| Feature | Release Date | Platform | Breaking Changes | Migration Notes |
|---|---|---|---|---|
| SQS DLQ Redrive API | 2024-06 | AWS SQS | N/A (new feature) | Replaces custom redrive consumers; velocity control |
| Anypoint MQ REM | 2025-01 | MuleSoft | N/A (new feature) | Managed replay — replaces manual consume + re-publish |
| MaxDeliveryCount | GA | Azure Service Bus | N/A | Default 10; recommend 5 for ERP integrations |
| Spring @RetryableTopic + DLT | 2021 | Kafka/Spring | N/A | Auto-creates retry-N and -dlt topics |
| Quorum Queue delivery-limit | 2020 | RabbitMQ 3.8 | Classic queues unsupported | Must migrate to quorum queues |
| Boomi Event Streams DLQ | 2024 | Boomi | N/A | Configurable max retries with exponential backoff |
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Messages repeatedly fail and block queue processing | Simple transient failures that resolve with retry + backoff | Error handling & DLQ fundamentals |
| Failed messages must be diagnosed, fixed, and replayed | Fire-and-forget integrations (message loss acceptable) | Simple error logging + monitoring |
| Multi-step flows with parent/child message dependencies | Single API call with synchronous response | Direct API error handling with retry |
| Compliance requires no data loss in integration pipeline | High-throughput streaming where per-message triage is cost-prohibitive | Batch error aggregation + statistical monitoring |
| Multiple failure categories need different remediation | All failures have the same root cause | Single-path retry strategy |
| Capability | AWS SQS | Azure Service Bus | Apache Kafka | MuleSoft Anypoint MQ | Boomi |
|---|---|---|---|---|---|
| DLQ Architecture | Separate queue | Sub-queue ($deadletterqueue) | Separate topic (DLT) | Separate queue | Built-in DLQ |
| Auto Dead-Letter | Yes (redrive policy) | Yes (MaxDeliveryCount) | No (application-level) | Yes | Yes (after 7 attempts) |
| Max Delivery Config | 1-1000 | 1-2000 (default 10) | Application-defined | Configurable | Fixed at 7 |
| Native Replay | Yes (Redrive API) | No (manual) | No (application-level) | Yes (REM) | Yes (resend) |
| DLQ Retention | Max 14 days | Unlimited (Premium) | Topic config | 7 days default | Atom lifecycle |
| DLQ Reason Metadata | Custom attributes | DeadLetterReason header | Custom headers | Custom properties | Limited |
| Non-Destructive Peek | Visibility timeout | Peek-lock | Consumer offset mgmt | API browse | Panel view |
| FIFO Support | FIFO DLQ for FIFO queue | FIFO within sessions | Partition-ordered | FIFO queue | No |
| Monitoring | CloudWatch metrics | Azure Monitor | Consumer group lag | Anypoint Monitoring | Dashboard |