Payment Processing System Design
How do I design a payment processing system?
TL;DR
- Bottom line: A payment system requires an API gateway, idempotent payment service, double-entry ledger, fraud detection, webhook-driven status updates, and reconciliation against PSP settlement files -- with PCI DSS v4.0.1 compliance as a non-negotiable foundation.
- Key tool/command: stripe.PaymentIntent.create(amount=1000, currency="usd", idempotency_key=unique_key)
- Watch out for: Missing idempotency keys on payment mutations -- a single network retry without idempotency can double-charge a customer.
- Works with: Any PSP (Stripe, Adyen, Braintree, Square); any language/framework; PCI DSS v4.0.1 (mandatory since March 2025).
Constraints
- PCI DSS v4.0.1 is mandatory: Any system storing, processing, or transmitting cardholder data must comply. All future-dated v4.0 requirements became mandatory March 31, 2025. Use tokenization to minimize PCI scope. [src3]
- Never store raw card data: Never persist PAN, CVV, or magnetic stripe data in your database. Use a PCI-certified tokenization provider (Stripe Elements, Adyen Drop-in, Braintree Hosted Fields) so card data never touches your servers.
- Integer arithmetic for money: All amounts must be stored and computed as integer minor units (cents, pence). Floating-point arithmetic causes rounding errors that compound across millions of transactions. [src1]
- Idempotency on every mutation: Every charge, refund, capture, and cancel operation must accept a client-generated idempotency key. Without this, network retries cause double charges. [src2]
- Webhooks are the source of truth: Never rely on synchronous API responses alone for payment status. Verify the signature on every webhook. Treat the payment_intent.succeeded webhook as the canonical confirmation. [src7]
- Double-entry ledger is non-negotiable: Every money movement must create balanced debit/credit entries. Total debits minus total credits across all ledger entries must always equal zero. [src4]
Quick Reference
| Component | Role | Technology Options | Scaling Strategy |
|---|---|---|---|
| API Gateway | Rate limiting, auth, routing, idempotency-key extraction | Kong, AWS API Gateway, Cloudflare, Nginx | Horizontal; stateless |
| Payment Service | Orchestrates payment lifecycle (authorize, capture, refund) | Custom service (Node.js/Go/Java) | Horizontal with sticky sessions per idempotency key |
| Idempotency Store | Deduplicates requests using idempotency keys | Redis (TTL 24-72h), DynamoDB | Redis Cluster; TTL-based eviction |
| PSP Adapter | Abstracts payment gateway APIs (Stripe, Adyen, etc.) | Adapter pattern per PSP | One adapter per PSP; circuit breakers per provider |
| Ledger Service | Double-entry bookkeeping, immutable transaction log | PostgreSQL, CockroachDB, custom append-only | Write-ahead log; partitioned by time |
| Fraud Detection | Rules engine + ML scoring (velocity, device fingerprint) | Stripe Radar, Sift, custom rules engine | Async scoring; circuit breaker on ML service |
| Webhook Processor | Receives and processes PSP webhooks | Queue-backed worker (SQS, Kafka, RabbitMQ) | Horizontal workers; partition by merchant |
| Notification Service | Sends payment confirmations (email, push, in-app) | Event-driven consumer (Kafka/SQS) | Independent consumer scaling |
| Reconciliation Engine | Matches ledger entries against PSP settlement files | Batch job (daily/hourly) | Time-partitioned; parallel per PSP |
| Currency Service | Exchange rates, multi-currency conversion | Open Exchange Rates API, ECB feeds, Redis cache | Cached rates; refresh every 5-60 min |
| Retry Manager | Handles failed payment retries with backoff | Exponential backoff + jitter, dead letter queue | Queue-based; configurable per failure type |
| Circuit Breaker Mesh | Isolates PSP failures (Visa down != Mastercard down) | Resilience4j, Polly, custom per-PSP breakers | Per-PSP thresholds; auto-recovery |
| Audit Trail | Immutable log of all payment events for compliance | Append-only store (S3, event store, audit table) | Append-only; archive to cold storage |
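The Retry Manager row above names exponential backoff with jitter and a dead letter queue. A minimal sketch of that combination follows, assuming a "full jitter" strategy; `RetryableError`, `backoff_delay`, and `call_with_retry` are hypothetical names, and a plain list stands in for the dead letter queue.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker for failures that are safe to retry (e.g. timeouts)."""


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return rng.uniform(0, min(cap, base * (2 ** attempt)))


def call_with_retry(operation, max_attempts=5, sleep=time.sleep, dead_letter=None):
    """Call operation() until it succeeds or attempts run out.

    After the final failure the exception is appended to dead_letter
    (a list standing in for a dead letter queue) and re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except RetryableError as exc:
            if attempt == max_attempts - 1:
                if dead_letter is not None:
                    dead_letter.append(exc)
                raise
            sleep(backoff_delay(attempt))
```

Every retry of a payment mutation should reuse the same idempotency key, otherwise the retry itself can create a duplicate charge.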
Decision Tree
START
|-- What is your PCI compliance posture?
| |-- Cannot handle any card data -> Use hosted checkout (Stripe Checkout, Adyen Drop-in)
| |-- Can use client-side tokenization -> Use Elements/Hosted Fields + server-side PaymentIntents
| |-- Full PCI Level 1 (SAQ D) -> Direct API integration with raw card data (rare, expensive)
|
|-- Expected transaction volume?
| |-- <1K txn/day -> Single payment service + PostgreSQL ledger + single PSP
| |-- 1K-100K txn/day -> Payment service + Redis idempotency + async webhooks + single PSP
| |-- 100K-1M txn/day -> Microservices + event-driven + multi-PSP with failover + dedicated ledger
| |-- >1M txn/day -> Full payment orchestration platform + sharded ledger + active-active regions
|
|-- Multi-PSP needed?
| |-- Single region, one payment method -> Single PSP is fine
| |-- Multi-region or cost optimization -> Payment orchestration layer (adapter pattern)
| |-- Regulatory requirements per region -> Local acquiring with PSP routing rules
|
|-- Subscription/recurring payments?
| |-- YES -> Add billing service with Stripe Billing / custom dunning logic
| |-- NO -> One-time PaymentIntent flow is sufficient
|
|-- DEFAULT -> Start with single PSP (Stripe), idempotent payment service, PostgreSQL ledger, webhook processor
Step-by-Step Guide
1. Define your PCI scope and choose a tokenization strategy
Minimize PCI scope by using client-side tokenization. Stripe Elements, Adyen Drop-in, or Braintree Hosted Fields collect card data in an iframe -- your servers never see raw card numbers. This reduces your PCI compliance burden from SAQ D (400+ controls) to SAQ A (~22 controls) or SAQ A-EP (~139 controls), depending on the integration type. [src3]
PCI Scope Levels:
- SAQ A: Fully hosted payment page (Checkout links) -> ~22 controls
- SAQ A-EP: Client-side tokenization (Elements/Drop-in) -> ~139 controls
- SAQ D: Direct card data handling (raw API) -> ~400+ controls
Decision: Use SAQ A or SAQ A-EP unless you have a dedicated PCI compliance team.
Verify: Check your Stripe Dashboard > Settings > Compliance to confirm your SAQ level.
2. Design the payment service with idempotency
Create a payment service that accepts idempotency keys on every mutation. Store the key + response in Redis (TTL 24-72h). On duplicate requests, return the cached response. [src2]
Request Flow:
1. Client generates UUID idempotency key
2. API Gateway extracts key, passes to Payment Service
3. Payment Service checks Redis for existing key
   - EXISTS -> return cached response (200 OK)
   - NOT EXISTS -> set key as "processing" in Redis
4. Call PSP API with idempotency key
5. Store PSP response in Redis with key
6. Write to ledger
7. Return response to client
Verify: Send the same payment request twice with the same idempotency key -- you should get identical responses and only one charge.
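The request flow above can be sketched without any infrastructure. This is a minimal illustration, assuming an in-memory dict stands in for Redis and ignoring TTLs; `IdempotencyStore` and `handle_charge` are hypothetical names, and `charge_fn` represents the PSP call.

```python
import threading


class IdempotencyStore:
    """In-memory stand-in for the Redis idempotency store (no TTL handling)."""

    PROCESSING = object()  # marker for a request that is still in flight

    def __init__(self):
        self._entries = {}
        self._lock = threading.Lock()

    def begin(self, key):
        """Return (True, None) if the caller should process the request,
        or (False, value) where value is a cached response or PROCESSING."""
        with self._lock:
            if key in self._entries:
                return False, self._entries[key]
            self._entries[key] = self.PROCESSING
            return True, None

    def finish(self, key, response):
        with self._lock:
            self._entries[key] = response

    def abort(self, key):
        with self._lock:
            self._entries.pop(key, None)


def handle_charge(store, key, charge_fn):
    """Run charge_fn at most once per key; replay the stored response otherwise."""
    should_run, cached = store.begin(key)
    if not should_run:
        if cached is IdempotencyStore.PROCESSING:
            return {"status": "in_progress"}
        return cached
    try:
        response = charge_fn()
    except Exception:
        store.abort(key)  # allow a later retry with the same key
        raise
    store.finish(key, response)
    return response
```

Checking and marking the key happen under one lock, so two concurrent requests with the same key cannot both reach the PSP call.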
3. Implement the double-entry ledger
Every payment creates balanced ledger entries. A $100 charge from customer to merchant creates: debit customer account $100, credit merchant account $100. The sum across all entries is always zero. [src4]
CREATE TABLE ledger_entries (
    id BIGSERIAL PRIMARY KEY,
    transaction_id UUID NOT NULL,
    account_id UUID NOT NULL,
    entry_type VARCHAR(6) NOT NULL CHECK (entry_type IN ('debit', 'credit')),
    amount_cents BIGINT NOT NULL,
    currency CHAR(3) NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    metadata JSONB
);
-- Immutability: no UPDATE or DELETE allowed
-- Invariant: SUM(debit) - SUM(credit) = 0
Verify: SELECT SUM(CASE WHEN entry_type='debit' THEN amount_cents ELSE -amount_cents END) FROM ledger_entries; must return 0.
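The same invariant can be checked in application code. The sketch below is illustrative only: a Python list stands in for the ledger table, and `post_transfer` and `net_balance` are hypothetical helpers that mirror the column names in the schema above.

```python
from collections import defaultdict


def post_transfer(ledger, transaction_id, debit_account, credit_account,
                  amount_cents, currency):
    """Append one balanced debit/credit pair to an append-only ledger list."""
    if not isinstance(amount_cents, int) or amount_cents <= 0:
        raise ValueError("amount_cents must be a positive integer")
    for account_id, entry_type in ((debit_account, "debit"),
                                   (credit_account, "credit")):
        ledger.append({
            "transaction_id": transaction_id,
            "account_id": account_id,
            "entry_type": entry_type,
            "amount_cents": amount_cents,
            "currency": currency,
        })


def net_balance(ledger):
    """Debits minus credits per currency; every value should be 0."""
    totals = defaultdict(int)
    for entry in ledger:
        sign = 1 if entry["entry_type"] == "debit" else -1
        totals[entry["currency"]] += sign * entry["amount_cents"]
    return dict(totals)
```

Writing both entries in one call (and, in a real database, one transaction) is what keeps the ledger from ever holding a lone debit or credit.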
4. Set up webhook processing with signature verification
Never trust a webhook without verifying its signature. Process webhooks asynchronously: acknowledge with 200 immediately, then enqueue for processing. Deduplicate by event ID. [src7]
Webhook Processing Pipeline:
1. Receive POST from PSP
2. Verify signature (HMAC-SHA256 or asymmetric)
3. Respond 200 OK immediately (< 5 seconds)
4. Enqueue event to message queue (SQS/Kafka/RabbitMQ)
5. Worker dequeues event
6. Check event ID in processed_events table (idempotent)
7. Update payment status, ledger, emit notifications
Verify: Replay a webhook event -- your system should process it exactly once.
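Steps 2 and 6 of the pipeline above can be sketched with the standard library. This assumes a generic scheme where the signature is a hex HMAC-SHA256 of the raw request body; real PSPs define their own header formats and signed payloads, so use the PSP's SDK in production. `sign`, `verify_signature`, and `process_event` are hypothetical names, and a set stands in for the processed_events table.

```python
import hashlib
import hmac


def sign(secret: bytes, payload: bytes) -> str:
    """Hex HMAC-SHA256 of the raw payload (generic scheme, not PSP-specific)."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, payload: bytes, signature_hex: str) -> bool:
    """Compare in constant time to avoid leaking how many characters matched."""
    expected = sign(secret, payload)
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def process_event(processed_ids: set, event_id: str, handler) -> bool:
    """Run handler once per event ID; return False for duplicates."""
    if event_id in processed_ids:
        return False
    handler()
    processed_ids.add(event_id)  # recorded only after the handler succeeds
    return True
```

Verification must run against the raw request bytes; re-serializing parsed JSON can change the byte sequence and make a valid signature fail.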
5. Build reconciliation against settlement files
PSPs send daily settlement files listing all completed transactions and fees. Your reconciliation engine compares these against your ledger to detect discrepancies. [src1]
Reconciliation Process:
1. Download PSP settlement file (CSV/JSON, usually T+1 or T+2)
2. Parse into normalized transaction records
3. For each record: find matching ledger entry by PSP transaction ID
4. Compare amounts (gross, fee, net)
5. Flag mismatches: MATCHED / AMOUNT_MISMATCH / MISSING_IN_LEDGER / MISSING_IN_PSP
6. Auto-resolve known patterns; alert on unresolved discrepancies
Verify: Run reconciliation on a test day -- all transactions should be MATCHED or have documented exceptions.
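The matching in steps 3 to 5 can be sketched as a comparison of two mappings. This is a simplified illustration that compares gross amounts only; `reconcile` is a hypothetical function, and both inputs are assumed to map a PSP transaction ID to an amount in minor units.

```python
def reconcile(ledger_by_psp_id, settlement_by_psp_id):
    """Classify each PSP transaction ID with one of the four statuses.

    Both arguments map psp_transaction_id -> gross amount in minor units.
    """
    report = {}
    for txn_id, settled_amount in settlement_by_psp_id.items():
        if txn_id not in ledger_by_psp_id:
            report[txn_id] = "MISSING_IN_LEDGER"
        elif ledger_by_psp_id[txn_id] != settled_amount:
            report[txn_id] = "AMOUNT_MISMATCH"
        else:
            report[txn_id] = "MATCHED"
    for txn_id in ledger_by_psp_id:
        if txn_id not in settlement_by_psp_id:
            report[txn_id] = "MISSING_IN_PSP"
    return report
```

A fuller version would compare fee and net amounts as well and would treat MISSING_IN_PSP as provisional until the settlement delay (T+1 or T+2) has passed.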
6. Implement circuit breakers per PSP
When using multiple PSPs, isolate failures so one provider's outage does not bring down the entire payment system. Each PSP gets its own circuit breaker. [src5]
Circuit Breaker States:
- CLOSED (normal): Requests flow through. Track failure rate.
- OPEN (tripped): All requests fail fast. Redirect to fallback PSP.
- HALF-OPEN (test): Allow limited requests to test recovery.
Per-PSP Configuration:
- Stripe: failure_threshold=5%, window=60s, cooldown=30s
- Adyen: failure_threshold=5%, window=60s, cooldown=30s
- Fallback: Always-on backup PSP for critical payments
Verify: Simulate PSP timeout -- traffic should route to fallback within the cooldown period.
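The three states above can be sketched as a small state machine. This is a simplified illustration, not the behavior of Resilience4j or Polly: `CircuitBreaker` is a hypothetical class, `min_calls` is an added assumption that stops a single early failure from tripping the breaker, and the half-open state here lets every request through until one result is recorded.

```python
import time
from collections import deque


class CircuitBreaker:
    def __init__(self, failure_threshold=0.05, window_seconds=60.0,
                 cooldown_seconds=30.0, min_calls=20, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.window_seconds = window_seconds
        self.cooldown_seconds = cooldown_seconds
        self.min_calls = min_calls
        self.clock = clock
        self.state = "CLOSED"
        self.opened_at = None
        self.outcomes = deque()  # (timestamp, succeeded) within the window

    def allow_request(self):
        """False means fail fast and route to the fallback PSP."""
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.cooldown_seconds:
                self.state = "HALF_OPEN"  # cooldown over: allow a trial request
                return True
            return False
        return True

    def record(self, succeeded):
        now = self.clock()
        if self.state == "OPEN":
            return  # ignore late results from requests sent before the trip
        if self.state == "HALF_OPEN":
            if succeeded:
                self.state = "CLOSED"
                self.outcomes.clear()
            else:
                self._trip(now)
            return
        self.outcomes.append((now, succeeded))
        while self.outcomes and now - self.outcomes[0][0] > self.window_seconds:
            self.outcomes.popleft()
        failures = sum(1 for _, ok in self.outcomes if not ok)
        if (len(self.outcomes) >= self.min_calls
                and failures / len(self.outcomes) >= self.failure_threshold):
            self._trip(now)

    def _trip(self, now):
        self.state = "OPEN"
        self.opened_at = now
```

Keeping one instance per provider, for example `breakers = {"stripe": CircuitBreaker(), "adyen": CircuitBreaker()}`, is what isolates one PSP's outage from the others.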
Code Examples
Python: Create idempotent payment with Stripe
# Input: order_id, amount in cents, currency
# Output: PaymentIntent object or cached result
import stripe

stripe.api_key = "sk_live_..."  # Use env var in production

def create_payment(order_id: str, amount_cents: int, currency: str = "usd"):
    idempotency_key = f"pay_{order_id}"  # Deterministic: same order = same key
    try:
        intent = stripe.PaymentIntent.create(
            amount=amount_cents,  # Integer cents, never float
            currency=currency,
            metadata={"order_id": order_id},
            idempotency_key=idempotency_key,
        )
        return {"status": "ok", "client_secret": intent.client_secret, "id": intent.id}
    except stripe.error.CardError as e:
        return {"status": "card_error", "message": e.user_message}
    except stripe.error.IdempotencyError:
        # Same key was reused with different parameters
        return {"status": "idempotency_conflict", "message": "Retry with new key"}
Node.js: Webhook handler with idempotency
// Input: Stripe webhook POST request
// Output: 200 OK (idempotent -- safe to receive duplicates)
const stripe = require("stripe")("sk_live_...");
const express = require("express");
const app = express();
const processedEvents = new Set(); // Use Redis in production

app.post("/webhooks/stripe",
  express.raw({ type: "application/json" }),
  async (req, res) => {
    const sig = req.headers["stripe-signature"];
    let event;
    try {
      event = stripe.webhooks.constructEvent(req.body, sig, "whsec_...");
    } catch (err) {
      return res.status(400).send("Signature verification failed");
    }
    res.status(200).json({ received: true }); // ACK immediately
    if (processedEvents.has(event.id)) return; // Idempotent
    processedEvents.add(event.id);
    if (event.type === "payment_intent.succeeded") {
      await fulfillOrder(event.data.object.metadata.order_id);
    }
  }
);
Python: Idempotent payment handler with database deduplication
# Input: payment request with idempotency key
# Output: payment result (same result for same key)
import json

def handle_payment(db, redis, request):
    idem_key = request.headers.get("Idempotency-Key")
    if not idem_key:
        return {"error": "Idempotency-Key header required"}, 400
    # Check Redis for cached response (fast path)
    cached = redis.get(f"idem:{idem_key}")
    if cached:
        return json.loads(cached), 200
    # Acquire lock to prevent concurrent processing of same key
    lock = redis.set(f"idem_lock:{idem_key}", "processing", nx=True, ex=30)
    if not lock:
        return {"error": "Request in progress"}, 409
    try:
        # Re-check after taking the lock: another request may have finished
        # between the first lookup and the lock acquisition
        cached = redis.get(f"idem:{idem_key}")
        if cached:
            return json.loads(cached), 200
        result = process_payment(db, request.json)  # application code, defined elsewhere
        redis.setex(f"idem:{idem_key}", 86400, json.dumps(result))  # Cache 24h
        return result, 200
    finally:
        redis.delete(f"idem_lock:{idem_key}")
Anti-Patterns
Wrong: Using floating-point for money
# BAD -- floating-point causes rounding errors in financial calculations
price = 19.99
tax = price * 0.0825 # 1.649175
total = price + tax # 21.639175
charged = round(total, 2) # 21.64 -- but cumulative rounding diverges at scale
Correct: Integer minor units (cents)
# GOOD -- integer arithmetic is exact for money
price_cents = 1999
tax_cents = 165 # Pre-calculated, or integer math with half-up rounding: (1999 * 825 + 5000) // 10000
total_cents = price_cents + tax_cents # 2164 -- exact, no rounding errors
# Convert to display: f"${total_cents / 100:.2f}" -> "$21.64"
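The integer tax calculation can be made explicit. A minimal sketch, assuming the tax rate is expressed in basis points (825 = 8.25%), amounts are non-negative, and rounding is half up; `tax_cents` and `format_cents` are hypothetical helpers. Plain floor division (`1999 * 825 // 10000`) would give 164, so the rounding term matters.

```python
def tax_cents(price_cents: int, rate_basis_points: int) -> int:
    """Tax in integer cents, rounded half up; 825 basis points = 8.25%."""
    return (price_cents * rate_basis_points + 5_000) // 10_000


def format_cents(total_cents: int) -> str:
    """Format non-negative cents for display without going through float."""
    return f"${total_cents // 100}.{total_cents % 100:02d}"
```

With `price_cents = 1999`, `tax_cents(1999, 825)` is 165 and the total of 2164 formats as "$21.64".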
Wrong: No idempotency on payment creation
// BAD -- network retry creates duplicate charge
app.post("/charge", async (req, res) => {
  const charge = await stripe.charges.create({
    amount: req.body.amount,
    currency: "usd",
    source: req.body.token,
    // No idempotency key -- if client retries, customer is charged twice
  });
  res.json(charge);
});
Correct: Idempotency key on every mutation
// GOOD -- idempotency key prevents double charges on retry
app.post("/charge", async (req, res) => {
  const charge = await stripe.paymentIntents.create(
    { amount: req.body.amount, currency: "usd",
      automatic_payment_methods: { enabled: true } },
    { idempotencyKey: req.body.order_id } // Same order = same key
  );
  res.json(charge);
});
Wrong: Trusting synchronous response as source of truth
# BAD -- relying solely on API response for payment status
intent = stripe.PaymentIntent.create(amount=1000, currency="usd")
if intent.status == "succeeded":
    fulfill_order(order_id)  # Race condition: status may change
Correct: Webhook as source of truth with signature verification
# GOOD -- webhook is canonical payment confirmation
@app.route("/webhook", methods=["POST"])
def stripe_webhook():
    payload = request.data
    sig = request.headers.get("Stripe-Signature")
    try:
        event = stripe.Webhook.construct_event(payload, sig, WEBHOOK_SECRET)
    except (ValueError, stripe.error.SignatureVerificationError):
        return "Invalid signature", 400
    if event["type"] == "payment_intent.succeeded":
        fulfill_order(event["data"]["object"]["metadata"]["order_id"])
    return "", 200
Wrong: Mutable ledger entries
-- BAD -- updating ledger entries destroys audit trail
UPDATE ledger_entries SET amount_cents = 500 WHERE id = 12345;
-- No trace of what the original amount was or why it changed
Correct: Append-only reversals
-- GOOD -- corrections are new entries that reverse the original
-- IDs below are readable placeholders; the ledger_entries schema uses UUID columns
INSERT INTO ledger_entries (transaction_id, account_id, entry_type, amount_cents, currency)
VALUES ('txn_abc_rev', 'customer_1', 'credit', 1000, 'USD'), -- reversal
       ('txn_abc_rev', 'merchant_1', 'debit', 1000, 'USD'), -- reversal
       ('txn_abc_v2', 'customer_1', 'debit', 500, 'USD'), -- corrected
       ('txn_abc_v2', 'merchant_1', 'credit', 500, 'USD'); -- corrected
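The reversal half of this pattern is mechanical and can be generated from the original entries. A minimal sketch, assuming entries are dicts with the same field names as the ledger columns; `reversal_entries` is a hypothetical helper.

```python
def reversal_entries(original_entries, reversal_transaction_id):
    """Return new entries that cancel original_entries without mutating them."""
    flipped = {"debit": "credit", "credit": "debit"}
    return [
        {
            **entry,
            "transaction_id": reversal_transaction_id,
            "entry_type": flipped[entry["entry_type"]],
        }
        for entry in original_entries
    ]
```

The originals stay in the ledger untouched, so the audit trail shows both the mistake and its correction.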
Common Pitfalls
- Double charges from missing idempotency: Network timeouts cause automatic retries; without idempotency keys each retry creates a new charge. Fix: generate deterministic idempotency keys from order/payment IDs and pass on every payment creation request. [src2]
- Webhook event replay causing duplicate fulfillment: PSPs retry webhooks if they don't receive a 200 response within the timeout window. Fix: store processed event IDs in a processed_webhooks table and check before processing. Return 200 immediately, process asynchronously. [src7]
- Floating-point money arithmetic at scale: 0.1 + 0.2 !== 0.3 in IEEE 754. Across millions of transactions, rounding errors accumulate into material discrepancies. Fix: use integer cents/minor units for all storage and computation. [src1]
- Shared database across payment microservices: Two services writing to the same payments table creates a distributed monolith. Fix: each service owns its schema; communicate via events/APIs. [src5]
- Ignoring PSP settlement reconciliation: Your ledger and the PSP's records will diverge due to timing, failed webhooks, or edge cases. Fix: run daily reconciliation against PSP settlement files and alert on mismatches above a threshold. [src4]
- Synchronous cascade failures: Payment service calls fraud service calls ledger service -- if any is slow, the entire chain blocks. Fix: use async event-driven architecture; apply circuit breakers and timeouts on all synchronous calls. [src6]
- Storing raw card numbers: Dramatically increases PCI scope (SAQ D, 400+ controls). Fix: use PSP-provided customer/payment method tokens for repeat charges. [src3]
- No circuit breaker per PSP: When one processor is slow, all transactions fail because they share the same circuit breaker. Fix: implement per-PSP circuit breakers with independent thresholds and fallback routing. [src5]
Diagnostic Commands
# Check Stripe API connectivity and authentication
curl -s https://api.stripe.com/v1/balance -u sk_test_...: | python3 -m json.tool
# Verify webhook signature configuration
stripe listen --forward-to localhost:4242/webhook --log-level debug
# Test idempotency -- second request should return same response
IDEM_KEY=$(uuidgen)
curl -X POST https://api.stripe.com/v1/payment_intents \
-u sk_test_...: \
-H "Idempotency-Key: $IDEM_KEY" \
-d amount=1000 -d currency=usd
# Verify ledger balance (must always be zero)
psql -c "SELECT SUM(CASE WHEN entry_type='debit' THEN amount_cents ELSE -amount_cents END) AS balance FROM ledger_entries;"
# Check for orphaned transactions (in ledger but not in PSP)
psql -c "SELECT transaction_id FROM ledger_entries WHERE transaction_id NOT IN (SELECT psp_transaction_id FROM psp_settlements WHERE settlement_date = CURRENT_DATE - 1);"
# Monitor payment service circuit breaker states
curl -s http://localhost:8080/actuator/circuitbreakers | python3 -m json.tool
Version History & Compatibility
| Standard/API | Status | Breaking Changes | Migration Notes |
|---|---|---|---|
| PCI DSS v4.0.1 | Current (mandatory since 2025-03-31) | MFA for all CDE access; script controls on payment pages; continuous compliance | All future-dated requirements now mandatory |
| PCI DSS v3.2.1 | Retired (2024-03-31) | -- | Must upgrade to v4.0.1 |
| Stripe PaymentIntents API | Current (recommended) | Replaces Charges API | Use payment_intents.create() instead of charges.create() |
| Stripe Charges API | Legacy (still functional) | -- | Migrate to PaymentIntents for SCA/3DS2 support |
| PSD2/SCA (EU) | Mandatory | 3D Secure 2 required for EU card payments | PaymentIntents API handles SCA automatically |
When to Use / When Not to Use
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Building a marketplace or SaaS with custom payment flows | Selling <100 items with simple checkout | Stripe Checkout / Shopify / hosted payment page |
| Need multi-PSP failover for reliability | Single-PSP integration meets all requirements | PSP's official SDK and hosted UI |
| Processing >10K transactions/day with custom business logic | Internal tool with no real money movement | Simple database transactions without payment infrastructure |
| Regulatory requirements demand payment data isolation | Prototyping or MVP with no real payments | Stripe test mode with minimal architecture |
| Need custom reconciliation, reporting, or settlement logic | Standard e-commerce with out-of-box solutions | WooCommerce, Shopify, or platform-native payments |
Important Caveats
- PCI DSS v4.0.1 compliance is not optional: As of March 31, 2025, all requirements are mandatory. Non-compliance can result in fines of $5K-$100K/month and loss of card processing ability. Tokenization dramatically reduces scope but does not eliminate compliance requirements entirely.
- Regional payment regulations vary significantly: PSD2/SCA in the EU, RBI tokenization mandates in India, PIX instant payments in Brazil -- a "global" payment system still needs regional adaptation.
- Settlement timing differs by PSP and region: Stripe settles in T+2 (US), T+7 (new accounts), or T+1 (custom). Reconciliation must account for these timing differences.
- Refund windows vary: Most card networks allow refunds up to 120 days. Some regions mandate longer. Your ledger must handle refunds on transactions that may have already been reconciled and settled.
- Currency conversion is not deterministic: Exchange rates change continuously. Lock rates at payment creation, store the locked rate, and use it consistently for the transaction lifecycle.
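Locking a rate at payment creation can be sketched as an immutable quote stored with the payment. This is an illustration under stated assumptions: `LockedQuote` and `convert_minor_units` are hypothetical names, both currencies are assumed to have two minor-unit digits, rounding is half up, and the rate in the usage note is made up.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class LockedQuote:
    """Exchange rate fixed at payment creation and stored with the payment."""
    source_currency: str
    target_currency: str
    rate: Decimal  # target units per one source unit


def convert_minor_units(amount_minor: int, quote: LockedQuote) -> int:
    """Convert with the locked rate, rounding half up to whole minor units."""
    converted = (Decimal(amount_minor) * quote.rate).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP
    )
    return int(converted)
```

For example, with `LockedQuote("USD", "EUR", Decimal("0.9125"))`, converting 1000 minor units gives 913. Captures and refunds for the same payment would reuse the stored quote rather than fetch a fresh rate.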