Scalable E-Commerce Platform Architecture

Type: Software Reference | Confidence: 0.91 | Sources: 7 | Verified: 2026-02-23 | Freshness: quarterly

TL;DR

Constraints

Quick Reference

| Component | Role | Technology Options | Scaling Strategy |
|---|---|---|---|
| API Gateway | Route requests, rate limit, auth, SSL termination | Kong, AWS API Gateway, NGINX, Envoy | Horizontal: stateless, add instances behind LB |
| Product Catalog Service | CRUD products, categories, attributes, pricing | Node.js/Python + PostgreSQL or MongoDB | Read replicas + CDN cache; write sharding by category |
| Search Service | Full-text search, faceted filtering, autocomplete | Elasticsearch, OpenSearch, Typesense | Horizontal sharding by index; read replicas |
| User/Auth Service | Registration, login, JWT/OAuth, profiles | Node.js/Go + PostgreSQL + Redis (sessions) | Horizontal: stateless with token-based auth |
| Cart Service | Add/remove items, persist cart state, price calc | Node.js/Python + Redis (primary) + PostgreSQL (backup) | Horizontal: partition by user ID in Redis Cluster |
| Order Service | Order creation, lifecycle management, history | Python/Java + PostgreSQL (ACID) | Shard by order ID; archive old orders to cold storage |
| Payment Service | Gateway integration, tokenization, refunds | Node.js/Go + PostgreSQL + external gateway (Stripe) | Horizontal: idempotency keys prevent duplicates |
| Inventory Service | Stock levels, reservations, warehouse sync | Go/Java + PostgreSQL + Redis (hot counts) | Optimistic locking; shard by SKU range |
| Notification Service | Email, SMS, push notifications, webhooks | Node.js/Python + queue consumer (SQS/RabbitMQ) | Scale consumers independently based on queue depth |
| Recommendation Service | Personalized suggestions, "also bought" | Python (ML) + Redis (feature store) + Spark | Precompute offline; serve from cache; scale reads |
| CDN / Edge | Static assets, image delivery, edge caching | CloudFront, Cloudflare, Fastly | Automatic: scales with traffic globally |
| Message Broker | Async inter-service events, order saga coordination | Kafka, RabbitMQ, AWS SQS/SNS | Kafka: add partitions; RabbitMQ: add consumers |
| Monitoring & Observability | Distributed tracing, metrics, alerting, logging | Datadog, Grafana+Prometheus, Jaeger, ELK Stack | Scale collectors; sample traces at high volume |

Decision Tree

START
├── Expected daily orders < 100 and products < 1K?
│   ├── YES → Use managed platform (Shopify/WooCommerce)
│   └── NO ↓
├── Team size < 5 backend engineers?
│   ├── YES → Modular monolith (single deploy, domain modules, shared DB with schema separation)
│   └── NO ↓
├── < 1K concurrent users?
│   ├── YES → Modular monolith with clear domain boundaries, prepare for future extraction
│   └── NO ↓
├── 1K–50K concurrent users?
│   ├── YES → Extract high-load services first (search, catalog, cart) as microservices
│   └── NO ↓
├── 50K–500K concurrent users?
│   ├── YES → Full microservices with Kafka event bus, database-per-service, Kubernetes
│   └── NO ↓
├── > 500K concurrent users?
│   ├── YES → Microservices + CQRS/Event Sourcing, multi-region, database sharding
│   └── NO ↓
└── DEFAULT → Start with modular monolith, extract services as bottlenecks emerge
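
The tree above can be read as a first-match rule list. A minimal Python sketch of that reading follows; the function name, parameter names, and short return labels are illustrative, and treating the 50K and 500K boundaries as inclusive upper bounds is an assumption, since the tree does not say which branch owns the exact boundary values.

```python
def recommend_architecture(daily_orders: int, products: int,
                           backend_engineers: int, concurrent_users: int) -> str:
    """Walk the decision tree top to bottom and return the first matching branch."""
    if daily_orders < 100 and products < 1_000:
        return "managed platform"
    if backend_engineers < 5:
        return "modular monolith"
    if concurrent_users < 1_000:
        return "modular monolith"
    if concurrent_users <= 50_000:       # assumed inclusive upper bound
        return "extract high-load services"
    if concurrent_users <= 500_000:      # assumed inclusive upper bound
        return "full microservices"
    return "microservices + CQRS, multi-region"
```

Because the team-size check comes before the traffic checks, a small team is steered to a modular monolith regardless of load, which matches the order of the branches in the tree.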

Step-by-Step Guide

1. Define bounded contexts and data ownership

Map your e-commerce domain into distinct bounded contexts using Domain-Driven Design (DDD). Each context becomes a service boundary with its own database. The critical contexts are: Product Catalog, Shopping Cart, Order Management, Payment, Inventory, User/Auth, Search, and Notifications. [src3]

Bounded Contexts:
┌───────────────┐  ┌───────────────┐  ┌───────────────┐
│    Catalog    │  │     Cart      │  │     Order     │
│   (MongoDB)   │  │    (Redis)    │  │ (PostgreSQL)  │
└───────┬───────┘  └───────┬───────┘  └───────┬───────┘
        │                  │                  │
        └──────────── API Gateway ────────────┘
                           │
┌───────────────┐  ┌───────────────┐  ┌───────────────┐
│    Payment    │  │   Inventory   │  │    Search     │
│ (PostgreSQL)  │  │ (PostgreSQL)  │  │(Elasticsearch)│
└───────────────┘  └───────────────┘  └───────────────┘

Verify: Each service can be deployed and tested independently — no compile-time dependencies between services.

2. Design the API gateway and routing layer

Place an API gateway in front of all services to handle authentication, rate limiting, request routing, and SSL termination. Use path-based routing. [src1]

# Kong or AWS API Gateway route config (conceptual)
routes:
  - path: /api/v1/products
    service: catalog-service
    methods: [GET]
    plugins: [rate-limit, jwt-auth, response-cache]
  - path: /api/v1/cart
    service: cart-service
    methods: [GET, POST, PUT, DELETE]
    plugins: [rate-limit, jwt-auth]
  - path: /api/v1/orders
    service: order-service
    methods: [GET, POST]
    plugins: [rate-limit, jwt-auth]
  - path: /api/v1/checkout
    service: payment-service
    methods: [POST]
    plugins: [rate-limit, jwt-auth, idempotency]

Verify: curl -H "Authorization: Bearer <token>" https://api.example.com/api/v1/products → returns product list with 200 OK.
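
To make the path-based routing concrete, here is a small Python sketch of how a gateway might resolve a request against the route table above. It is not Kong or AWS API Gateway code; the `Route` class and `resolve` function are hypothetical names, and plugins (rate limiting, JWT, caching) are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    service: str
    methods: frozenset


# Mirrors the conceptual route config; plugins are omitted in this sketch.
ROUTES = [
    Route("/api/v1/products", "catalog-service", frozenset({"GET"})),
    Route("/api/v1/cart", "cart-service", frozenset({"GET", "POST", "PUT", "DELETE"})),
    Route("/api/v1/orders", "order-service", frozenset({"GET", "POST"})),
    Route("/api/v1/checkout", "payment-service", frozenset({"POST"})),
]


def resolve(method: str, path: str):
    """Return (http_status, service); service is None unless the route matches."""
    for route in ROUTES:
        # Match the exact prefix or a sub-path, so /api/v1/cartoon does not hit cart-service.
        if path == route.prefix or path.startswith(route.prefix + "/"):
            if method.upper() in route.methods:
                return 200, route.service
            return 405, None   # path is known but the method is not allowed
    return 404, None
```

For example, `resolve("GET", "/api/v1/products/42")` selects `catalog-service`, while a `DELETE` on `/api/v1/orders` is rejected with 405 because the route only allows `GET` and `POST`.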

3. Implement the product catalog with search indexing

The catalog service stores products in a primary database and syncs changes to Elasticsearch for full-text search. Use Change Data Capture (CDC) or event publishing to keep the search index in sync. [src6]

# catalog_service/events.py — Publish product changes to message broker
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["kafka:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_product_event(event_type: str, product: dict):
    producer.send("product-events", value={"event": event_type, "product": product})
    producer.flush()

Verify: Create a product via API → curl localhost:9200/products/_search?q=<name> → product appears within 2 seconds.
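
On the consuming side, each product event has to be turned into an index operation. The sketch below shows one way to do that as a pure function. The event type names (`created`, `updated`, `deleted`) and the `id` field on the product are assumptions, not part of the source; the returned dict follows the action shape accepted by the `elasticsearch.helpers.bulk` helper in the Python client, but no client call is made here.

```python
def to_bulk_action(event: dict, index: str = "products") -> dict:
    """Translate one product event into a search-index action dict."""
    product = event["product"]
    if event["event"] == "deleted":      # hypothetical event type name
        return {"_op_type": "delete", "_index": index, "_id": product["id"]}
    # Created and updated events both become a full re-index of the document,
    # so replaying the same event leaves the index in the same state.
    return {"_op_type": "index", "_index": index, "_id": product["id"], "_source": product}
```

Re-indexing the whole document on every change keeps the consumer idempotent, which matters because the broker may deliver an event more than once.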

4. Build the cart service with Redis persistence

Use Redis as the primary store for shopping carts with sub-millisecond reads and built-in TTL for cart expiry. Back up cart data to PostgreSQL for carts older than 30 minutes. [src2]

# cart_service/cart.py — Redis-backed cart
import redis

r = redis.Redis(host="redis", port=6379, db=0, decode_responses=True)
CART_TTL = 86400 * 7  # 7 days

def add_to_cart(user_id: str, product_id: str, quantity: int):
    cart_key = f"cart:{user_id}"
    r.hset(cart_key, product_id, quantity)
    r.expire(cart_key, CART_TTL)

def get_cart(user_id: str) -> dict:
    return {pid: int(qty) for pid, qty in r.hgetall(f"cart:{user_id}").items()}

Verify: add_to_cart("user123", "SKU-001", 2) then get_cart("user123") → returns {"SKU-001": 2}.

5. Implement checkout with the Saga pattern

The checkout flow spans multiple services and cannot use a single database transaction. Use the Saga pattern: reserve inventory → process payment → create order. If payment fails, release the inventory reservation. [src3]

Checkout Saga Flow:
1. Cart Service     → Validate cart items and prices
2. Inventory Svc    → Reserve stock (soft lock with TTL)
3. Payment Service  → Charge customer via gateway
   ├── SUCCESS → 4. Order Service → Create order record
   │                5. Inventory Svc → Confirm reservation
   │                6. Cart Service → Clear cart
   │                7. Notification → Send confirmation email
   └── FAILURE → Compensate:
                    - Inventory Svc → Release reservation
                    - Notification → Send failure notice

Verify: Place test order → inventory decremented, payment captured, order record exists, cart cleared. Simulate payment failure → inventory restored.
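
Step 2 of the flow relies on a soft lock with a TTL, so that stock held by an abandoned checkout frees itself. The following in-memory Python sketch illustrates the reserve, confirm, and release operations under that rule. The class name, the 600-second default TTL, and the injectable clock are assumptions made for illustration; a real Inventory Service would keep this state in its database or Redis.

```python
import time


class ReservationStore:
    """In-memory stand-in for the Inventory Service's soft reservations."""

    def __init__(self, stock: dict, ttl_seconds: float = 600.0, clock=time.monotonic):
        self.stock = dict(stock)   # sku -> units physically on hand
        self.ttl = ttl_seconds     # assumed default; tune per business rules
        self.clock = clock         # injectable so tests can control time
        self.holds = {}            # saga_id -> (expires_at, {sku: qty})

    def _expire(self) -> None:
        now = self.clock()
        for saga_id in [s for s, (exp, _) in self.holds.items() if exp <= now]:
            del self.holds[saga_id]

    def available(self, sku: str) -> int:
        self._expire()
        held = sum(items.get(sku, 0) for _, items in self.holds.values())
        return self.stock.get(sku, 0) - held

    def reserve(self, saga_id: str, items: dict) -> bool:
        self._expire()
        if saga_id in self.holds:
            return True            # repeated request for the same saga keeps its hold
        if any(self.available(sku) < qty for sku, qty in items.items()):
            return False
        self.holds[saga_id] = (self.clock() + self.ttl, dict(items))
        return True

    def confirm(self, saga_id: str) -> bool:
        self._expire()
        hold = self.holds.pop(saga_id, None)
        if hold is None:
            return False           # hold expired or never existed
        for sku, qty in hold[1].items():
            self.stock[sku] -= qty # the reservation becomes a real decrement
        return True

    def release(self, saga_id: str) -> None:
        self.holds.pop(saga_id, None)   # compensation: safe to call twice
```

Note that `confirm` can fail if the hold expired while payment was in flight, so the TTL should comfortably exceed the expected payment latency.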

6. Set up event-driven communication

Use Apache Kafka as the central event bus. Services publish domain events (OrderCreated, PaymentProcessed, InventoryReserved) and other services subscribe to react asynchronously. [src2]

# order_service/events.py — Consume payment events
from kafka import KafkaConsumer
import json

consumer = KafkaConsumer(
    "payment-events",
    bootstrap_servers=["kafka:9092"],
    group_id="order-service",
    enable_auto_commit=False,  # commit manually, only after the event is handled
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for message in consumer:
    event = message.value
    if event["type"] == "PaymentSucceeded":
        create_order(event["order_id"], event["items"], event["total"])
        consumer.commit()
    elif event["type"] == "PaymentFailed":
        release_inventory(event["order_id"], event["items"])
        consumer.commit()

Verify: Publish a PaymentSucceeded event → order record appears in database within 5 seconds.
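
Committing offsets after processing gives at-least-once delivery: if a consumer crashes between handling an event and committing, the event is delivered again. A common safeguard is to make handlers idempotent. The sketch below shows the idea with an in-memory set; the `event_id` field and the `IdempotentHandler` class are assumptions, and a production version would record processed IDs in a durable store, ideally in the same transaction as the side effect.

```python
class IdempotentHandler:
    """Wrap an event handler so each event ID takes effect at most once."""

    def __init__(self, handler):
        self.handler = handler
        self.seen = set()          # stand-in for a durable processed-events table

    def handle(self, event: dict) -> bool:
        event_id = event["event_id"]   # hypothetical unique ID set by the publisher
        if event_id in self.seen:
            return False           # duplicate delivery: skip side effects
        self.handler(event)        # may raise; the ID is recorded only on success
        self.seen.add(event_id)
        return True
```

With this wrapper, a redelivered PaymentSucceeded event would not create a second order.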

7. Deploy with container orchestration

Package each service as a Docker container and orchestrate with Kubernetes. Use Horizontal Pod Autoscalers (HPA) to scale based on CPU/memory or custom metrics. [src4]

# k8s/catalog-service.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: catalog-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: catalog-service
  template:
    metadata:
      labels:
        app: catalog-service  # must match spec.selector.matchLabels
    spec:
      containers:
      - name: catalog
        image: ecommerce/catalog-service:1.0.0
        ports:
        - containerPort: 8080
        resources:
          requests: { memory: "256Mi", cpu: "250m" }
          limits: { memory: "512Mi", cpu: "500m" }

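Verify: kubectl get hpa → shows catalog-hpa (this assumes a HorizontalPodAutoscaler named catalog-hpa targets the Deployment; the manifest above defines only the Deployment). Under load: kubectl get pods -l app=catalog-service → pod count increases.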
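
To reason about how far an HPA will scale, it helps to work through the replica formula from the Kubernetes documentation: desired = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance of 1.0. The Python sketch below reproduces only that core calculation. The default bounds of 1 and 10 replicas are assumptions, and real HPA behavior also includes stabilization windows and pod-readiness handling that are not modeled here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Core HPA calculation: scale proportionally to metric / target, then clamp."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas            # close enough to target: no change
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For instance, 3 replicas averaging 90% CPU against a 60% target give ceil(3 × 1.5) = 5 replicas.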

Code Examples

Python: Order Service with Saga Orchestrator

# order_service/saga.py — Checkout saga orchestrator
# Input:  Cart contents (user_id, items), payment method token
# Output: Order confirmation or compensated failure

import httpx
import uuid

INVENTORY_URL = "http://inventory-service:8080"
PAYMENT_URL = "http://payment-service:8080"

async def checkout_saga(user_id: str, items: list, payment_token: str):
    saga_id = str(uuid.uuid4())
    # Share one client across calls so connections are pooled and closed on exit
    async with httpx.AsyncClient() as client:
        # Step 1: Reserve inventory
        res = await client.post(
            f"{INVENTORY_URL}/reserve", json={"saga_id": saga_id, "items": items})
        if res.status_code != 200:
            return {"status": "failed", "reason": "inventory_unavailable"}
        # Step 2: Process payment
        res = await client.post(f"{PAYMENT_URL}/charge", json={
            "saga_id": saga_id, "token": payment_token,
            "amount": sum(i["price"] * i["qty"] for i in items)})
        if res.status_code != 200:
            # Compensate: release the reservation made in step 1
            await client.post(
                f"{INVENTORY_URL}/release", json={"saga_id": saga_id})
            return {"status": "failed", "reason": "payment_declined"}
        return {"status": "confirmed", "order_id": res.json()["order_id"]}
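
The orchestrator sends `saga_id` with the charge request, which lets the Payment Service treat it as an idempotency key: a retried request with the same key returns the stored result instead of charging again. The following is a minimal in-memory sketch of that behavior. `PaymentLedger` and the `gateway_charge` callable are hypothetical stand-ins for the service and the external gateway, and amounts are assumed to be integers in minor units (cents).

```python
class PaymentLedger:
    """In-memory sketch of idempotency-key handling for a charge endpoint."""

    def __init__(self, gateway_charge):
        self.gateway_charge = gateway_charge   # callable(token, amount) -> charge id
        self.results = {}                      # idempotency key -> stored response

    def charge(self, idempotency_key: str, token: str, amount: int) -> dict:
        if idempotency_key in self.results:
            return self.results[idempotency_key]   # replay: no second charge
        charge_id = self.gateway_charge(token, amount)
        response = {"status": "charged", "charge_id": charge_id, "amount": amount}
        self.results[idempotency_key] = response
        return response
```

A real implementation would persist the key and response in PostgreSQL with a unique constraint on the key, so that concurrent retries cannot both reach the gateway.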

Node.js: Inventory Service with Optimistic Locking

// inventory_service/reserve.js — Atomic inventory reservation
// Input:  saga_id, items [{sku, qty}]
// Output: reservation confirmation or rejection

const { Pool } = require("pg");
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function reserveInventory(sagaId, items) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    for (const item of items) {
      const result = await client.query(
        `UPDATE inventory SET reserved = reserved + $1, updated_at = NOW()
         WHERE sku = $2 AND (stock - reserved) >= $1 RETURNING sku`,
        [item.qty, item.sku]);
      if (result.rowCount === 0) {
        await client.query("ROLLBACK");
        return { success: false, reason: `Insufficient stock: ${item.sku}` };
      }
    }
    await client.query(
      `INSERT INTO reservations (saga_id, items, status) VALUES ($1, $2, 'reserved')`,
      [sagaId, JSON.stringify(items)]);
    await client.query("COMMIT");
    return { success: true, saga_id: sagaId };
  } catch (err) { await client.query("ROLLBACK"); throw err; }
  finally { client.release(); }
}
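
The key to the reservation above is the guarded UPDATE: the stock check and the increment happen in one statement, so two concurrent requests cannot both succeed on the last units. The same guard can be tried locally with Python's built-in `sqlite3` module. This sketch swaps SQLite in for PostgreSQL purely so it runs without a server; the table is reduced to three columns and the `reserve` helper is illustrative.

```python
import sqlite3


def reserve(conn: sqlite3.Connection, sku: str, qty: int) -> bool:
    """Reserve qty units if enough unreserved stock remains; return success."""
    cur = conn.execute(
        "UPDATE inventory SET reserved = reserved + ? "
        "WHERE sku = ? AND (stock - reserved) >= ?",
        (qty, sku, qty),
    )
    conn.commit()
    # rowcount is 0 when the guard in the WHERE clause rejected the update
    return cur.rowcount == 1


# Illustrative in-memory table with a single SKU holding 5 units.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, stock INTEGER, reserved INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('SKU-001', 5, 0)")
conn.commit()
```

Reserving 3 units succeeds, a second request for 3 is rejected because only 2 remain unreserved, and a request for 2 then succeeds.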

Anti-Patterns

Wrong: Shared database across services

// BAD — All services read/write the same database
// Creates coupling: schema changes break all services,
// impossible to scale services independently,
// single point of failure

Correct: Database-per-service with event sync

// GOOD — Each service owns its data, syncs via events
// Catalog (MongoDB), Order (PostgreSQL), Inventory (PostgreSQL)
// Connected via Kafka Event Bus
// Independent scaling, deployment, and schema evolution

Wrong: Synchronous checkout chain

# BAD — Blocking HTTP calls chain during checkout
def checkout(cart):
    inventory = requests.post("/inventory/reserve", json=cart)  # blocks
    payment = requests.post("/payment/charge", json=cart)       # blocks
    order = requests.post("/orders/create", json=cart)           # blocks
    # If payment service is slow, entire checkout hangs.
    # If order service fails after payment, no compensation.
    return order

Correct: Saga pattern with async compensation

# GOOD — Saga orchestrator with compensation on failure
async def checkout_saga(cart, payment_token):
    saga_id = uuid.uuid4()
    try:
        await reserve_inventory(saga_id, cart.items)
        payment = await process_payment(saga_id, payment_token, cart.total)
        order = await create_order(saga_id, cart, payment.id)
        return {"status": "confirmed", "order_id": order.id}
    except PaymentFailedError:
        await release_inventory(saga_id)  # compensating transaction
        return {"status": "failed", "reason": "payment_declined"}
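
The sketch above compensates only for a payment failure; if `create_order` failed after a successful charge, the charge would stay uncompensated. One way to cover every step is to pair each action with its compensation and undo completed steps in reverse order. The runner below is a simplified, synchronous illustration of that idea; `run_saga` and the step functions are hypothetical, and it assumes compensations themselves do not fail.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; undo completed steps on failure."""
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception as exc:
            # Compensate in reverse order, newest completed step first.
            for undo in reversed(completed):
                undo()
            return {"status": "failed", "reason": str(exc)}
        completed.append(compensation)
    return {"status": "confirmed"}
```

With steps for reserve/release, charge/refund, and create-order, a failure in the last step triggers the refund first and then the inventory release.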

Wrong: Client-side only cart storage

// BAD — Cart only in browser localStorage
localStorage.setItem("cart", JSON.stringify(cartItems));
// Lost on device switch, browser clear, or incognito.
// No server validation of prices. No abandoned cart analytics.

Correct: Server-side cart with client cache

// GOOD — Server-side cart (Redis) with client-side sync
async function addToCart(productId, qty) {
  const res = await fetch("/api/cart", {
    method: "POST",
    body: JSON.stringify({ product_id: productId, qty }),
    headers: {
      "Authorization": `Bearer ${token}`,
      "Content-Type": "application/json",  // tells the server the body is JSON
    },
  });
  const cart = await res.json();
  sessionStorage.setItem("cart_cache", JSON.stringify(cart));
  return cart;
}
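
A server-side cart also lets the backend own pricing: the client sends only SKUs and quantities, and totals are computed from catalog data. The Python sketch below shows that validation step. The `CATALOG_PRICES` dict is a hypothetical stand-in for a lookup against the Catalog Service, and `price_cart` is an illustrative name.

```python
from decimal import Decimal

# Hypothetical price source; a real service would query the catalog.
CATALOG_PRICES = {"SKU-001": Decimal("19.99"), "SKU-002": Decimal("5.00")}


def price_cart(cart: dict) -> dict:
    """Price a cart ({sku: qty}) from server-side catalog data only."""
    lines = []
    for sku, qty in cart.items():
        if sku not in CATALOG_PRICES:
            raise KeyError(f"unknown SKU: {sku}")
        if qty <= 0:
            raise ValueError(f"invalid quantity for {sku}: {qty}")
        unit = CATALOG_PRICES[sku]
        lines.append({"sku": sku, "qty": qty, "unit_price": unit, "line_total": unit * qty})
    total = sum((line["line_total"] for line in lines), Decimal("0"))
    return {"lines": lines, "total": total}
```

Using `Decimal` rather than floats avoids rounding drift in monetary totals.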

Common Pitfalls

Diagnostic Commands

# Check service health across all microservices
for svc in catalog cart order payment inventory search; do
  curl -s "http://${svc}-service:8080/health" | jq '.status'
done

# Monitor Kafka consumer lag (detect processing bottlenecks)
kafka-consumer-groups.sh --bootstrap-server kafka:9092 \
  --describe --group order-service

# Check PostgreSQL active connections and locks
psql -c "SELECT pid, state, query FROM pg_stat_activity WHERE state != 'idle';"

# Redis memory and cart key count
redis-cli INFO memory | grep used_memory_human
redis-cli DBSIZE

# Elasticsearch cluster health
curl -s localhost:9200/_cluster/health | jq '.status,.active_shards'

# Kubernetes pod status
kubectl get pods -n ecommerce -o wide
kubectl get events -n ecommerce --sort-by='.lastTimestamp' | tail -20

When to Use / When Not to Use

| Use When | Don't Use When | Use Instead |
|---|---|---|
| Building a custom e-commerce platform with >1K daily orders | Selling <100 products with simple needs | Shopify, WooCommerce, or BigCommerce |
| Team has 5+ backend engineers and DevOps capability | Solo developer or small team without Kubernetes experience | Modular monolith or managed platform |
| Need independent scaling of catalog, search, and checkout | All components have similar load patterns | Modular monolith with domain modules |
| Regulatory requirements demand service isolation (PCI scope reduction) | No compliance requirements and simple payment flow | Monolith with Stripe Checkout |
| Multi-region deployment required for <100ms latency globally | Single-region audience with acceptable latency | Single-region deployment with CDN |
| Flash sales or highly variable traffic patterns | Steady, predictable traffic with no spikes | Fixed-size deployment with load balancer |

Important Caveats

Related Units