design scalable chat system

- Bottom line: A scalable chat app requires WebSocket gateways for persistent connections, a message queue (Kafka/NATS) for async delivery, a write-optimized database (Cassandra/ScyllaDB) for message storage, and Redis for presence/pub-sub — separated into stateless services behind a load balancer.

WhatsApp system design architecture

- Bottom line: A scalable chat app requires WebSocket gateways for persistent connections, a message queue (Kafka/NATS) for async delivery, a write-optimized database (Cassandra/ScyllaDB) for message storage, and Redis for presence/pub-sub — separated into stateless services behind a load balancer.

Slack clone system design

- Bottom line: A scalable chat app requires WebSocket gateways for persistent connections, a message queue (Kafka/NATS) for async delivery, a write-optimized database (Cassandra/ScyllaDB) for message storage, and Redis for presence/pub-sub — separated into stateless services behind a load balancer.

real-time messaging system design

- Bottom line: A scalable chat app requires WebSocket gateways for persistent connections, a message queue (Kafka/NATS) for async delivery, a write-optimized database (Cassandra/ScyllaDB) for message storage, and Redis for presence/pub-sub — separated into stateless services behind a load balancer.

chat application architecture at scale

- Bottom line: A scalable chat app requires WebSocket gateways for persistent connections, a message queue (Kafka/NATS) for async delivery, a write-optimized database (Cassandra/ScyllaDB) for message storage, and Redis for presence/pub-sub — separated into stateless services behind a load balancer.

Designing a Scalable Chat Application (WhatsApp/Slack Clone)

How do I design a scalable chat application (WhatsApp/Slack clone)?

TL;DR

Bottom line: A scalable chat app requires WebSocket gateways for persistent connections, a message queue (Kafka/NATS) for async delivery, a write-optimized database (Cassandra/ScyllaDB) for message storage, and Redis for presence/pub-sub — separated into stateless services behind a load balancer.
Key tool/command: WebSocket + Kafka + Cassandra + Redis Pub/Sub
Watch out for: Storing messages in a relational database without partitioning — PostgreSQL works for <10M messages/day, but unsharded SQL becomes a bottleneck at scale.
Works with: Any language/framework. Production stacks: Erlang/Elixir (WhatsApp/Discord), Java (Slack), Go (large-scale custom), Node.js (startups/MVPs).

Constraints

WebSocket connections are stateful — never route the same user to different gateway servers mid-session without a reconnection protocol
Message ordering must be guaranteed per-conversation — use a monotonic sequence ID per chat, not wall-clock timestamps
End-to-end encryption (Signal Protocol) means the server cannot read message content — all search/indexing must happen client-side
Database choice is load-dependent: PostgreSQL for <10M messages/day, Cassandra/ScyllaDB for >100M messages/day
Never fan out writes to all group members synchronously — use async message queues (Kafka/NATS) to decouple send from deliver
Group messages create O(N) fanout per message — cap group size or use tiered delivery (online members via WebSocket, offline via push queue)

Quick Reference

Component	Role	Technology Options	Scaling Strategy
API Gateway / Load Balancer	Route client connections, TLS termination, rate limiting	Nginx, HAProxy, AWS ALB, Envoy	Horizontal — add more LB instances behind DNS round-robin
WebSocket Gateway	Maintain persistent bidirectional connections with clients	Custom (Elixir/Erlang, Go, Node.js), Socket.IO	Horizontal — shard by user_id; each server handles 100K-1M connections
Chat Service	Message validation, routing, conversation logic	Java, Go, Node.js, Elixir	Stateless — scale horizontally behind gateway
Message Queue	Decouple send from deliver, buffer during spikes, guarantee delivery	Apache Kafka, NATS JetStream, RabbitMQ, Amazon SQS	Partition by conversation_id; add partitions for throughput
Message Storage	Persist all messages durably	Cassandra, ScyllaDB, PostgreSQL (small scale), TiDB	Partition by (conversation_id, time_bucket); add nodes for capacity
User/Group Metadata DB	User profiles, contacts, group membership, settings	PostgreSQL, MySQL, CockroachDB	Read replicas + connection pooling; shard at extreme scale
Presence Service	Track online/offline/typing status per user	Redis (TTL keys + Pub/Sub), custom Erlang process	Redis Cluster with key-based sharding; TTL for auto-expiry
Media Storage	Store images, videos, voice messages, file attachments	AWS S3, Google Cloud Storage, Cloudflare R2	Object storage is inherently scalable; use CDN for delivery
Push Notification Service	Deliver messages to offline/backgrounded users	FCM (Android), APNs (iOS), custom push gateway	Queue-based — consume from message queue for offline users
Search / Indexing	Full-text search across message history	Elasticsearch, OpenSearch, Meilisearch	Shard by user_id or workspace_id; index asynchronously
Cache Layer	Hot data: recent messages, session data, user profiles	Redis, Memcached	Redis Cluster; cache aside pattern with TTL
CDN	Deliver static assets and media globally	Cloudflare, CloudFront, Fastly	Edge-based; auto-scales
Encryption Service	Key exchange, E2E encryption (Signal Protocol)	libsignal, custom X3DH + Double Ratchet	Stateless — keys stored client-side; server only brokers key exchange

Decision Tree

START — How many concurrent users?
├── <1K concurrent users?
│   ├── YES → SIMPLE STACK
│   │   ├── Single Node.js/Go server with WebSocket (Socket.IO or ws)
│   │   ├── PostgreSQL for messages + users
│   │   ├── In-memory presence (Map/Set on the server)
│   │   └── No message queue needed
│   └── NO ↓
├── 1K–100K concurrent users?
│   ├── YES → DISTRIBUTED STACK
│   │   ├── 2-10 WebSocket gateways behind load balancer
│   │   ├── Redis Pub/Sub for cross-server message routing
│   │   ├── PostgreSQL (partitioned) or MongoDB
│   │   ├── Kafka or RabbitMQ for async delivery
│   │   └── S3 + CDN for media
│   └── NO ↓
├── 100K–10M concurrent users?
│   ├── YES → LARGE-SCALE STACK
│   │   ├── 50-500 WebSocket gateways, sharded by user_id
│   │   ├── Kafka for all message routing
│   │   ├── Cassandra/ScyllaDB for message storage
│   │   ├── Redis Cluster for presence + cache
│   │   └── Elasticsearch for search
│   └── NO ↓
└── >10M concurrent users (WhatsApp/Discord scale)?
    ├── YES → HYPERSCALE STACK
    │   ├── Custom WebSocket layer (Erlang/BEAM or Rust)
    │   ├── Per-user message queues
    │   ├── ScyllaDB or custom storage
    │   ├── Multi-region deployment with geo-routing
    │   ├── Signal Protocol E2E encryption
    │   └── Dedicated teams per subsystem (>50 engineers)
    └── DEFAULT → Start with DISTRIBUTED STACK, evolve as needed

Step-by-Step Guide

1. Define message data model and API contract

Design the core message schema before writing any code. Every message needs a globally unique ID, a conversation reference, sender, timestamp, and content. Use a snowflake-style ID or ULID for sortable unique IDs. [src4]

-- Core message schema (PostgreSQL for small scale)
CREATE TABLE messages (
    message_id    BIGINT PRIMARY KEY,  -- Snowflake ID
    conversation_id BIGINT NOT NULL,
    sender_id     BIGINT NOT NULL,
    content       TEXT,
    content_type  VARCHAR(20) DEFAULT 'text',
    media_url     TEXT,
    created_at    TIMESTAMP NOT NULL DEFAULT NOW(),
    status        VARCHAR(10) DEFAULT 'sent',
    sequence_num  BIGINT NOT NULL
);

CREATE INDEX idx_messages_conv_seq
  ON messages (conversation_id, sequence_num DESC);

Verify: Schema supports both 1:1 and group conversations with the same table structure.

2. Build the WebSocket gateway layer

The gateway maintains persistent WebSocket connections. Use consistent hashing on user_id to route reconnections to the same server. Each server handles 100K-1M connections. [src2]

                ┌─────────────────────────────────────────┐
                │           Load Balancer (L4/L7)         │
                │    (sticky sessions by user_id hash)    │
                └──────────┬──────────┬──────────┬────────┘
                           │          │          │
                ┌──────────▼┐  ┌──────▼──────┐  ┌▼─────────┐
                │ Gateway-1  │  │  Gateway-2  │  │ Gateway-3 │
                │ 500K conns │  │  500K conns │  │ 500K conns│
                └──────┬─────┘  └──────┬──────┘  └─────┬─────┘
                       │               │               │
                ┌──────▼───────────────▼───────────────▼──────┐
                │          Redis Pub/Sub Cluster               │
                └──────────────────┬──────────────────────────┘
                                   │
                ┌──────────────────▼──────────────────────────┐
                │              Kafka Cluster                   │
                └─────────────────────────────────────────────┘

Verify: Load test with artillery — each gateway should handle 100K+ concurrent WebSocket connections on a 16GB RAM instance.

3. Implement message routing and delivery

When User A sends a message: Gateway receives → Chat Service validates → Kafka queue → Consumer delivers to User B's gateway (or push notification if offline). [src3]

Message Flow (1:1):

User A     Gateway-1     Chat Service     Kafka     Gateway-2     User B
  │            │               │            │            │           │
  │── send ──>│               │            │            │           │
  │            │── validate ─>│            │            │           │
  │            │<── ack ──────│            │            │           │
  │<── ack ───│               │            │            │           │
  │            │               │── produce >│            │           │
  │            │               │            │── consume >│           │
  │            │               │            │            │── deliver >│
  │            │               │            │            │<─ delivered│
  │<── delivered (via Redis Pub/Sub) ──────────────────│           │

Verify: Send a message between two users on different gateways — arrival in <500ms.

4. Set up message storage with write optimization

For high-volume messaging (>100M msg/day), use Cassandra/ScyllaDB partitioned by (conversation_id, time_bucket) for O(1) writes and efficient range reads. Discord migrated from Cassandra to ScyllaDB in 2022 for 5x performance improvement. [src1]

-- Cassandra/ScyllaDB schema
CREATE TABLE chat.messages (
    conversation_id bigint,
    bucket          text,       -- '2026-02'
    message_id      bigint,
    sender_id       bigint,
    content         blob,
    content_type    text,
    media_url       text,
    created_at      timestamp,
    PRIMARY KEY ((conversation_id, bucket), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

Verify: Cassandra handles >100K writes/sec per node. Read recent 50 messages in <10ms.

5. Implement presence and typing indicators

Use Redis TTL keys for presence. Each connected user has a key expiring after 30s. Clients send heartbeats every 15s. Typing indicators are ephemeral — never persist them. [src6]

Presence Architecture:

Client heartbeat (every 15s)
        │
        ▼
┌─────────────────┐     ┌─────────────────┐
│  Gateway Server  │────>│  Redis Cluster   │
└─────────────────┘     │  presence:{uid}  │
                         │  TTL = 30s       │
                         └────────┬────────┘
                                  │ Pub/Sub
                    ┌─────────────┼──────────────┐
                    │             │              │
               Gateway-1    Gateway-2      Gateway-3

Verify: Set presence key with 30s TTL, wait 31s, confirm key expired. Typing indicator latency <200ms.

6. Add push notifications for offline delivery

When the consumer detects the recipient is offline (no presence key in Redis), route to push notification service. Use FCM for Android and APNs for iOS. Batch notifications to avoid spamming. [src7]

Verify: Disconnect a user, send a message, verify push arrives within 3 seconds. On reconnect, all missed messages sync.

7. Design media handling pipeline

Never send media through WebSocket or message queue. Use pre-signed URL upload: client uploads directly to S3/R2, then sends a message with the media reference (<1KB). [src3]

Verify: Upload 10MB image, accessible via CDN in <2 seconds. Message with media reference is <1KB.

Code Examples

Node.js: WebSocket Chat Gateway Server

// Input:  WebSocket connections from authenticated clients
// Output: Real-time message delivery between connected users
// Deps:   npm install ws ioredis uuid ([email protected], [email protected])

const WebSocket = require('ws');
const Redis = require('ioredis');

const HEARTBEAT_INTERVAL = 15000;
const PRESENCE_TTL = 30;

const redisPub = new Redis(process.env.REDIS_URL);
const redisSub = new Redis(process.env.REDIS_URL);
const redisStore = new Redis(process.env.REDIS_URL);

const wss = new WebSocket.Server({ port: 8080 });
const clients = new Map(); // user_id -> WebSocket

wss.on('connection', (ws, req) => {
  const userId = authenticateFromToken(req);
  if (!userId) { ws.close(4001, 'Unauthorized'); return; }

  clients.set(userId, ws);
  refreshPresence(userId);
  redisSub.subscribe(`deliver:${userId}`);

  ws.on('message', async (data) => {
    const msg = JSON.parse(data);
    if (msg.type === 'heartbeat') { refreshPresence(userId); return; }
    if (msg.type === 'send_message') {
      const messageId = generateSnowflakeId();
      const envelope = {
        message_id: messageId,
        conversation_id: msg.conversation_id,
        sender_id: userId,
        content: msg.content,
        content_type: msg.content_type || 'text',
        created_at: new Date().toISOString(),
      };
      ws.send(JSON.stringify({ type: 'ack', message_id: messageId }));
      await redisPub.publish(
        `conversation:${msg.conversation_id}`,
        JSON.stringify(envelope)
      );
    }
  });

  ws.on('close', () => {
    clients.delete(userId);
    redisStore.del(`presence:${userId}`);
    redisSub.unsubscribe(`deliver:${userId}`);
  });
});

redisSub.on('message', (channel, message) => {
  const [prefix, targetUserId] = channel.split(':');
  if (prefix === 'deliver' && clients.has(targetUserId)) {
    const ws = clients.get(targetUserId);
    if (ws.readyState === WebSocket.OPEN) ws.send(message);
  }
});

async function refreshPresence(userId) {
  await redisStore.setex(`presence:${userId}`, PRESENCE_TTL, Date.now());
}

function generateSnowflakeId() {
  return Date.now() * 1000 + Math.floor(Math.random() * 1000);
}

console.log('WebSocket gateway running on :8080');

Python: Kafka Message Consumer with Delivery Logic

# Input:  Kafka messages from chat service
# Output: Routes to online users (Redis) or push notifications (offline)
# Deps:   pip install confluent-kafka==2.3.0 redis==5.0.0 firebase-admin==6.4.0

import json
from confluent_kafka import Consumer, KafkaError
import redis
import firebase_admin
from firebase_admin import messaging

redis_client = redis.Redis.from_url("redis://localhost:6379", decode_responses=True)
firebase_admin.initialize_app()

consumer = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'message-delivery-group',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False,
})
consumer.subscribe(['chat.messages'])

def is_user_online(user_id: str) -> bool:
    return redis_client.exists(f"presence:{user_id}") > 0

def deliver_to_online_user(user_id: str, message: dict):
    redis_client.publish(f"deliver:{user_id}", json.dumps(message))

def send_push_notification(user_id: str, message: dict):
    token = redis_client.get(f"device_token:{user_id}")
    if not token:
        return
    notification = messaging.Message(
        token=token,
        notification=messaging.Notification(
            title=f"New message from {message.get('sender_name', 'Someone')}",
            body=message.get('content', '')[:100],
        ),
        data={'conversation_id': str(message['conversation_id'])},
    )
    messaging.send(notification)

def process_message(raw_message: str):
    message = json.loads(raw_message)
    sender_id = str(message['sender_id'])
    members = redis_client.smembers(f"conv_members:{message['conversation_id']}")
    for member_id in members:
        if member_id == sender_id:
            continue
        if is_user_online(member_id):
            deliver_to_online_user(member_id, message)
        else:
            send_push_notification(member_id, message)

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None or msg.error():
            continue
        process_message(msg.value().decode('utf-8'))
        consumer.commit(msg)
except KeyboardInterrupt:
    pass
finally:
    consumer.close()

Anti-Patterns

Wrong: Polling for new messages via HTTP

// ❌ BAD — HTTP polling wastes bandwidth and adds latency.
// At 100K users polling every 2s = 50K req/sec for mostly empty responses.
setInterval(async () => {
  const res = await fetch('/api/messages?since=' + lastTimestamp);
  const messages = await res.json();
  messages.forEach(renderMessage);
}, 2000);

Correct: Use persistent WebSocket connections

// ✅ GOOD — WebSocket: server pushes instantly, zero wasted requests.
const ws = new WebSocket('wss://chat.example.com?token=JWT_HERE');
ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  renderMessage(message);
};
ws.onclose = () => setTimeout(connect, Math.min(1000 * 2 ** retries++, 30000));

Wrong: Sending media files through the message pipeline

# ❌ BAD — 10MB video through Kafka + WebSocket = memory pressure
message = {
    "conversation_id": 123,
    "content": base64.b64encode(video_bytes).decode(),  # 13MB base64!
    "content_type": "video"
}
kafka_producer.produce('chat.messages', json.dumps(message))

Correct: Upload media to object storage, send reference only

# ✅ GOOD — Upload to S3, send <1KB reference through message pipeline.
upload_url = s3.generate_presigned_url('put_object',
    Params={'Bucket': 'chat-media', 'Key': f'{uuid4()}.mp4'},
    ExpiresIn=3600)
# Client uploads directly to S3, then sends:
message = {
    "conversation_id": 123,
    "media_key": "abc123.mp4",
    "content_type": "video"
}

Wrong: Using wall-clock timestamps for message ordering

# ❌ BAD — Clock skew between servers causes out-of-order messages.
message = {
    "id": uuid4(),
    "created_at": datetime.utcnow(),  # different servers = different clocks
    "content": "Hello"
}

Correct: Use per-conversation monotonic sequence numbers

# ✅ GOOD — Atomic increment per conversation guarantees order.
sequence_num = redis_client.incr(f"seq:{conversation_id}")
message = {
    "id": generate_snowflake_id(),
    "sequence_num": sequence_num,
    "created_at": datetime.utcnow(),  # informational only
    "content": "Hello"
}

Wrong: Shared database for all microservices

-- ❌ BAD — Chat, presence, and notification services
-- all reading/writing the same PostgreSQL instance.
-- One slow query in notifications blocks message delivery.

Correct: Dedicated databases per service domain

# ✅ GOOD — Each service owns its data store. Failures are isolated.
services:
  chat-service:
    database: cassandra-cluster
  user-service:
    database: postgresql-primary
  presence-service:
    database: redis-cluster
  search-service:
    database: elasticsearch-cluster

Common Pitfalls

Thundering herd on reconnect: When a gateway server restarts, all its clients (100K+) reconnect simultaneously to surviving servers, causing cascading failures. Fix: implement exponential backoff with jitter — delay = min(base * 2^attempt + random(0, 1000), 30000). [src2]
Unbounded group fanout: A message to a 10K-member group triggers 10K write operations synchronously. Fix: cap group delivery at 256 via WebSocket, use push notification batching for the rest and tiered delivery. [src3]
Hot partition in Cassandra: A viral group chat puts millions of messages into one partition. Fix: add a time-bucket to the partition key — PRIMARY KEY ((conversation_id, bucket), message_id) where bucket is YYYY-MM. [src1]
Missing messages on reconnect: User goes offline for 2 hours, reconnects, but only gets messages since their last seen timestamp (clock skew loses messages). Fix: use last received sequence_num per conversation to request sync. [src4]
Presence storm: Broadcasting online/offline to all contacts on every state change overwhelms the system. Fix: batch presence updates (publish every 5s) and only send to contacts who have the app open. [src6]
No backpressure on WebSocket sends: Server pushes faster than client can process, filling OS send buffers. Fix: monitor ws.bufferedAmount; if it exceeds threshold, queue messages server-side and deliver on drain. [src2]
Single Redis instance for Pub/Sub: All cross-server routing through one Redis creates a SPOF and throughput ceiling (~200K msg/sec). Fix: use Redis Cluster with hash-tag routing. [src6]
Not separating read and write paths: Read-heavy ops compete with write-heavy ops on the same database. Fix: CQRS — writes to Cassandra, reads from a separate cache or read-optimized store. [src1]

Version History & Compatibility

Pattern/Technology	Status	Notable Changes	Migration Notes
WebSocket (RFC 6455)	Standard since 2011	HTTP/2 WebSocket (RFC 8441, 2018)	All modern browsers support; use `wss://` in production
Cassandra 4.x → ScyllaDB	Industry shift 2022-2024	Discord migrated 2022; 5x performance	ScyllaDB is Cassandra-compatible; drop-in replacement with CQL
Signal Protocol (libsignal)	Current E2EE standard	PQXDH (2023): post-quantum key agreement	WhatsApp, Messenger, Google Messages all use Signal Protocol
Kafka → NATS JetStream	Emerging alternative 2024+	Lower latency, simpler operations	NATS better for <1M msg/sec; Kafka for higher throughput
Socket.IO → native WebSocket	Simplification trend	Socket.IO adds overhead	For >100K connections, use raw `ws` library

When to Use / When Not to Use

Use When	Don't Use When	Use Instead
Building a real-time 1:1 and group messaging product	Need async communication (email-like, hours between replies)	Email system design or task queue architecture
Users expect <500ms message delivery latency	Message delivery can be delayed 5-30 seconds	HTTP-based notification system with polling
Need to support 10K+ concurrent connected users	<100 concurrent users on a single server	Simple WebSocket server without distributed infrastructure
Messages must be persisted and searchable	Ephemeral messages only (no persistence needed)	WebRTC data channels for P2P ephemeral messaging
Multiple client platforms (web, iOS, Android, desktop)	Single-platform internal tool	Platform-specific SDK (e.g., Firebase Realtime Database)
End-to-end encryption is a requirement	Server needs to read/process content (moderation, NLP)	Standard TLS + server-side encryption at rest

Important Caveats

End-to-end encryption eliminates server-side search, content moderation, and spam filtering — these must be implemented client-side or on metadata only
WebSocket connections consume server memory (2-10KB per connection) — at 1M connections, that is 2-10GB of RAM for connection state alone
Cassandra/ScyllaDB require careful data modeling upfront — changing partition keys later means migrating the entire dataset
Group size limits have cascading effects: a 256-member group at 100 msg/day = 25,600 fanout events/day; 100K-member channels require a fundamentally different fanout architecture
Multi-region deployment introduces message ordering challenges — use a single Kafka cluster per conversation partition or implement vector clocks
Mobile clients have battery and bandwidth constraints — implement message compression (protobuf instead of JSON), batch sync on reconnect, and respect OS background execution limits