Designing a Scalable Chat Application (WhatsApp/Slack Clone)

Type: Software Reference | Confidence: 0.91 | Sources: 7 | Verified: 2026-02-23 | Freshness: quarterly

TL;DR

Constraints

Quick Reference

| Component | Role | Technology Options | Scaling Strategy |
|---|---|---|---|
| API Gateway / Load Balancer | Route client connections, TLS termination, rate limiting | Nginx, HAProxy, AWS ALB, Envoy | Horizontal — add more LB instances behind DNS round-robin |
| WebSocket Gateway | Maintain persistent bidirectional connections with clients | Custom (Elixir/Erlang, Go, Node.js), Socket.IO | Horizontal — shard by user_id; each server handles 100K-1M connections |
| Chat Service | Message validation, routing, conversation logic | Java, Go, Node.js, Elixir | Stateless — scale horizontally behind gateway |
| Message Queue | Decouple send from deliver, buffer during spikes, guarantee delivery | Apache Kafka, NATS JetStream, RabbitMQ, Amazon SQS | Partition by conversation_id; add partitions for throughput |
| Message Storage | Persist all messages durably | Cassandra, ScyllaDB, PostgreSQL (small scale), TiDB | Partition by (conversation_id, time_bucket); add nodes for capacity |
| User/Group Metadata DB | User profiles, contacts, group membership, settings | PostgreSQL, MySQL, CockroachDB | Read replicas + connection pooling; shard at extreme scale |
| Presence Service | Track online/offline/typing status per user | Redis (TTL keys + Pub/Sub), custom Erlang process | Redis Cluster with key-based sharding; TTL for auto-expiry |
| Media Storage | Store images, videos, voice messages, file attachments | AWS S3, Google Cloud Storage, Cloudflare R2 | Object storage is inherently scalable; use CDN for delivery |
| Push Notification Service | Deliver messages to offline/backgrounded users | FCM (Android), APNs (iOS), custom push gateway | Queue-based — consume from message queue for offline users |
| Search / Indexing | Full-text search across message history | Elasticsearch, OpenSearch, Meilisearch | Shard by user_id or workspace_id; index asynchronously |
| Cache Layer | Hot data: recent messages, session data, user profiles | Redis, Memcached | Redis Cluster; cache-aside pattern with TTL |
| CDN | Deliver static assets and media globally | Cloudflare, CloudFront, Fastly | Edge-based; auto-scales |
| Encryption Service | Key exchange, E2E encryption (Signal Protocol) | libsignal, custom X3DH + Double Ratchet | Stateless — keys stored client-side; server only brokers key exchange |

Decision Tree

START — How many concurrent users?
├── <1K concurrent users?
│   ├── YES → SIMPLE STACK
│   │   ├── Single Node.js/Go server with WebSocket (Socket.IO or ws)
│   │   ├── PostgreSQL for messages + users
│   │   ├── In-memory presence (Map/Set on the server)
│   │   └── No message queue needed
│   └── NO ↓
├── 1K–100K concurrent users?
│   ├── YES → DISTRIBUTED STACK
│   │   ├── 2-10 WebSocket gateways behind load balancer
│   │   ├── Redis Pub/Sub for cross-server message routing
│   │   ├── PostgreSQL (partitioned) or MongoDB
│   │   ├── Kafka or RabbitMQ for async delivery
│   │   └── S3 + CDN for media
│   └── NO ↓
├── 100K–10M concurrent users?
│   ├── YES → LARGE-SCALE STACK
│   │   ├── 50-500 WebSocket gateways, sharded by user_id
│   │   ├── Kafka for all message routing
│   │   ├── Cassandra/ScyllaDB for message storage
│   │   ├── Redis Cluster for presence + cache
│   │   └── Elasticsearch for search
│   └── NO ↓
└── >10M concurrent users (WhatsApp/Discord scale)?
    ├── YES → HYPERSCALE STACK
    │   ├── Custom WebSocket layer (Erlang/BEAM or Rust)
    │   ├── Per-user message queues
    │   ├── ScyllaDB or custom storage
    │   ├── Multi-region deployment with geo-routing
    │   ├── Signal Protocol E2E encryption
    │   └── Dedicated teams per subsystem (>50 engineers)
    └── DEFAULT → Start with DISTRIBUTED STACK, evolve as needed

Step-by-Step Guide

1. Define message data model and API contract

Design the core message schema before writing any code. Every message needs a globally unique ID, a conversation reference, sender, timestamp, and content. Use a snowflake-style ID or ULID for sortable unique IDs. [src4]

-- Core message schema (PostgreSQL for small scale)
CREATE TABLE messages (
    message_id    BIGINT PRIMARY KEY,  -- Snowflake ID
    conversation_id BIGINT NOT NULL,
    sender_id     BIGINT NOT NULL,
    content       TEXT,
    content_type  VARCHAR(20) DEFAULT 'text',
    media_url     TEXT,
    created_at    TIMESTAMP NOT NULL DEFAULT NOW(),
    status        VARCHAR(10) DEFAULT 'sent',
    sequence_num  BIGINT NOT NULL
);

CREATE INDEX idx_messages_conv_seq
  ON messages (conversation_id, sequence_num DESC);

Verify: Schema supports both 1:1 and group conversations with the same table structure.
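The snowflake-style IDs mentioned above can be sketched as follows. This is a minimal single-process generator; the bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence) follows the common convention, and the epoch constant is illustrative.

```python
import threading
import time

class SnowflakeGenerator:
    """64-bit sortable IDs: 41-bit ms timestamp | 10-bit worker | 12-bit sequence."""

    EPOCH = 1704067200000  # 2024-01-01 UTC, an illustrative custom epoch

    def __init__(self, worker_id: int):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()  # one generator instance per process

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            # NOTE: production generators must also handle clocks moving backwards
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # wraps at 4096 IDs/ms
                if self.sequence == 0:
                    while now <= self.last_ms:  # exhausted: wait for next millisecond
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH) << 22) | (self.worker_id << 12) | self.sequence

gen = SnowflakeGenerator(worker_id=1)
ids = [gen.next_id() for _ in range(5)]
assert ids == sorted(set(ids))  # strictly increasing, hence time-sortable
```

Because the timestamp occupies the high bits, sorting by message_id sorts by creation time, which is why the schema above can use message_id directly in its index.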

2. Build the WebSocket gateway layer

The gateway maintains persistent WebSocket connections. Use consistent hashing on user_id to route reconnections to the same server. Each server handles 100K-1M connections. [src2]

            ┌──────────────────────────────────────────────┐
            │            Load Balancer (L4/L7)             │
            │      (sticky sessions by user_id hash)       │
            └─────────┬───────────────┬───────────────┬────┘
                      │               │               │
                ┌─────▼──────┐  ┌─────▼──────┐  ┌─────▼──────┐
                │ Gateway-1  │  │ Gateway-2  │  │ Gateway-3  │
                │ 500K conns │  │ 500K conns │  │ 500K conns │
                └─────┬──────┘  └─────┬──────┘  └─────┬──────┘
                      │               │               │
            ┌─────────▼───────────────▼───────────────▼────┐
            │             Redis Pub/Sub Cluster            │
            └──────────────────────┬───────────────────────┘
                                   │
            ┌──────────────────────▼───────────────────────┐
            │                 Kafka Cluster                │
            └──────────────────────────────────────────────┘

Verify: Load test with artillery — each gateway should handle 100K+ concurrent WebSocket connections on a 16GB RAM instance.
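The consistent hashing used for user_id routing above can be sketched like this: a minimal hash ring with virtual nodes, so adding or removing a gateway remaps only roughly 1/N of users instead of reshuffling everyone. Gateway names are placeholders.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes for gateway routing."""

    def __init__(self, nodes, vnodes=100):
        # Each gateway appears `vnodes` times on the ring for even distribution
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        # MD5 is fine here: we need dispersion, not cryptographic strength
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def gateway_for(self, user_id: str) -> str:
        # The first vnode clockwise from the user's hash owns the connection
        idx = bisect.bisect(self._keys, self._hash(user_id)) % len(self._keys)
        return self._ring[idx][1]

ring = ConsistentHashRing(["gw-1", "gw-2", "gw-3"])
# Reconnecting users land on the same gateway every time:
assert ring.gateway_for("user-42") == ring.gateway_for("user-42")
```

The load balancer (or a thin routing tier) evaluates `gateway_for(user_id)` on connect, which is what makes sticky reconnection work without shared session state.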

3. Implement message routing and delivery

When User A sends a message: Gateway receives → Chat Service validates → Kafka queue → Consumer delivers to User B's gateway (or push notification if offline). [src3]

Message Flow (1:1):

User A     Gateway-1     Chat Service     Kafka      Gateway-2     User B
  │            │               │            │            │            │
  │── send ───>│               │            │            │            │
  │            │── validate ──>│            │            │            │
  │            │<── ack ───────│            │            │            │
  │<── ack ────│               │            │            │            │
  │            │               │── produce >│            │            │
  │            │               │            │── consume >│            │
  │            │               │            │            │── deliver >│
  │            │               │            │            │<─ delivered│
  │<── delivered (via Redis Pub/Sub) ────────────────────│            │

Verify: Send a message between two users on different gateways — arrival in <500ms.
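The per-conversation ordering in this flow comes from Kafka's key-based partitioning: producing with conversation_id as the message key pins all of a conversation's messages to one partition, so a single consumer sees them in produced order. The idea in miniature (CRC32 here is a stand-in for the murmur2 hash that real Kafka clients apply to the key):

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in partitioner: any deterministic hash of the key illustrates the
    # point. Real Kafka clients hash the message key with murmur2.
    return zlib.crc32(key.encode()) % num_partitions

# Every message keyed by the same conversation lands on one partition,
# so one consumer sees that conversation's messages in order:
assigned = {partition_for("conv-12345", 12) for _ in range(1000)}
assert len(assigned) == 1
```

This is also why "add partitions for throughput" in the Quick Reference is safe for ordering: adding partitions redistributes conversations, but each conversation still maps to exactly one partition at a time.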

4. Set up message storage with write optimization

For high-volume messaging (>100M msg/day), use Cassandra/ScyllaDB partitioned by (conversation_id, time_bucket) for O(1) writes and efficient range reads. Discord migrated from Cassandra to ScyllaDB in 2022, reporting a 5x performance improvement. [src1]

-- Cassandra/ScyllaDB schema
CREATE TABLE chat.messages (
    conversation_id bigint,
    bucket          text,       -- '2026-02'
    message_id      bigint,
    sender_id       bigint,
    content         blob,
    content_type    text,
    media_url       text,
    created_at      timestamp,
    PRIMARY KEY ((conversation_id, bucket), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

Verify: Cassandra handles >100K writes/sec per node. Read recent 50 messages in <10ms.
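A sketch of how the bucket column is derived and scanned. Monthly granularity is an assumption; choose whatever bucket size keeps one (conversation_id, bucket) partition comfortably under the ~100MB range where Cassandra partitions stay healthy.

```python
from datetime import datetime, timezone

def time_bucket(ts: datetime) -> str:
    """Monthly bucket string for the 'bucket' column, e.g. '2026-02'.

    Bucketing bounds partition size: one partition never holds more than
    one month of one conversation.
    """
    return f"{ts.year:04d}-{ts.month:02d}"

def buckets_back(newest: datetime, n: int) -> list:
    """Bucket keys to scan, newest first, when paging history backwards.

    Old conversations may have empty buckets, so readers walk this list
    until they have collected enough messages.
    """
    year, month, out = newest.year, newest.month, []
    for _ in range(n):
        out.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return out

assert time_bucket(datetime(2026, 2, 23, tzinfo=timezone.utc)) == "2026-02"
assert buckets_back(datetime(2026, 2, 1, tzinfo=timezone.utc), 3) == [
    "2026-02", "2026-01", "2025-12"]
```

A "load recent 50 messages" query then selects from the newest bucket with `LIMIT 50` and falls back to earlier buckets only if needed.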

5. Implement presence and typing indicators

Use Redis TTL keys for presence. Each connected user has a key expiring after 30s. Clients send heartbeats every 15s. Typing indicators are ephemeral — never persist them. [src6]

Presence Architecture:

Client heartbeat (every 15s)
        │
        ▼
┌──────────────────┐      ┌──────────────────┐
│ Gateway Server   │─────>│ Redis Cluster    │
└──────────────────┘      │ presence:{uid}   │
                          │ TTL = 30s        │
                          └────────┬─────────┘
                                   │ Pub/Sub
                     ┌─────────────┼──────────────┐
                     │             │              │
                Gateway-1     Gateway-2      Gateway-3

Verify: Set presence key with 30s TTL, wait 31s, confirm key expired. Typing indicator latency <200ms.
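The heartbeat/TTL semantics can be modeled in a few lines. This is an in-memory stand-in for the Redis pattern (`SETEX presence:{uid} 30 <ts>` on heartbeat, `EXISTS` on lookup), with the clock injected so the expiry behavior is explicit; production replaces this class with Redis calls.

```python
class PresenceStore:
    """In-memory model of Redis TTL-based presence (sketch, not production)."""

    def __init__(self, ttl_seconds=30.0):
        self.ttl = ttl_seconds
        self._expiry = {}  # user_id -> absolute expiry time

    def heartbeat(self, user_id, now):
        # Equivalent to SETEX: (re)sets the key with a fresh TTL
        self._expiry[user_id] = now + self.ttl

    def is_online(self, user_id, now):
        # Equivalent to EXISTS after Redis has lazily expired the key
        return self._expiry.get(user_id, float("-inf")) > now

store = PresenceStore(ttl_seconds=30)
store.heartbeat("user-7", now=1000.0)
assert store.is_online("user-7", now=1015.0)      # 15s heartbeats keep it alive
assert not store.is_online("user-7", now=1031.0)  # >30s of silence reads as offline
```

The 15s heartbeat against a 30s TTL means one lost heartbeat does not flap presence; two consecutive losses mark the user offline, and a dropped TCP connection simply stops refreshing the key.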

6. Add push notifications for offline delivery

When the consumer detects the recipient is offline (no presence key in Redis), route to push notification service. Use FCM for Android and APNs for iOS. Batch notifications to avoid spamming. [src7]

Verify: Disconnect a user, send a message, verify push arrives within 3 seconds. On reconnect, all missed messages sync.
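One way to implement the batching mentioned above is to coalesce pushes per (user, conversation) inside a short window, so a burst of N messages yields one notification. This is a hypothetical sketch with an injected clock; FCM's collapse_key and APNs' apns-collapse-id provide related on-device collapsing.

```python
from collections import defaultdict

class NotificationBatcher:
    """Coalesce offline pushes per (user, conversation) within a time window."""

    def __init__(self, window_seconds=5.0):
        self.window = window_seconds
        self._pending = defaultdict(list)  # (user, conv) -> buffered messages
        self._deadline = {}                # (user, conv) -> flush time

    def add(self, user_id, conv_id, body, now):
        key = (user_id, conv_id)
        # First message in a burst opens the window; later ones just buffer
        self._deadline.setdefault(key, now + self.window)
        self._pending[key].append(body)

    def flush_due(self, now):
        """Return one summary push per key whose window has elapsed."""
        pushes = []
        for key in [k for k, t in self._deadline.items() if t <= now]:
            bodies = self._pending.pop(key)
            del self._deadline[key]
            pushes.append({"user": key[0], "conversation": key[1],
                           "preview": bodies[-1], "count": len(bodies)})
        return pushes

b = NotificationBatcher(window_seconds=5)
for i in range(3):
    b.add("user-9", "conv-1", f"msg {i}", now=100.0 + i)
assert b.flush_due(now=104.0) == []             # window still open
assert b.flush_due(now=105.0)[0]["count"] == 3  # burst collapsed into one push
```

The window trades a few seconds of notification latency for far fewer pushes during active group chats, which is usually the right trade for offline recipients.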

7. Design media handling pipeline

Never send media through WebSocket or message queue. Use pre-signed URL upload: client uploads directly to S3/R2, then sends a message with the media reference (<1KB). [src3]

Verify: Upload 10MB image, accessible via CDN in <2 seconds. Message with media reference is <1KB.

Code Examples

Node.js: WebSocket Chat Gateway Server

// Input:  WebSocket connections from authenticated clients
// Output: Real-time message delivery between connected users
// Deps:   npm install ws ioredis

const WebSocket = require('ws');
const Redis = require('ioredis');

const HEARTBEAT_INTERVAL = 15000;
const PRESENCE_TTL = 30;

const redisPub = new Redis(process.env.REDIS_URL);
const redisSub = new Redis(process.env.REDIS_URL);
const redisStore = new Redis(process.env.REDIS_URL);

const wss = new WebSocket.Server({ port: 8080 });
const clients = new Map(); // user_id -> WebSocket

wss.on('connection', (ws, req) => {
  const userId = authenticateFromToken(req);
  if (!userId) { ws.close(4001, 'Unauthorized'); return; }

  clients.set(userId, ws);
  refreshPresence(userId);
  redisSub.subscribe(`deliver:${userId}`);

  ws.on('message', async (data) => {
    let msg;
    try { msg = JSON.parse(data); } catch { return; } // drop malformed frames
    if (msg.type === 'heartbeat') { refreshPresence(userId); return; }
    if (msg.type === 'send_message') {
      const messageId = generateSnowflakeId();
      const envelope = {
        message_id: messageId,
        conversation_id: msg.conversation_id,
        sender_id: userId,
        content: msg.content,
        content_type: msg.content_type || 'text',
        created_at: new Date().toISOString(),
      };
      ws.send(JSON.stringify({ type: 'ack', message_id: messageId }));
      // The chat service consumes this channel, persists the message, and
      // fans it out to per-recipient `deliver:{user_id}` channels (or Kafka).
      await redisPub.publish(
        `conversation:${msg.conversation_id}`,
        JSON.stringify(envelope)
      );
    }
  });

  ws.on('close', () => {
    clients.delete(userId);
    redisStore.del(`presence:${userId}`);
    redisSub.unsubscribe(`deliver:${userId}`);
  });
});

redisSub.on('message', (channel, message) => {
  const [prefix, targetUserId] = channel.split(':');
  if (prefix === 'deliver' && clients.has(targetUserId)) {
    const ws = clients.get(targetUserId);
    if (ws.readyState === WebSocket.OPEN) ws.send(message);
  }
});

async function refreshPresence(userId) {
  await redisStore.setex(`presence:${userId}`, PRESENCE_TTL, Date.now());
}

function generateSnowflakeId() {
  // Simplified stand-in: timestamp plus random suffix. Not collision-proof;
  // use a real snowflake/ULID generator in production.
  return Date.now() * 1000 + Math.floor(Math.random() * 1000);
}

function authenticateFromToken(req) {
  // Placeholder: verify the JWT from the connection URL (e.g. ?token=...)
  // and return its user_id, or null if invalid.
  const token = new URL(req.url, 'ws://localhost').searchParams.get('token');
  return token ? verifyJwtAndGetUserId(token) : null; // hypothetical helper
}

console.log('WebSocket gateway running on :8080');

Python: Kafka Message Consumer with Delivery Logic

# Input:  Kafka messages from chat service
# Output: Routes to online users (Redis) or push notifications (offline)
# Deps:   pip install confluent-kafka==2.3.0 redis==5.0.0 firebase-admin==6.4.0

import json
from confluent_kafka import Consumer, KafkaError
import redis
import firebase_admin
from firebase_admin import messaging

redis_client = redis.Redis.from_url("redis://localhost:6379", decode_responses=True)
firebase_admin.initialize_app()

consumer = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'message-delivery-group',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False,
})
consumer.subscribe(['chat.messages'])

def is_user_online(user_id: str) -> bool:
    return redis_client.exists(f"presence:{user_id}") > 0

def deliver_to_online_user(user_id: str, message: dict):
    redis_client.publish(f"deliver:{user_id}", json.dumps(message))

def send_push_notification(user_id: str, message: dict):
    token = redis_client.get(f"device_token:{user_id}")
    if not token:
        return
    notification = messaging.Message(
        token=token,
        notification=messaging.Notification(
            title=f"New message from {message.get('sender_name', 'Someone')}",
            body=message.get('content', '')[:100],
        ),
        data={'conversation_id': str(message['conversation_id'])},
    )
    messaging.send(notification)

def process_message(raw_message: str):
    message = json.loads(raw_message)
    sender_id = str(message['sender_id'])
    members = redis_client.smembers(f"conv_members:{message['conversation_id']}")
    for member_id in members:
        if member_id == sender_id:
            continue
        if is_user_online(member_id):
            deliver_to_online_user(member_id, message)
        else:
            send_push_notification(member_id, message)

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            if msg.error().code() != KafkaError._PARTITION_EOF:
                print(f"Consumer error: {msg.error()}")
            continue
        process_message(msg.value().decode('utf-8'))
        consumer.commit(message=msg, asynchronous=False)  # commit only after delivery
except KeyboardInterrupt:
    pass
finally:
    consumer.close()

Anti-Patterns

Wrong: Polling for new messages via HTTP

// ❌ BAD — HTTP polling wastes bandwidth and adds latency.
// At 100K users polling every 2s = 50K req/sec for mostly empty responses.
setInterval(async () => {
  const res = await fetch('/api/messages?since=' + lastTimestamp);
  const messages = await res.json();
  messages.forEach(renderMessage);
}, 2000);

Correct: Use persistent WebSocket connections

// ✅ GOOD — WebSocket: server pushes instantly, zero wasted requests.
let retries = 0;
function connect() {
  const ws = new WebSocket('wss://chat.example.com?token=JWT_HERE');
  ws.onopen = () => { retries = 0; };
  ws.onmessage = (event) => {
    const message = JSON.parse(event.data);
    renderMessage(message);
  };
  // Reconnect with capped exponential backoff
  ws.onclose = () => setTimeout(connect, Math.min(1000 * 2 ** retries++, 30000));
}
connect();

Wrong: Sending media files through the message pipeline

# ❌ BAD — 10MB video through Kafka + WebSocket = memory pressure
message = {
    "conversation_id": 123,
    "content": base64.b64encode(video_bytes).decode(),  # 13MB base64!
    "content_type": "video"
}
kafka_producer.produce('chat.messages', json.dumps(message))

Correct: Upload media to object storage, send reference only

# ✅ GOOD — Upload to S3, send <1KB reference through message pipeline.
# (s3 is a boto3 S3 client; uuid4 comes from the uuid module)
upload_url = s3.generate_presigned_url('put_object',
    Params={'Bucket': 'chat-media', 'Key': f'{uuid4()}.mp4'},
    ExpiresIn=3600)
# Client uploads directly to S3, then sends:
message = {
    "conversation_id": 123,
    "media_key": "abc123.mp4",
    "content_type": "video"
}

Wrong: Using wall-clock timestamps for message ordering

# ❌ BAD — Clock skew between servers causes out-of-order messages.
message = {
    "id": uuid4(),
    "created_at": datetime.utcnow(),  # different servers = different clocks
    "content": "Hello"
}

Correct: Use per-conversation monotonic sequence numbers

# ✅ GOOD — Atomic increment per conversation guarantees order.
sequence_num = redis_client.incr(f"seq:{conversation_id}")
message = {
    "id": generate_snowflake_id(),
    "sequence_num": sequence_num,
    "created_at": datetime.utcnow(),  # informational only
    "content": "Hello"
}

Wrong: Shared database for all microservices

-- ❌ BAD — Chat, presence, and notification services
-- all reading/writing the same PostgreSQL instance.
-- One slow query in notifications blocks message delivery.

Correct: Dedicated databases per service domain

# ✅ GOOD — Each service owns its data store. Failures are isolated.
services:
  chat-service:
    database: cassandra-cluster
  user-service:
    database: postgresql-primary
  presence-service:
    database: redis-cluster
  search-service:
    database: elasticsearch-cluster

Common Pitfalls

Version History & Compatibility

| Pattern/Technology | Status | Notable Changes | Migration Notes |
|---|---|---|---|
| WebSocket (RFC 6455) | Standard since 2011 | HTTP/2 WebSocket (RFC 8441, 2018) | All modern browsers support; use wss:// in production |
| Cassandra 4.x → ScyllaDB | Industry shift 2022-2024 | Discord migrated 2022; 5x performance | ScyllaDB is Cassandra-compatible; drop-in replacement with CQL |
| Signal Protocol (libsignal) | Current E2EE standard | PQXDH (2023): post-quantum key agreement | WhatsApp, Messenger, Google Messages all use Signal Protocol |
| Kafka → NATS JetStream | Emerging alternative 2024+ | Lower latency, simpler operations | NATS better for <1M msg/sec; Kafka for higher throughput |
| Socket.IO → native WebSocket | Simplification trend | Socket.IO adds overhead | For >100K connections, use raw ws library |

When to Use / When Not to Use

| Use When | Don't Use When | Use Instead |
|---|---|---|
| Building a real-time 1:1 and group messaging product | Need async communication (email-like, hours between replies) | Email system design or task queue architecture |
| Users expect <500ms message delivery latency | Message delivery can be delayed 5-30 seconds | HTTP-based notification system with polling |
| Need to support 10K+ concurrent connected users | <100 concurrent users on a single server | Simple WebSocket server without distributed infrastructure |
| Messages must be persisted and searchable | Ephemeral messages only (no persistence needed) | WebRTC data channels for P2P ephemeral messaging |
| Multiple client platforms (web, iOS, Android, desktop) | Single-platform internal tool | Platform-specific SDK (e.g., Firebase Realtime Database) |
| End-to-end encryption is a requirement | Server needs to read/process content (moderation, NLP) | Standard TLS + server-side encryption at rest |

Important Caveats

Related Units