Designing a Scalable Chat Application (WhatsApp/Slack Clone)
How do I design a scalable chat application (WhatsApp/Slack clone)?
TL;DR
- Bottom line: A scalable chat app requires WebSocket gateways for persistent connections, a message queue (Kafka/NATS) for async delivery, a write-optimized database (Cassandra/ScyllaDB) for message storage, and Redis for presence/pub-sub — separated into stateless services behind a load balancer.
- Key tool/command: WebSocket + Kafka + Cassandra + Redis Pub/Sub
- Watch out for: Storing messages in a relational database without partitioning. PostgreSQL works for <10M messages/day, but unsharded SQL becomes a bottleneck at scale.
- Works with: Any language/framework. Production stacks: Erlang/Elixir (WhatsApp/Discord), Java (Slack), Go (large-scale custom), Node.js (startups/MVPs).
Constraints
- WebSocket connections are stateful — never route the same user to different gateway servers mid-session without a reconnection protocol
- Message ordering must be guaranteed per-conversation — use a monotonic sequence ID per chat, not wall-clock timestamps
- End-to-end encryption (Signal Protocol) means the server cannot read message content — all search/indexing must happen client-side
- Database choice is load-dependent: PostgreSQL for <10M messages/day, Cassandra/ScyllaDB for >100M messages/day
- Never fan out writes to all group members synchronously — use async message queues (Kafka/NATS) to decouple send from deliver
- Group messages create O(N) fanout per message — cap group size or use tiered delivery (online members via WebSocket, offline via push queue)
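The tiered-delivery constraint above can be sketched as a pure planning step that runs before any network I/O. This is a minimal illustration, not a prescribed implementation: `FanoutPlan`, `plan_fanout`, and the idea of passing in a precomputed set of online user IDs are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FanoutPlan:
    """Recipients of one group message, split by delivery tier."""
    websocket: list = field(default_factory=list)   # online: deliver over live sockets
    push_queue: list = field(default_factory=list)  # offline: enqueue for push delivery


def plan_fanout(members, sender_id, online_ids):
    """Split recipients into an online tier and an offline tier.

    Nothing is delivered here. The plan is meant to be handed to
    asynchronous workers so the send path never waits on O(N) fanout.
    """
    plan = FanoutPlan()
    for member_id in members:
        if member_id == sender_id:
            continue  # the sender already has the message
        if member_id in online_ids:
            plan.websocket.append(member_id)
        else:
            plan.push_queue.append(member_id)
    return plan


plan = plan_fanout(["a", "b", "c", "d"], sender_id="a", online_ids={"b", "d"})
print(plan.websocket)   # ['b', 'd']
print(plan.push_queue)  # ['c']
```

Keeping the split separate from delivery makes it easy to cap or reprioritize either tier without touching the send path.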
Quick Reference
| Component | Role | Technology Options | Scaling Strategy |
|---|---|---|---|
| API Gateway / Load Balancer | Route client connections, TLS termination, rate limiting | Nginx, HAProxy, AWS ALB, Envoy | Horizontal — add more LB instances behind DNS round-robin |
| WebSocket Gateway | Maintain persistent bidirectional connections with clients | Custom (Elixir/Erlang, Go, Node.js), Socket.IO | Horizontal — shard by user_id; each server handles 100K-1M connections |
| Chat Service | Message validation, routing, conversation logic | Java, Go, Node.js, Elixir | Stateless — scale horizontally behind gateway |
| Message Queue | Decouple send from deliver, buffer during spikes, guarantee delivery | Apache Kafka, NATS JetStream, RabbitMQ, Amazon SQS | Partition by conversation_id; add partitions for throughput |
| Message Storage | Persist all messages durably | Cassandra, ScyllaDB, PostgreSQL (small scale), TiDB | Partition by (conversation_id, time_bucket); add nodes for capacity |
| User/Group Metadata DB | User profiles, contacts, group membership, settings | PostgreSQL, MySQL, CockroachDB | Read replicas + connection pooling; shard at extreme scale |
| Presence Service | Track online/offline/typing status per user | Redis (TTL keys + Pub/Sub), custom Erlang process | Redis Cluster with key-based sharding; TTL for auto-expiry |
| Media Storage | Store images, videos, voice messages, file attachments | AWS S3, Google Cloud Storage, Cloudflare R2 | Object storage is inherently scalable; use CDN for delivery |
| Push Notification Service | Deliver messages to offline/backgrounded users | FCM (Android), APNs (iOS), custom push gateway | Queue-based — consume from message queue for offline users |
| Search / Indexing | Full-text search across message history | Elasticsearch, OpenSearch, Meilisearch | Shard by user_id or workspace_id; index asynchronously |
| Cache Layer | Hot data: recent messages, session data, user profiles | Redis, Memcached | Redis Cluster; cache aside pattern with TTL |
| CDN | Deliver static assets and media globally | Cloudflare, CloudFront, Fastly | Edge-based; auto-scales |
| Encryption Service | Key exchange, E2E encryption (Signal Protocol) | libsignal, custom X3DH + Double Ratchet | Stateless — keys stored client-side; server only brokers key exchange |
Decision Tree
START — How many concurrent users?
├── <1K concurrent users?
│   ├── YES → SIMPLE STACK
│   │   ├── Single Node.js/Go server with WebSocket (Socket.IO or ws)
│   │   ├── PostgreSQL for messages + users
│   │   ├── In-memory presence (Map/Set on the server)
│   │   └── No message queue needed
│   └── NO ↓
├── 1K–100K concurrent users?
│   ├── YES → DISTRIBUTED STACK
│   │   ├── 2-10 WebSocket gateways behind load balancer
│   │   ├── Redis Pub/Sub for cross-server message routing
│   │   ├── PostgreSQL (partitioned) or MongoDB
│   │   ├── Kafka or RabbitMQ for async delivery
│   │   └── S3 + CDN for media
│   └── NO ↓
├── 100K–10M concurrent users?
│   ├── YES → LARGE-SCALE STACK
│   │   ├── 50-500 WebSocket gateways, sharded by user_id
│   │   ├── Kafka for all message routing
│   │   ├── Cassandra/ScyllaDB for message storage
│   │   ├── Redis Cluster for presence + cache
│   │   └── Elasticsearch for search
│   └── NO ↓
└── >10M concurrent users (WhatsApp/Discord scale)?
    ├── YES → HYPERSCALE STACK
    │   ├── Custom WebSocket layer (Erlang/BEAM or Rust)
    │   ├── Per-user message queues
    │   ├── ScyllaDB or custom storage
    │   ├── Multi-region deployment with geo-routing
    │   ├── Signal Protocol E2E encryption
    │   └── Dedicated teams per subsystem (>50 engineers)
    └── DEFAULT → Start with DISTRIBUTED STACK, evolve as needed
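The decision tree can also be written as a small lookup, which is handy in capacity-planning scripts. This is only a sketch: `recommend_stack` is a hypothetical helper, and how the exact boundary values (1K, 100K, 10M) are assigned is an assumption, since the tree's ranges touch at those points.

```python
def recommend_stack(concurrent_users):
    """Map a concurrent-user estimate to the stack tiers in the decision tree.

    Boundary handling is an assumption: exactly 1K and 100K fall into the
    larger tier, and exactly 10M stays in the large-scale tier.
    """
    if concurrent_users < 1_000:
        return "SIMPLE STACK"
    if concurrent_users < 100_000:
        return "DISTRIBUTED STACK"
    if concurrent_users <= 10_000_000:
        return "LARGE-SCALE STACK"
    return "HYPERSCALE STACK"


for estimate in (500, 50_000, 5_000_000, 50_000_000):
    print(estimate, "->", recommend_stack(estimate))
```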
Step-by-Step Guide
1. Define message data model and API contract
Design the core message schema before writing any code. Every message needs a globally unique ID, a conversation reference, sender, timestamp, and content. Use a snowflake-style ID or ULID for sortable unique IDs. [src4]
-- Core message schema (PostgreSQL for small scale)
CREATE TABLE messages (
    message_id      BIGINT PRIMARY KEY,                 -- Snowflake ID
    conversation_id BIGINT NOT NULL,
    sender_id       BIGINT NOT NULL,
    content         TEXT,
    content_type    VARCHAR(20) DEFAULT 'text',
    media_url       TEXT,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(), -- informational; ordering uses sequence_num
    status          VARCHAR(10) DEFAULT 'sent',
    sequence_num    BIGINT NOT NULL                     -- monotonic per conversation
);
-- UNIQUE so two messages in one conversation can never share a sequence number
CREATE UNIQUE INDEX idx_messages_conv_seq
    ON messages (conversation_id, sequence_num DESC);
Verify: Schema supports both 1:1 and group conversations with the same table structure.
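For the sortable `message_id`, a Snowflake-style generator packs a millisecond timestamp, a worker ID, and a per-millisecond counter into one 64-bit integer. The sketch below is one possible layout, not the scheme any particular product uses: the epoch, the 10/12 bit split, and the `SnowflakeGenerator` name are all assumptions for illustration.

```python
import threading
import time

EPOCH_MS = 1_700_000_000_000  # assumed custom epoch (mid-November 2023)
WORKER_BITS = 10              # assumed: up to 1024 generator instances
SEQUENCE_BITS = 12            # assumed: up to 4096 IDs per millisecond per worker
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Generates strictly increasing 64-bit IDs for a single worker."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Clock went backwards: keep using the last timestamp
                # so IDs never decrease.
                now = self._last_ms
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Counter exhausted for this millisecond: borrow the next one.
                    now = self._last_ms + 1
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp_part = (now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS)
            return timestamp_part | (self._worker_id << SEQUENCE_BITS) | self._sequence


gen = SnowflakeGenerator(worker_id=1)
first, second = gen.next_id(), gen.next_id()
print(first < second)  # True: IDs from one generator always increase
```

IDs from different workers are only roughly time-ordered, which is why per-conversation ordering still relies on `sequence_num` rather than on `message_id`.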
2. Build the WebSocket gateway layer
The gateway maintains persistent WebSocket connections. Use consistent hashing on user_id to route reconnections to the same server. Each server handles 100K-1M connections. [src2]
┌─────────────────────────────────────────────┐
│            Load Balancer (L4/L7)            │
│      (sticky sessions by user_id hash)      │
└───────┬───────────────┬───────────────┬─────┘
        │               │               │
┌───────▼─────┐ ┌───────▼─────┐ ┌───────▼─────┐
│  Gateway-1  │ │  Gateway-2  │ │  Gateway-3  │
│ 500K conns  │ │ 500K conns  │ │ 500K conns  │
└───────┬─────┘ └───────┬─────┘ └───────┬─────┘
        │               │               │
┌───────▼───────────────▼───────────────▼─────┐
│            Redis Pub/Sub Cluster            │
└──────────────────────┬──────────────────────┘
                       │
┌──────────────────────▼──────────────────────┐
│                Kafka Cluster                │
└─────────────────────────────────────────────┘
Verify: Load test with artillery — each gateway should handle 100K+ concurrent WebSocket connections on a 16GB RAM instance.
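Consistent hashing on `user_id` is what lets a reconnecting client land on the same gateway while keeping reshuffling small when a gateway is added or removed. The ring below is a minimal sketch under stated assumptions: the `HashRing` class, the choice of SHA-256, and 100 virtual nodes per gateway are illustrative, not taken from any specific load balancer.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._keys = []    # sorted ring positions
        self._owners = {}  # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        for i in range(self._vnodes):
            pos = self._hash(f"{node}#{i}")
            if pos in self._owners:
                continue  # extremely unlikely 64-bit collision; keep first owner
            self._owners[pos] = node
            bisect.insort(self._keys, pos)

    def remove(self, node):
        for i in range(self._vnodes):
            pos = self._hash(f"{node}#{i}")
            if self._owners.get(pos) == node:
                del self._owners[pos]
                self._keys.remove(pos)

    def lookup(self, user_id):
        """Return the node owning the first ring position after the key's hash."""
        if not self._keys:
            raise LookupError("ring is empty")
        idx = bisect.bisect(self._keys, self._hash(str(user_id))) % len(self._keys)
        return self._owners[self._keys[idx]]


ring = HashRing(["gw-1", "gw-2", "gw-3"])
print(ring.lookup(42) == ring.lookup(42))  # True: same user, same gateway
```

When a gateway is removed, only the users it owned are remapped; everyone else keeps their existing assignment, which limits the reconnect wave.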
3. Implement message routing and delivery
When User A sends a message: Gateway receives → Chat Service validates → Kafka queue → Consumer delivers to User B's gateway (or push notification if offline). [src3]
Message Flow (1:1):
User A  Gateway-1    Chat Service     Kafka      Gateway-2     User B
│           │              │            │            │            │
│── send ──>│              │            │            │            │
│           │── validate ─>│            │            │            │
│           │<── ack ──────│            │            │            │
│<── ack ───│              │            │            │            │
│           │              │── produce >│            │            │
│           │              │            │── consume >│            │
│           │              │            │            │── deliver >│
│           │              │            │            │<─ delivered│
│<── delivered (via Redis Pub/Sub) ──────────────────│            │
Verify: Send a message between two users on different gateways — arrival in <500ms.
4. Set up message storage with write optimization
For high-volume messaging (>100M msg/day), use Cassandra/ScyllaDB partitioned by (conversation_id, time_bucket). Writes are appends to a single partition, and reading recent messages is an efficient range scan within that partition. Discord migrated from Cassandra to ScyllaDB in 2022 for a 5x performance improvement. [src1]
-- Cassandra/ScyllaDB schema
CREATE TABLE chat.messages (
    conversation_id bigint,
    bucket          text,      -- '2026-02'
    message_id      bigint,
    sender_id       bigint,
    content         blob,
    content_type    text,
    media_url       text,
    created_at      timestamp,
    PRIMARY KEY ((conversation_id, bucket), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
Verify: Cassandra handles >100K writes/sec per node. Read recent 50 messages in <10ms.
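Because `bucket` is part of the partition key, the application has to compute it on every write and enumerate buckets on reads that span months. A minimal sketch follows, assuming monthly `YYYY-MM` buckets in UTC as in the schema comment; the helper names `bucket_for` and `buckets_between` are hypothetical.

```python
from datetime import datetime, timezone


def _to_utc(ts):
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return ts.astimezone(timezone.utc)


def bucket_for(ts):
    """Bucket value to use in the partition key when writing a message."""
    return _to_utc(ts).strftime("%Y-%m")


def buckets_between(start, end):
    """Buckets to query for a time range, newest first (matches DESC clustering)."""
    start, end = _to_utc(start), _to_utc(end)
    year, month = end.year, end.month
    result = []
    while (year, month) >= (start.year, start.month):
        result.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return result


start = datetime(2025, 11, 20, tzinfo=timezone.utc)
end = datetime(2026, 2, 3, tzinfo=timezone.utc)
print(bucket_for(end))              # 2026-02
print(buckets_between(start, end))  # ['2026-02', '2026-01', '2025-12', '2025-11']
```

A reader fetching "the last 50 messages" would query the newest bucket first and move to older buckets only if fewer than 50 rows come back.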
5. Implement presence and typing indicators
Use Redis TTL keys for presence. Each connected user has a key expiring after 30s. Clients send heartbeats every 15s. Typing indicators are ephemeral — never persist them. [src6]
Presence Architecture:
Client heartbeat (every 15s)
         │
         ▼
┌─────────────────┐     ┌─────────────────┐
│ Gateway Server  │────>│  Redis Cluster  │
└─────────────────┘     │ presence:{uid}  │
                        │   TTL = 30s     │
                        └────────┬────────┘
                                 │ Pub/Sub
                   ┌─────────────┼──────────────┐
                   │             │              │
               Gateway-1     Gateway-2      Gateway-3
Verify: Set presence key with 30s TTL, wait 31s, confirm key expired. Typing indicator latency <200ms.
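The heartbeat-plus-TTL rule is easy to reason about in isolation. The sketch below is an in-memory stand-in for the Redis key expiry described above, suitable for a single-server setup or for unit tests; `PresenceTracker` and its injectable `clock` parameter are assumptions for illustration, not a Redis API.

```python
import time


class PresenceTracker:
    """In-memory presence with the same semantics as a key with a TTL."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = {}  # user_id -> time at which presence lapses

    def heartbeat(self, user_id):
        """Refresh presence; equivalent to re-setting the key with a fresh TTL."""
        self._expires_at[user_id] = self._clock() + self._ttl

    def is_online(self, user_id):
        expires = self._expires_at.get(user_id)
        if expires is None:
            return False
        if self._clock() >= expires:
            del self._expires_at[user_id]  # lazily expire, like a TTL key
            return False
        return True


# Simulated clock so the example runs instantly.
now = [0.0]
tracker = PresenceTracker(ttl_seconds=30.0, clock=lambda: now[0])
tracker.heartbeat("u1")
now[0] = 15.0
tracker.heartbeat("u1")          # heartbeat every 15s keeps the user online
now[0] = 44.0
print(tracker.is_online("u1"))   # True: expires at t=45
now[0] = 46.0
print(tracker.is_online("u1"))   # False: two heartbeats were missed
```

With a 30s TTL and 15s heartbeats, one lost heartbeat is tolerated before the user is shown as offline.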
6. Add push notifications for offline delivery
When the consumer detects the recipient is offline (no presence key in Redis), route to push notification service. Use FCM for Android and APNs for iOS. Batch notifications to avoid spamming. [src7]
Verify: Disconnect a user, send a message, verify push arrives within 3 seconds. On reconnect, all missed messages sync.
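One way to batch notifications is to coalesce everything pending for a user into a single notification per conversation. This is a sketch of that idea only: the `coalesce_notifications` helper and the dictionary shape (`conversation_id`, `sender_name`, `content`) are assumptions, and the output would still need to be mapped onto FCM or APNs payloads.

```python
from collections import defaultdict


def coalesce_notifications(pending):
    """Collapse a user's pending messages into one notification per conversation.

    `pending` is an iterable of dicts in arrival order.
    """
    by_conversation = defaultdict(list)
    for msg in pending:
        by_conversation[msg["conversation_id"]].append(msg)

    notifications = []
    for conversation_id, msgs in by_conversation.items():
        latest = msgs[-1]
        if len(msgs) == 1:
            title = f"New message from {latest['sender_name']}"
        else:
            title = f"{len(msgs)} new messages"
        notifications.append({
            "conversation_id": conversation_id,
            "title": title,
            "body": latest["content"][:100],  # preview of the newest message
        })
    return notifications


pending = [
    {"conversation_id": 1, "sender_name": "Ana", "content": "hi"},
    {"conversation_id": 1, "sender_name": "Ana", "content": "are you there?"},
    {"conversation_id": 2, "sender_name": "Raj", "content": "lunch?"},
]
for notification in coalesce_notifications(pending):
    print(notification["title"])
# 2 new messages
# New message from Raj
```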
7. Design media handling pipeline
Never send media through WebSocket or message queue. Use pre-signed URL upload: client uploads directly to S3/R2, then sends a message with the media reference (<1KB). [src3]
Verify: Upload 10MB image, accessible via CDN in <2 seconds. Message with media reference is <1KB.
Code Examples
Node.js: WebSocket Chat Gateway Server
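// Input: WebSocket connections from authenticated clients
// Output: Real-time message delivery between connected users
// Deps: npm install ws ioredis uuid
const WebSocket = require('ws');
const Redis = require('ioredis');

const HEARTBEAT_INTERVAL = 15000; // ms; interval at which clients are expected to heartbeat
const PRESENCE_TTL = 30; // seconds; two missed heartbeats mark the user offline

// A Redis connection in subscriber mode cannot run other commands,
// so publish, subscribe, and key storage each use their own connection.
const redisPub = new Redis(process.env.REDIS_URL);
const redisSub = new Redis(process.env.REDIS_URL);
const redisStore = new Redis(process.env.REDIS_URL);

const wss = new WebSocket.Server({ port: 8080 });
const clients = new Map(); // user_id (string) -> WebSocket

wss.on('connection', (ws, req) => {
  // authenticateFromToken is application-specific (for example, verifying a JWT
  // from the request) and is not defined in this example.
  const authenticated = authenticateFromToken(req);
  if (!authenticated) { ws.close(4001, 'Unauthorized'); return; }
  const userId = String(authenticated); // channel names yield strings, so key by string

  clients.set(userId, ws);
  refreshPresence(userId);
  redisSub.subscribe(`deliver:${userId}`);

  ws.on('message', async (data) => {
    let msg;
    try {
      msg = JSON.parse(data);
    } catch (err) {
      return; // drop malformed frames instead of crashing the process
    }
    if (msg.type === 'heartbeat') { refreshPresence(userId); return; }
    if (msg.type === 'send_message') {
      const messageId = generateSnowflakeId();
      const envelope = {
        message_id: messageId,
        conversation_id: msg.conversation_id,
        sender_id: userId,
        content: msg.content,
        content_type: msg.content_type || 'text',
        created_at: new Date().toISOString(),
      };
      ws.send(JSON.stringify({ type: 'ack', message_id: messageId }));
      try {
        await redisPub.publish(
          `conversation:${msg.conversation_id}`,
          JSON.stringify(envelope)
        );
      } catch (err) {
        console.error('publish failed', err);
      }
    }
  });

  ws.on('close', () => {
    // If the user already reconnected on a newer socket, leave its state alone.
    if (clients.get(userId) !== ws) return;
    clients.delete(userId);
    redisStore.del(`presence:${userId}`);
    redisSub.unsubscribe(`deliver:${userId}`);
  });
});

// Messages published to deliver:<userId> are forwarded to that user's socket.
redisSub.on('message', (channel, message) => {
  const [prefix, targetUserId] = channel.split(':');
  if (prefix === 'deliver' && clients.has(targetUserId)) {
    const ws = clients.get(targetUserId);
    if (ws.readyState === WebSocket.OPEN) ws.send(message);
  }
});

function refreshPresence(userId) {
  return redisStore
    .setex(`presence:${userId}`, PRESENCE_TTL, Date.now())
    .catch((err) => console.error('presence refresh failed', err));
}

// Placeholder: millisecond timestamp plus a random suffix. Sortable, but not
// collision-free; a real Snowflake ID adds a worker ID and a per-ms counter.
function generateSnowflakeId() {
  return Date.now() * 1000 + Math.floor(Math.random() * 1000);
}

console.log('WebSocket gateway running on :8080');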
Python: Kafka Message Consumer with Delivery Logic
# Input: Kafka messages from chat service
# Output: Routes to online users (Redis) or push notifications (offline)
# Deps: pip install confluent-kafka==2.3.0 redis==5.0.0 firebase-admin==6.4.0
import json

from confluent_kafka import Consumer
import redis
import firebase_admin
from firebase_admin import messaging

redis_client = redis.Redis.from_url("redis://localhost:6379", decode_responses=True)
firebase_admin.initialize_app()

consumer = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'message-delivery-group',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False,  # commit manually, only after delivery is attempted
})
consumer.subscribe(['chat.messages'])


def is_user_online(user_id: str) -> bool:
    return redis_client.exists(f"presence:{user_id}") > 0


def deliver_to_online_user(user_id: str, message: dict):
    # The gateway holding this user's socket is subscribed to deliver:<user_id>
    redis_client.publish(f"deliver:{user_id}", json.dumps(message))


def send_push_notification(user_id: str, message: dict):
    token = redis_client.get(f"device_token:{user_id}")
    if not token:
        return  # no registered device; the message is synced on next connect
    notification = messaging.Message(
        token=token,
        notification=messaging.Notification(
            title=f"New message from {message.get('sender_name', 'Someone')}",
            body=(message.get('content') or '')[:100],
        ),
        data={'conversation_id': str(message['conversation_id'])},
    )
    messaging.send(notification)


def process_message(raw_message: str):
    message = json.loads(raw_message)
    sender_id = str(message['sender_id'])
    members = redis_client.smembers(f"conv_members:{message['conversation_id']}")
    for member_id in members:
        if member_id == sender_id:
            continue
        if is_user_online(member_id):
            deliver_to_online_user(member_id, message)
        else:
            send_push_notification(member_id, message)


try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"Kafka error: {msg.error()}")
            continue
        process_message(msg.value().decode('utf-8'))
        consumer.commit(message=msg, asynchronous=False)
except KeyboardInterrupt:
    pass
finally:
    consumer.close()
Anti-Patterns
Wrong: Polling for new messages via HTTP
// ❌ BAD — HTTP polling wastes bandwidth and adds latency.
// At 100K users polling every 2s = 50K req/sec for mostly empty responses.
setInterval(async () => {
  const res = await fetch('/api/messages?since=' + lastTimestamp);
  const messages = await res.json();
  messages.forEach(renderMessage);
}, 2000);
Correct: Use persistent WebSocket connections
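// ✅ GOOD: WebSocket lets the server push instantly, with no empty polling requests.
let retries = 0;
function connect() {
  const ws = new WebSocket('wss://chat.example.com?token=JWT_HERE');
  ws.onopen = () => { retries = 0; }; // reset backoff after a successful connect
  ws.onmessage = (event) => {
    const message = JSON.parse(event.data);
    renderMessage(message);
  };
  // Reconnect with exponential backoff, capped at 30s
  ws.onclose = () => setTimeout(connect, Math.min(1000 * 2 ** retries++, 30000));
}
connect();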
Wrong: Sending media files through the message pipeline
# ❌ BAD — 10MB video through Kafka + WebSocket = memory pressure
message = {
"conversation_id": 123,
"content": base64.b64encode(video_bytes).decode(), # 13MB base64!
"content_type": "video"
}
kafka_producer.produce('chat.messages', json.dumps(message))
Correct: Upload media to object storage, send reference only
# ✅ GOOD: upload to S3, send a <1KB reference through the message pipeline.
# Assumes `s3 = boto3.client('s3')` and `from uuid import uuid4`.
media_key = f'{uuid4()}.mp4'
upload_url = s3.generate_presigned_url(
    'put_object',
    Params={'Bucket': 'chat-media', 'Key': media_key},
    ExpiresIn=3600,
)
# Client uploads directly to S3 via upload_url, then sends:
message = {
    "conversation_id": 123,
    "media_key": media_key,  # same key the upload URL was signed for
    "content_type": "video",
}
Wrong: Using wall-clock timestamps for message ordering
# ❌ BAD — Clock skew between servers causes out-of-order messages.
message = {
"id": uuid4(),
"created_at": datetime.utcnow(), # different servers = different clocks
"content": "Hello"
}
Correct: Use per-conversation monotonic sequence numbers
# ✅ GOOD: atomic increment per conversation guarantees order.
# Assumes `from datetime import datetime, timezone`.
sequence_num = redis_client.incr(f"seq:{conversation_id}")
message = {
    "id": generate_snowflake_id(),
    "sequence_num": sequence_num,
    "created_at": datetime.now(timezone.utc),  # informational only
    "content": "Hello",
}
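On the receiving side, per-conversation sequence numbers let a client detect gaps and duplicates without trusting any clock. The sketch below shows one way to do this; `ConversationInbox` is a hypothetical helper, and it assumes sequence numbers start at 1 and increase by exactly 1 per message.

```python
class ConversationInbox:
    """Releases messages in sequence order and reports gaps to re-request."""

    def __init__(self, last_seq=0):
        self.last_seq = last_seq  # highest sequence number delivered in order
        self._held = {}           # out-of-order messages waiting for a gap to fill

    def receive(self, seq, payload):
        """Return the payloads that became deliverable, in order."""
        if seq <= self.last_seq or seq in self._held:
            return []  # duplicate delivery; safe to ignore
        self._held[seq] = payload
        ready = []
        while self.last_seq + 1 in self._held:
            self.last_seq += 1
            ready.append(self._held.pop(self.last_seq))
        return ready

    def missing(self):
        """Sequence numbers to request from the server to close known gaps."""
        if not self._held:
            return []
        return [s for s in range(self.last_seq + 1, max(self._held))
                if s not in self._held]


inbox = ConversationInbox()
print(inbox.receive(1, "a"))  # ['a']
print(inbox.receive(3, "c"))  # []  (held until 2 arrives)
print(inbox.missing())        # [2]
print(inbox.receive(2, "b"))  # ['b', 'c']
```

On reconnect, the client can send `last_seq` per conversation so the server replays everything after it.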
Wrong: Shared database for all microservices
-- ❌ BAD — Chat, presence, and notification services
-- all reading/writing the same PostgreSQL instance.
-- One slow query in notifications blocks message delivery.
Correct: Dedicated databases per service domain
# ✅ GOOD: each service owns its data store. Failures are isolated.
services:
  chat-service:
    database: cassandra-cluster
  user-service:
    database: postgresql-primary
  presence-service:
    database: redis-cluster
  search-service:
    database: elasticsearch-cluster
Common Pitfalls
- Thundering herd on reconnect: When a gateway server restarts, all its clients (100K+) reconnect simultaneously to surviving servers, causing cascading failures. Fix: implement exponential backoff with jitter, delay = min(base * 2^attempt + random(0, 1000), 30000). [src2]
- Unbounded group fanout: A message to a 10K-member group triggers 10K write operations synchronously. Fix: use tiered delivery. Cap WebSocket delivery at 256 members and batch push notifications for the rest. [src3]
- Hot partition in Cassandra: A viral group chat puts millions of messages into one partition. Fix: add a time bucket to the partition key, PRIMARY KEY ((conversation_id, bucket), message_id), where bucket is YYYY-MM. [src1]
- Missing messages on reconnect: A user goes offline for 2 hours, reconnects, and receives only messages newer than their last-seen timestamp, so clock skew can silently drop messages. Fix: request sync using the last received sequence_num per conversation. [src4]
- Presence storm: Broadcasting online/offline to all contacts on every state change overwhelms the system. Fix: batch presence updates (publish every 5s) and only send to contacts who have the app open. [src6]
- No backpressure on WebSocket sends: The server pushes faster than the client can process, filling OS send buffers. Fix: monitor ws.bufferedAmount; if it exceeds a threshold, queue messages server-side and deliver on drain. [src2]
- Single Redis instance for Pub/Sub: Routing all cross-server traffic through one Redis creates a SPOF and a throughput ceiling (~200K msg/sec). Fix: use Redis Cluster with hash-tag routing. [src6]
- Not separating read and write paths: Read-heavy ops compete with write-heavy ops on the same database. Fix: CQRS, with writes going to Cassandra and reads served from a separate cache or read-optimized store. [src1]
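The backoff formula from the thundering-herd pitfall translates directly into a small helper. This is a sketch of that exact formula; the function name, the default parameter values, and the injectable `rng` (used here to make the output reproducible) are assumptions.

```python
import random


def reconnect_delay_ms(attempt, base_ms=1000, cap_ms=30000, jitter_ms=1000,
                       rng=random.random):
    """delay = min(base * 2^attempt + random(0, jitter), cap), in milliseconds."""
    exponential = base_ms * (2 ** attempt)
    return min(exponential + rng() * jitter_ms, cap_ms)


# Fixed "random" value so the example output is deterministic.
half = lambda: 0.5
print(reconnect_delay_ms(0, rng=half))   # 1500.0
print(reconnect_delay_ms(3, rng=half))   # 8500.0
print(reconnect_delay_ms(10, rng=half))  # 30000
```

Note that once the cap is reached, every client waits exactly `cap_ms`, so the jitter no longer spreads them out. A common variant, often called full jitter, picks a random delay between 0 and the capped value instead.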
Version History & Compatibility
| Pattern/Technology | Status | Notable Changes | Migration Notes |
|---|---|---|---|
| WebSocket (RFC 6455) | Standard since 2011 | HTTP/2 WebSocket (RFC 8441, 2018) | All modern browsers support; use wss:// in production |
| Cassandra 4.x → ScyllaDB | Industry shift 2022-2024 | Discord migrated 2022; 5x performance | ScyllaDB is Cassandra-compatible; drop-in replacement with CQL |
| Signal Protocol (libsignal) | Current E2EE standard | PQXDH (2023): post-quantum key agreement | WhatsApp, Messenger, Google Messages all use Signal Protocol |
| Kafka → NATS JetStream | Emerging alternative 2024+ | Lower latency, simpler operations | NATS better for <1M msg/sec; Kafka for higher throughput |
| Socket.IO → native WebSocket | Simplification trend | Socket.IO adds overhead | For >100K connections, use raw ws library |
When to Use / When Not to Use
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Building a real-time 1:1 and group messaging product | Need async communication (email-like, hours between replies) | Email system design or task queue architecture |
| Users expect <500ms message delivery latency | Message delivery can be delayed 5-30 seconds | HTTP-based notification system with polling |
| Need to support 10K+ concurrent connected users | <100 concurrent users on a single server | Simple WebSocket server without distributed infrastructure |
| Messages must be persisted and searchable | Ephemeral messages only (no persistence needed) | WebRTC data channels for P2P ephemeral messaging |
| Multiple client platforms (web, iOS, Android, desktop) | Single-platform internal tool | Platform-specific SDK (e.g., Firebase Realtime Database) |
| End-to-end encryption is a requirement | Server needs to read/process content (moderation, NLP) | Standard TLS + server-side encryption at rest |
Important Caveats
- End-to-end encryption eliminates server-side search, content moderation, and spam filtering — these must be implemented client-side or on metadata only
- WebSocket connections consume server memory (2-10KB per connection) — at 1M connections, that is 2-10GB of RAM for connection state alone
- Cassandra/ScyllaDB require careful data modeling upfront — changing partition keys later means migrating the entire dataset
- Group size limits have cascading effects: a 256-member group at 100 msg/day = 25,600 fanout events/day; 100K-member channels require a fundamentally different fanout architecture
- Multi-region deployment introduces message ordering challenges — use a single Kafka cluster per conversation partition or implement vector clocks
- Mobile clients have battery and bandwidth constraints — implement message compression (protobuf instead of JSON), batch sync on reconnect, and respect OS background execution limits
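The memory and fanout figures in the caveats above are simple products, and it can help to keep them as functions when plugging in your own numbers. The helpers below are a back-of-the-envelope sketch; the names are hypothetical and decimal units (1 GB = 1,000,000 KB) are assumed to match the round figures in the text.

```python
def connection_memory_gb(connections, kb_per_connection):
    """RAM for connection state alone, in decimal GB."""
    return connections * kb_per_connection / 1_000_000


def daily_fanout_events(group_size, messages_per_day):
    """Delivery events per day for one group: each message goes to every member."""
    return group_size * messages_per_day


print(connection_memory_gb(1_000_000, 2))   # 2.0
print(connection_memory_gb(1_000_000, 10))  # 10.0
print(daily_fanout_events(256, 100))        # 25600
```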