WebSocket + Kafka + Cassandra + Redis Pub/Sub

| Component | Role | Technology Options | Scaling Strategy |
|---|---|---|---|
| API Gateway / Load Balancer | Route client connections, TLS termination, rate limiting | Nginx, HAProxy, AWS ALB, Envoy | Horizontal — add more LB instances behind DNS round-robin |
| WebSocket Gateway | Maintain persistent bidirectional connections with clients | Custom (Elixir/Erlang, Go, Node.js), Socket.IO | Horizontal — shard by user_id; each server handles 100K-1M connections |
| Chat Service | Message validation, routing, conversation logic | Java, Go, Node.js, Elixir | Stateless — scale horizontally behind gateway |
| Message Queue | Decouple send from deliver, buffer during spikes, guarantee delivery | Apache Kafka, NATS JetStream, RabbitMQ, Amazon SQS | Partition by conversation_id; add partitions for throughput |
| Message Storage | Persist all messages durably | Cassandra, ScyllaDB, PostgreSQL (small scale), TiDB | Partition by (conversation_id, time_bucket); add nodes for capacity |
| User/Group Metadata DB | User profiles, contacts, group membership, settings | PostgreSQL, MySQL, CockroachDB | Read replicas + connection pooling; shard at extreme scale |
| Presence Service | Track online/offline/typing status per user | Redis (TTL keys + Pub/Sub), custom Erlang process | Redis Cluster with key-based sharding; TTL for auto-expiry |
| Media Storage | Store images, videos, voice messages, file attachments | AWS S3, Google Cloud Storage, Cloudflare R2 | Object storage is inherently scalable; use CDN for delivery |
| Push Notification Service | Deliver messages to offline/backgrounded users | FCM (Android), APNs (iOS), custom push gateway | Queue-based — consume from message queue for offline users |
| Search / Indexing | Full-text search across message history | Elasticsearch, OpenSearch, Meilisearch | Shard by user_id or workspace_id; index asynchronously |
| Cache Layer | Hot data: recent messages, session data, user profiles | Redis, Memcached | Redis Cluster; cache aside pattern with TTL |
| CDN | Deliver static assets and media globally | Cloudflare, CloudFront, Fastly | Edge-based; auto-scales |
| Encryption Service | Key exchange, E2E encryption (Signal Protocol) | libsignal, custom X3DH + Double Ratchet | Stateless — keys stored client-side; server only brokers key exchange |
START — How many concurrent users?
├── <1K concurrent users?
│ ├── YES → SIMPLE STACK
│ │ ├── Single Node.js/Go server with WebSocket (Socket.IO or ws)
│ │ ├── PostgreSQL for messages + users
│ │ ├── In-memory presence (Map/Set on the server)
│ │ └── No message queue needed
│ └── NO ↓
├── 1K–100K concurrent users?
│ ├── YES → DISTRIBUTED STACK
│ │ ├── 2-10 WebSocket gateways behind load balancer
│ │ ├── Redis Pub/Sub for cross-server message routing
│ │ ├── PostgreSQL (partitioned) or MongoDB
│ │ ├── Kafka or RabbitMQ for async delivery
│ │ └── S3 + CDN for media
│ └── NO ↓
├── 100K–10M concurrent users?
│ ├── YES → LARGE-SCALE STACK
│ │ ├── 50-500 WebSocket gateways, sharded by user_id
│ │ ├── Kafka for all message routing
│ │ ├── Cassandra/ScyllaDB for message storage
│ │ ├── Redis Cluster for presence + cache
│ │ └── Elasticsearch for search
│ └── NO ↓
└── >10M concurrent users (WhatsApp/Discord scale)?
├── YES → HYPERSCALE STACK
│ ├── Custom WebSocket layer (Erlang/BEAM or Rust)
│ ├── Per-user message queues
│ ├── ScyllaDB or custom storage
│ ├── Multi-region deployment with geo-routing
│ ├── Signal Protocol E2E encryption
│ └── Dedicated teams per subsystem (>50 engineers)
└── DEFAULT → Start with DISTRIBUTED STACK, evolve as needed
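The decision tree above reduces to a threshold lookup; a minimal sketch (tier names from the tree, thresholds in concurrent users):

```python
def pick_stack(concurrent_users: int) -> str:
    """Map expected concurrent users to a stack tier from the decision tree."""
    if concurrent_users < 1_000:
        return "SIMPLE"        # single server, PostgreSQL, in-memory presence
    if concurrent_users < 100_000:
        return "DISTRIBUTED"   # 2-10 gateways, Redis Pub/Sub, Kafka/RabbitMQ
    if concurrent_users < 10_000_000:
        return "LARGE-SCALE"   # sharded gateways, Kafka, Cassandra/ScyllaDB
    return "HYPERSCALE"        # custom WebSocket layer, multi-region
```

In practice the thresholds are fuzzy; treat a result near a boundary as a prompt to build the next tier's migration path early.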
Design the core message schema before writing any code. Every message needs a globally unique ID, a conversation reference, sender, timestamp, and content. Use a snowflake-style ID or ULID for sortable unique IDs. [src4]
-- Core message schema (PostgreSQL for small scale)
CREATE TABLE messages (
    message_id      BIGINT PRIMARY KEY,    -- Snowflake ID
    conversation_id BIGINT NOT NULL,
    sender_id       BIGINT NOT NULL,
    content         TEXT,
    content_type    VARCHAR(20) DEFAULT 'text',
    media_url       TEXT,
    created_at      TIMESTAMP NOT NULL DEFAULT NOW(),
    status          VARCHAR(10) DEFAULT 'sent',
    sequence_num    BIGINT NOT NULL        -- per-conversation ordering
);

CREATE INDEX idx_messages_conv_seq
    ON messages (conversation_id, sequence_num DESC);
Verify: Schema supports both 1:1 and group conversations with the same table structure.
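A snowflake-style generator for the message_id column can be sketched as below. The bit widths follow Twitter's original layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit per-millisecond sequence) so IDs sort by creation time; the custom epoch value is an arbitrary assumption.

```python
import threading
import time

class SnowflakeGenerator:
    """64-bit sortable IDs: 41-bit ms timestamp | 10-bit worker | 12-bit sequence."""
    EPOCH_MS = 1_600_000_000_000  # custom epoch (assumption)

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:          # sequence exhausted this ms
                    while now <= self.last_ms:  # spin to the next millisecond
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence

gen = SnowflakeGenerator(worker_id=1)
a, b = gen.next_id(), gen.next_id()
assert a < b  # IDs are monotonically increasing per worker
```

ULIDs give the same sortability with no coordination; Snowflake IDs are smaller (one BIGINT) and fit the schema above directly.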
The gateway maintains persistent WebSocket connections. Use consistent hashing on user_id to route reconnections to the same server. Each server handles 100K-1M connections. [src2]
┌─────────────────────────────────────────┐
│ Load Balancer (L4/L7) │
│ (sticky sessions by user_id hash) │
└──────────┬──────────┬──────────┬────────┘
│ │ │
┌──────────▼┐ ┌──────▼──────┐ ┌▼─────────┐
│ Gateway-1 │ │ Gateway-2 │ │ Gateway-3 │
│ 500K conns │ │ 500K conns │ │ 500K conns│
└──────┬─────┘ └──────┬──────┘ └─────┬─────┘
│ │ │
┌──────▼───────────────▼───────────────▼──────┐
│ Redis Pub/Sub Cluster │
└──────────────────┬──────────────────────────┘
│
┌──────────────────▼──────────────────────────┐
│ Kafka Cluster │
└─────────────────────────────────────────────┘
Verify: Load test with artillery — each gateway should handle 100K+ concurrent WebSocket connections on a 16GB RAM instance.
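The "consistent hashing on user_id" routing can be sketched with a sorted hash ring; virtual nodes smooth the key distribution so each gateway gets a roughly equal share. Class and gateway names here are illustrative:

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring: adding/removing a gateway remaps only ~1/N of users."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node appears vnodes times on the ring.
        self.ring = sorted(
            (self._hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def route(self, user_id: str) -> str:
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self.keys, self._hash(user_id)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["gateway-1", "gateway-2", "gateway-3"])
assert ring.route("user:42") == ring.route("user:42")  # stable routing
```

The load balancer (or a thin routing tier) runs this lookup so a reconnecting user lands on the gateway that still holds their subscription state.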
When User A sends a message: Gateway receives → Chat Service validates → Kafka queue → Consumer delivers to User B's gateway (or push notification if offline). [src3]
Message Flow (1:1):
User A Gateway-1 Chat Service Kafka Gateway-2 User B
│ │ │ │ │ │
│── send ──>│ │ │ │ │
│ │── validate ─>│ │ │ │
│ │<── ack ──────│ │ │ │
│<── ack ───│ │ │ │ │
│ │ │── produce >│ │ │
│ │ │ │── consume >│ │
│ │ │ │ │── deliver >│
│ │ │ │ │<─ delivered│
│<── delivered (via Redis Pub/Sub) ──────────────────│ │
Verify: Send a message between two users on different gateways — arrival in <500ms.
For high-volume messaging (>100M msg/day), use Cassandra/ScyllaDB partitioned by (conversation_id, time_bucket) for O(1) writes and efficient range reads. Discord migrated from Cassandra to ScyllaDB in 2022 for 5x performance improvement. [src1]
-- Cassandra/ScyllaDB schema
CREATE TABLE chat.messages (
    conversation_id bigint,
    bucket          text,       -- '2026-02'
    message_id      bigint,
    sender_id       bigint,
    content         blob,
    content_type    text,
    media_url       text,
    created_at      timestamp,
    PRIMARY KEY ((conversation_id, bucket), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
Verify: Cassandra handles >100K writes/sec per node. Read recent 50 messages in <10ms.
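Paging backwards through history means computing which monthly buckets to query; a sketch of the bucket arithmetic implied by the schema above (function names are illustrative):

```python
from datetime import date

def bucket_for(d: date) -> str:
    """Monthly bucket matching the Cassandra partition key, e.g. '2026-02'."""
    return f"{d.year:04d}-{d.month:02d}"

def buckets_back(from_date: date, months: int):
    """Yield bucket strings walking backwards in time, newest first.

    To page older history, query each (conversation_id, bucket) partition
    in this order until enough rows have been collected.
    """
    y, m = from_date.year, from_date.month
    for _ in range(months):
        yield f"{y:04d}-{m:02d}"
        m -= 1
        if m == 0:
            y, m = y - 1, 12

assert list(buckets_back(date(2026, 2, 1), 3)) == ["2026-02", "2026-01", "2025-12"]
```

Monthly buckets keep any single partition well under Cassandra's practical size limits even for very active conversations; quieter deployments could use yearly buckets instead.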
Use Redis TTL keys for presence. Each connected user has a key expiring after 30s. Clients send heartbeats every 15s. Typing indicators are ephemeral — never persist them. [src6]
Presence Architecture:
Client heartbeat (every 15s)
│
▼
┌─────────────────┐ ┌─────────────────┐
│ Gateway Server │────>│ Redis Cluster │
└─────────────────┘ │ presence:{uid} │
│ TTL = 30s │
└────────┬────────┘
│ Pub/Sub
┌─────────────┼──────────────┐
│ │ │
Gateway-1 Gateway-2 Gateway-3
Verify: Set presence key with 30s TTL, wait 31s, confirm key expired. Typing indicator latency <200ms.
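The heartbeat/TTL semantics can be demonstrated in-process. Redis's SETEX and EXISTS do exactly this server-side; the injected clock below is purely for testability, and all names are illustrative:

```python
import time

class PresenceTracker:
    """Mimics Redis SETEX/EXISTS semantics for presence:{uid} keys."""

    def __init__(self, ttl_seconds=30, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.expiry = {}  # user_id -> absolute expiry time

    def heartbeat(self, user_id):
        # Equivalent to: SETEX presence:{user_id} 30 <timestamp>
        self.expiry[user_id] = self.clock() + self.ttl

    def is_online(self, user_id):
        # Equivalent to: EXISTS presence:{user_id}
        return self.expiry.get(user_id, 0) > self.clock()

now = [0.0]
p = PresenceTracker(ttl_seconds=30, clock=lambda: now[0])
p.heartbeat("u1")
now[0] = 29
assert p.is_online("u1")
now[0] = 31
assert not p.is_online("u1")  # expired: no heartbeat within TTL
```

The 2x ratio between TTL (30s) and heartbeat interval (15s) means one dropped heartbeat does not flap a user offline.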
When the consumer detects the recipient is offline (no presence key in Redis), route to push notification service. Use FCM for Android and APNs for iOS. Batch notifications to avoid spamming. [src7]
Verify: Disconnect a user, send a message, verify push arrives within 3 seconds. On reconnect, all missed messages sync.
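The missed-message sync on reconnect can be sketched with a per-conversation sequence_num cursor: the client sends the highest sequence it has seen, the server returns everything newer. A minimal sketch, with an in-memory dict standing in for the message store:

```python
def missed_messages(store, conversation_id, last_seen_seq, limit=100):
    """Return messages with sequence_num > last_seen_seq, oldest first."""
    rows = [m for m in store.get(conversation_id, [])
            if m["sequence_num"] > last_seen_seq]
    rows.sort(key=lambda m: m["sequence_num"])
    return rows[:limit]

store = {7: [{"sequence_num": s, "content": f"msg {s}"} for s in (1, 2, 3, 4)]}
assert [m["sequence_num"] for m in missed_messages(store, 7, last_seen_seq=2)] == [3, 4]
```

Against the real schema this is the indexed query `WHERE conversation_id = ? AND sequence_num > ? ORDER BY sequence_num LIMIT ?`; the limit forces paging so a long-offline client cannot demand an unbounded result set.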
Never send media through WebSocket or message queue. Use pre-signed URL upload: client uploads directly to S3/R2, then sends a message with the media reference (<1KB). [src3]
Verify: Upload 10MB image, accessible via CDN in <2 seconds. Message with media reference is <1KB.
// Input: WebSocket connections from authenticated clients
// Output: Real-time message delivery between connected users
// Deps: npm install ws ioredis (pin exact versions in package.json)
const WebSocket = require('ws');
const Redis = require('ioredis');

const HEARTBEAT_INTERVAL = 15000;
const PRESENCE_TTL = 30;

const redisPub = new Redis(process.env.REDIS_URL);
const redisSub = new Redis(process.env.REDIS_URL);
const redisStore = new Redis(process.env.REDIS_URL);

const wss = new WebSocket.Server({ port: 8080 });
const clients = new Map(); // user_id (string) -> WebSocket

wss.on('connection', (ws, req) => {
  // JWT validation, implemented elsewhere; returns a string user_id or null.
  const userId = authenticateFromToken(req);
  if (!userId) { ws.close(4001, 'Unauthorized'); return; }

  clients.set(userId, ws);
  refreshPresence(userId);
  redisSub.subscribe(`deliver:${userId}`);

  ws.on('message', async (data) => {
    const msg = JSON.parse(data);
    if (msg.type === 'heartbeat') { refreshPresence(userId); return; }
    if (msg.type === 'send_message') {
      const messageId = generateSnowflakeId();
      const envelope = {
        message_id: messageId,
        conversation_id: msg.conversation_id,
        sender_id: userId,
        content: msg.content,
        content_type: msg.content_type || 'text',
        created_at: new Date().toISOString(),
      };
      ws.send(JSON.stringify({ type: 'ack', message_id: messageId }));
      await redisPub.publish(
        `conversation:${msg.conversation_id}`,
        JSON.stringify(envelope)
      );
    }
  });

  ws.on('close', () => {
    clients.delete(userId);
    redisStore.del(`presence:${userId}`);
    redisSub.unsubscribe(`deliver:${userId}`);
  });
});

redisSub.on('message', (channel, message) => {
  const [prefix, targetUserId] = channel.split(':');
  if (prefix === 'deliver' && clients.has(targetUserId)) {
    const ws = clients.get(targetUserId);
    if (ws.readyState === WebSocket.OPEN) ws.send(message);
  }
});

async function refreshPresence(userId) {
  await redisStore.setex(`presence:${userId}`, PRESENCE_TTL, Date.now());
}

function generateSnowflakeId() {
  // Simplified: ms timestamp + random suffix. A production Snowflake ID also
  // encodes a worker ID and per-ms sequence to guarantee uniqueness.
  return Date.now() * 1000 + Math.floor(Math.random() * 1000);
}

console.log('WebSocket gateway running on :8080');
# Input: Kafka messages from chat service
# Output: Routes to online users (Redis) or push notifications (offline)
# Deps: pip install confluent-kafka==2.3.0 redis==5.0.0 firebase-admin==6.4.0
import json

from confluent_kafka import Consumer
import redis
import firebase_admin
from firebase_admin import messaging

redis_client = redis.Redis.from_url("redis://localhost:6379", decode_responses=True)
firebase_admin.initialize_app()

consumer = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'message-delivery-group',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False,  # commit only after successful processing
})
consumer.subscribe(['chat.messages'])

def is_user_online(user_id: str) -> bool:
    return redis_client.exists(f"presence:{user_id}") > 0

def deliver_to_online_user(user_id: str, message: dict):
    redis_client.publish(f"deliver:{user_id}", json.dumps(message))

def send_push_notification(user_id: str, message: dict):
    token = redis_client.get(f"device_token:{user_id}")
    if not token:
        return
    push = messaging.Message(
        token=token,
        notification=messaging.Notification(
            title=f"New message from {message.get('sender_name', 'Someone')}",
            body=message.get('content', '')[:100],
        ),
        data={'conversation_id': str(message['conversation_id'])},
    )
    messaging.send(push)

def process_message(raw_message: str):
    message = json.loads(raw_message)
    sender_id = str(message['sender_id'])
    members = redis_client.smembers(f"conv_members:{message['conversation_id']}")
    for member_id in members:
        if member_id == sender_id:
            continue
        if is_user_online(member_id):
            deliver_to_online_user(member_id, message)
        else:
            send_push_notification(member_id, message)

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            continue  # log and alert on persistent errors in production
        process_message(msg.value().decode('utf-8'))
        consumer.commit(msg)
except KeyboardInterrupt:
    pass
finally:
    consumer.close()
// ❌ BAD — HTTP polling wastes bandwidth and adds latency.
// At 100K users polling every 2s = 50K req/sec for mostly empty responses.
setInterval(async () => {
  const res = await fetch('/api/messages?since=' + lastTimestamp);
  const messages = await res.json();
  messages.forEach(renderMessage);
}, 2000);

// ✅ GOOD — WebSocket: server pushes instantly, zero wasted requests.
let retries = 0;
const ws = new WebSocket('wss://chat.example.com?token=JWT_HERE');
ws.onopen = () => { retries = 0; };
ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  renderMessage(message);
};
// connect() re-runs this setup; delay grows exponentially, capped at 30s.
ws.onclose = () => setTimeout(connect, Math.min(1000 * 2 ** retries++, 30000));
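The reconnect delay above is capped exponential backoff; adding random jitter prevents thundering-herd reconnects when a gateway restarts and thousands of clients retry in lockstep. The same formula as a pure function (useful server-side too, e.g. for consumer retries):

```python
import random

def backoff_ms(attempt: int, base_ms: int = 1000, cap_ms: int = 30000) -> int:
    """Capped exponential backoff with up to 1s of random jitter."""
    return min(base_ms * (2 ** attempt) + random.randint(0, 1000), cap_ms)
```

Attempt 0 yields roughly 1-2s, attempt 3 roughly 8-9s, and anything past attempt 5 is held at the 30s cap.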
# ❌ BAD — 10MB video through Kafka + WebSocket = memory pressure
message = {
    "conversation_id": 123,
    "content": base64.b64encode(video_bytes).decode(),  # ~13MB as base64!
    "content_type": "video"
}
kafka_producer.produce('chat.messages', json.dumps(message))

# ✅ GOOD — Upload to S3, send <1KB reference through message pipeline.
upload_url = s3.generate_presigned_url(
    'put_object',
    Params={'Bucket': 'chat-media', 'Key': f'{uuid4()}.mp4'},
    ExpiresIn=3600)
# Client uploads directly to S3, then sends:
message = {
    "conversation_id": 123,
    "media_key": "abc123.mp4",
    "content_type": "video"
}
# ❌ BAD — Clock skew between servers causes out-of-order messages.
message = {
    "id": uuid4(),
    "created_at": datetime.utcnow(),  # different servers = different clocks
    "content": "Hello"
}

# ✅ GOOD — Atomic increment per conversation guarantees order.
sequence_num = redis_client.incr(f"seq:{conversation_id}")
message = {
    "id": generate_snowflake_id(),
    "sequence_num": sequence_num,
    "created_at": datetime.utcnow(),  # informational only
    "content": "Hello"
}
-- ❌ BAD — Chat, presence, and notification services
-- all reading/writing the same PostgreSQL instance.
-- One slow query in notifications blocks message delivery.
# ✅ GOOD — Each service owns its data store. Failures are isolated.
services:
  chat-service:
    database: cassandra-cluster
  user-service:
    database: postgresql-primary
  presence-service:
    database: redis-cluster
  search-service:
    database: elasticsearch-cluster
Operational notes:
- Reconnect with capped exponential backoff plus jitter: delay = min(base * 2^attempt + random(0, 1000), 30000). [src2]
- Bucket message partitions by month: PRIMARY KEY ((conversation_id, bucket), message_id) where bucket is YYYY-MM. [src1]
- On reconnect, clients send their last seen sequence_num per conversation to request sync. [src4]
- Watch ws.bufferedAmount for backpressure; if it exceeds a threshold, queue messages server-side and deliver on drain. [src2]

| Pattern/Technology | Status | Notable Changes | Migration Notes |
|---|---|---|---|
| WebSocket (RFC 6455) | Standard since 2011 | HTTP/2 WebSocket (RFC 8441, 2018) | All modern browsers support; use wss:// in production |
| Cassandra 4.x → ScyllaDB | Industry shift 2022-2024 | Discord migrated 2022; 5x performance | ScyllaDB is Cassandra-compatible; drop-in replacement with CQL |
| Signal Protocol (libsignal) | Current E2EE standard | PQXDH (2023): post-quantum key agreement | WhatsApp, Messenger, Google Messages all use Signal Protocol |
| Kafka → NATS JetStream | Emerging alternative 2024+ | Lower latency, simpler operations | NATS better for <1M msg/sec; Kafka for higher throughput |
| Socket.IO → native WebSocket | Simplification trend | Socket.IO adds overhead | For >100K connections, use raw ws library |
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Building a real-time 1:1 and group messaging product | Need async communication (email-like, hours between replies) | Email system design or task queue architecture |
| Users expect <500ms message delivery latency | Message delivery can be delayed 5-30 seconds | HTTP-based notification system with polling |
| Need to support 10K+ concurrent connected users | <100 concurrent users on a single server | Simple WebSocket server without distributed infrastructure |
| Messages must be persisted and searchable | Ephemeral messages only (no persistence needed) | WebRTC data channels for P2P ephemeral messaging |
| Multiple client platforms (web, iOS, Android, desktop) | Single-platform internal tool | Platform-specific SDK (e.g., Firebase Realtime Database) |
| End-to-end encryption is a requirement | Server needs to read/process content (moderation, NLP) | Standard TLS + server-side encryption at rest |