Real-Time Collaboration System Design (Google Docs Clone)
How do I design a real-time collaboration system (Google Docs clone)?
TL;DR
- Bottom line: Use a CRDT library (Yjs or Automerge) for new projects; OT only if you need a proven centralized model with simpler server logic.
- Key tool/command: new Y.Doc() with the y-websocket provider -- production-ready collaborative editing in under 50 lines.
- Watch out for: Rich-text formatting merge conflicts -- plain-text CRDTs do not handle overlapping bold/italic ranges correctly (use a Peritext-style approach).
- Works with: Any WebSocket-capable backend (Node.js, Go, Rust, Elixir); Yjs supports 20+ editor bindings (ProseMirror, TipTap, Monaco, CodeMirror, Quill).
Constraints
- Server must enforce total ordering OR use a mathematically proven CRDT -- partial implementations cause data loss under concurrent edits
- Rich-text CRDTs (Peritext) are significantly harder than plain-text -- never assume plain-text CRDT solutions transfer directly to rich text
- WebSocket reconnection with exponential backoff is mandatory -- network partitions WILL happen in production
- Undo/redo requires intention-preserving algorithms, not simple stack-based undo -- concurrent edits invalidate naive undo stacks
- Document state grows unbounded without garbage collection -- implement tombstone compaction or periodic snapshots
Quick Reference
| Component | Role | Technology Options | Scaling Strategy |
|---|---|---|---|
| Conflict Resolution Engine | Merges concurrent edits without data loss | Yjs (CRDT), Automerge (CRDT), ShareDB (OT), Google OT | Stateless -- runs on each client + server |
| WebSocket Gateway | Persistent bidirectional connection for real-time sync | y-websocket, Socket.IO, ws (Node), Gorilla (Go) | Horizontal scale with sticky sessions per document |
| Document Router | Routes clients to correct server for their document | Consistent hashing on document ID | Hash ring with virtual nodes |
| Presence Service | Tracks who is online, cursor positions, selections | Liveblocks Presence, Phoenix Presence, custom broadcast | Ephemeral state -- no persistence needed |
| Awareness Protocol | Broadcasts cursor/selection state to peers | Yjs Awareness, custom WebSocket messages | Piggyback on sync connection; throttle to 10-15 fps |
| Operation Log / WAL | Ordered history of all edits for replay and recovery | Append-only log (Kafka, Redis Streams, PostgreSQL) | Partition by document ID |
| Snapshot Store | Periodic full-document snapshots for fast load | S3, PostgreSQL JSONB, Redis | Snapshot every N ops or T seconds |
| Auth + Access Control | Per-document permissions (view/edit/comment) | JWT tokens validated at WebSocket handshake | Stateless token validation at gateway |
| Offline Queue | Buffers edits when disconnected, replays on reconnect | IndexedDB (browser), SQLite (mobile) | Local-first -- syncs on reconnection |
| Rich-Text Formatting Layer | Handles bold, italic, links without merge conflicts | Peritext (CRDT), ProseMirror + Yjs, TipTap + Yjs | Integrated with conflict resolution engine |
| Garbage Collector | Compacts tombstones and reclaims memory | Yjs GC, Automerge compaction, custom sweep | Run during low-activity periods |
| CDN / Edge Cache | Serves static assets; caches read-only snapshots | Cloudflare, CloudFront, Fastly | Cache invalidation on document update |
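The Document Router row can be made concrete with a small sketch. This is a toy hash ring with virtual nodes, assuming an FNV-1a string hash; HashRing, fnv1a, and the server names are hypothetical and not taken from any library mentioned above.

```javascript
// Toy consistent-hash ring for routing document IDs to sync servers.
// fnv1a, HashRing, and the server names are illustrative assumptions.

// 32-bit FNV-1a string hash (non-cryptographic, good enough for a sketch).
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

class HashRing {
  constructor(servers, virtualNodes = 100) {
    this.virtualNodes = virtualNodes;
    this.ring = []; // sorted array of { point, server }
    servers.forEach((server) => this.add(server));
  }

  // Each server is placed on the ring many times to even out the load.
  add(server) {
    for (let i = 0; i < this.virtualNodes; i++) {
      this.ring.push({ point: fnv1a(`${server}#${i}`), server });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  remove(server) {
    this.ring = this.ring.filter((entry) => entry.server !== server);
  }

  // The first virtual node clockwise from the document's hash owns the document.
  route(docId) {
    const h = fnv1a(docId);
    const owner = this.ring.find((entry) => entry.point >= h) || this.ring[0];
    return owner.server;
  }
}

const ring = new HashRing(['sync-a', 'sync-b', 'sync-c']);
const before = ring.route('doc-42');
ring.remove(before === 'sync-a' ? 'sync-b' : 'sync-a'); // take out a different server
console.log(ring.route('doc-42') === before); // true
```

Removing a server only remaps the documents that server owned, which is why a ring is usually preferred over hash(docId) % serverCount for sticky per-document routing.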
Decision Tree
START
|-- Need offline editing + peer-to-peer sync?
|   |-- YES --> Use CRDT (Yjs or Automerge)
|   +-- NO
|       |-- Need rich-text collaborative editing?
|       |   |-- YES
|       |   |   |-- Want managed solution?
|       |   |   |   |-- YES --> Liveblocks or TipTap Cloud
|       |   |   |   +-- NO --> Yjs + TipTap/ProseMirror + y-websocket
|       |   +-- NO (plain text or structured data)
|       |       |-- Central server acceptable?
|       |       |   |-- YES
|       |       |   |   |-- < 1K concurrent users per doc?
|       |       |   |   |   |-- YES --> OT (ShareDB) or Yjs -- both work
|       |       |   |   |   +-- NO --> Yjs (better memory profile at scale)
|       |       |   +-- NO --> Yjs with y-webrtc provider
|       +-- Canvas/graphics editing?
|           |-- YES --> Property-level LWW (Figma model) [src1]
|           +-- NO --> Yjs with appropriate shared type (Y.Map, Y.Array)
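For teams that want the tree above as a checklist in code, here is one possible reading of it. The flag names on req are hypothetical, and the two sibling questions under the first NO branch are flattened into a fixed order, which is an assumption rather than something the tree prescribes.

```javascript
// One possible reading of the decision tree as a function.
// The flag names on `req` are hypothetical; return values are leaves of the tree.
function chooseStrategy(req) {
  if (req.offline || req.peerToPeer) return 'CRDT (Yjs or Automerge)';
  if (req.richText) {
    return req.managed
      ? 'Liveblocks or TipTap Cloud'
      : 'Yjs + TipTap/ProseMirror + y-websocket';
  }
  if (req.canvas) return 'Property-level LWW (Figma model)';
  if (!req.centralServer) return 'Yjs with y-webrtc provider';
  return req.concurrentUsersPerDoc < 1000
    ? 'OT (ShareDB) or Yjs'
    : 'Yjs (better memory profile at scale)';
}

console.log(chooseStrategy({ richText: true, managed: false }));
// Yjs + TipTap/ProseMirror + y-websocket
console.log(chooseStrategy({ centralServer: true, concurrentUsersPerDoc: 5000 }));
// Yjs (better memory profile at scale)
```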
Step-by-Step Guide
1. Choose conflict resolution strategy
Decide between OT (Operational Transformation) and CRDT (Conflict-free Replicated Data Type). CRDTs converge without a central server; OT requires server coordination but has a simpler mental model for centralized architectures. [src3]
Verify: Match your requirements against the decision tree above. If offline or P2P is needed, CRDT is the only viable option.
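To see why a CRDT needs no central ordering, the following toy sequence CRDT may help. It is a sketch under simplifying assumptions (fractional positions as floats, one insert per position per client) and is not how Yjs or Automerge are implemented; Replica and its methods are hypothetical names.

```javascript
// Toy sequence CRDT (illustrative only; this is not how Yjs is implemented).
// Every character gets a unique, totally ordered key: (fractional position, clientId).
class Replica {
  constructor(clientId) {
    this.clientId = clientId;
    this.inserted = new Map(); // key -> insert op
    this.deleted = new Set();  // tombstones: keys of deleted characters
  }

  static keyOf(op) {
    return `${op.pos}:${op.clientId}`;
  }

  // Build and apply a local insert halfway between two neighbouring positions.
  insertBetween(leftPos, rightPos, ch) {
    const op = { type: 'insert', pos: (leftPos + rightPos) / 2, clientId: this.clientId, ch };
    this.apply(op);
    return op;
  }

  // Build and apply a local delete that targets a previously inserted character.
  remove(insertOp) {
    const op = { type: 'delete', pos: insertOp.pos, clientId: insertOp.clientId };
    this.apply(op);
    return op;
  }

  // Applying ops is commutative and idempotent: both collections are keyed by id.
  apply(op) {
    if (op.type === 'insert') this.inserted.set(Replica.keyOf(op), op);
    else this.deleted.add(Replica.keyOf(op));
  }

  toString() {
    return [...this.inserted.entries()]
      .filter(([key]) => !this.deleted.has(key))
      .map(([, op]) => op)
      .sort((a, b) => a.pos - b.pos || (a.clientId < b.clientId ? -1 : 1))
      .map((op) => op.ch)
      .join('');
  }
}

const alice = new Replica('alice');
const bob = new Replica('bob');
const a1 = alice.insertBetween(0, 1, 'A'); // both users insert at the same spot
const b1 = bob.insertBetween(0, 1, 'B');
const b2 = bob.insertBetween(0.5, 1, '!');
const d1 = bob.remove(b2);

// Deliver bob's ops to alice out of order (delete first), and alice's op to bob.
[d1, b2, b1].forEach((op) => alice.apply(op));
bob.apply(a1);

console.log(alice.toString(), bob.toString()); // AB AB
```

Both replicas end up with the same text even though they received the operations in different orders, with no server deciding the order. An OT system would instead route every operation through a server that transforms it against concurrent ones.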
2. Set up the CRDT document model
Initialize Yjs shared types that mirror your document structure. Yjs provides Y.Text, Y.Map, Y.Array, and Y.XmlFragment. [src4]
import * as Y from 'yjs';
import { WebsocketProvider } from 'y-websocket';
import { IndexeddbPersistence } from 'y-indexeddb';
const ydoc = new Y.Doc();
const ytext = ydoc.getText('document-body');
const ymeta = ydoc.getMap('document-meta');
ymeta.set('title', 'Untitled Document');
ymeta.set('lastModified', Date.now());
Verify: console.log(ydoc.getText('document-body').toString()) --> expected: ""
3. Connect the WebSocket sync provider
Wire up the network transport layer. The y-websocket provider handles connection management, reconnection, and binary sync protocol automatically. [src4]
const wsProvider = new WebsocketProvider(
  'wss://your-server.example.com',
  'document-room-id',
  ydoc
);
wsProvider.on('status', ({ status }) => {
  console.log(`Sync status: ${status}`);
});
const indexeddbProvider = new IndexeddbPersistence('document-room-id', ydoc);
indexeddbProvider.on('synced', () => console.log('Loaded from local cache'));
Verify: Open two browser tabs with the same room ID. Type in one, confirm text appears in the other within 50-100ms.
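The sync handshake a provider performs can be pictured as a state-vector exchange: each side reports what it has already seen, and the other replies with only the missing operations. The sketch below is a conceptual model only, assuming a per-client clock; OpLog and its methods are hypothetical names, and the real Yjs protocol is binary and more involved.

```javascript
// Toy model of state-vector sync. OpLog and its methods are illustrative names;
// the real Yjs protocol is binary and tracks richer structures.
class OpLog {
  constructor() {
    this.ops = []; // each op: { client, clock, payload }
  }

  // State vector: for every client, the next clock value we expect from it.
  stateVector() {
    const sv = {};
    for (const { client, clock } of this.ops) {
      sv[client] = Math.max(sv[client] || 0, clock + 1);
    }
    return sv;
  }

  append(client, payload) {
    const clock = this.stateVector()[client] || 0;
    this.ops.push({ client, clock, payload });
  }

  // Everything the remote side has not seen yet, judged by its state vector.
  missingFor(remoteSv) {
    return this.ops.filter((op) => op.clock >= (remoteSv[op.client] || 0));
  }

  // Apply a diff; ops we already hold are skipped, so re-delivery is harmless.
  merge(ops) {
    for (const op of ops) {
      if (op.clock >= (this.stateVector()[op.client] || 0)) this.ops.push(op);
    }
  }
}

const tabA = new OpLog();
const tabB = new OpLog();
tabA.append('alice', 'H');
tabA.append('alice', 'i');
tabB.append('bob', '!');

// Two-way handshake: each side sends its state vector, the other replies with a diff.
const diffForB = tabA.missingFor(tabB.stateVector());
const diffForA = tabB.missingFor(tabA.stateVector());
tabB.merge(diffForB);
tabA.merge(diffForA);

tabA.append('alice', '?');
console.log(tabA.missingFor(tabB.stateVector()).length); // 1
```

After the handshake only the single new operation needs to travel, which is the property that keeps reconnects cheap.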
4. Implement awareness (cursors and presence)
Awareness broadcasts ephemeral state (cursor position, user name, color) to all connected peers. Unlike document state, awareness data is not persisted. [src4]
wsProvider.awareness.setLocalStateField('user', {
  name: 'Alice', color: '#30bced', cursor: null
});
wsProvider.awareness.on('change', ({ added, updated, removed }) => {
  const states = wsProvider.awareness.getStates();
  states.forEach((state, clientId) => {
    if (clientId !== ydoc.clientID) {
      console.log(`User ${state.user?.name} at ${state.user?.cursor}`);
    }
  });
});
Verify: Open two tabs, confirm each shows the other user's cursor position and name.
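Cursor updates fire on every mouse move or keystroke, so they are usually rate-limited before being broadcast. Below is a minimal coalescing throttle, assuming a 66 ms interval (about 15 updates per second); createAwarenessThrottle and its parameters are hypothetical, and the clock is injectable so the behaviour can be checked without timers.

```javascript
// Coalescing throttle for awareness updates.
// createAwarenessThrottle and the 66 ms default are assumptions for illustration.
function createAwarenessThrottle(send, intervalMs = 66, now = Date.now) {
  let lastSent = -Infinity;
  let pending = null;
  return {
    // Call on every cursor move; at most one send per interval goes out.
    update(state) {
      if (now() - lastSent >= intervalMs) {
        lastSent = now();
        pending = null;
        send(state);
      } else {
        pending = state; // keep only the newest state
      }
    },
    // Call from a timer (or on blur) so the final position is not lost.
    flush() {
      if (pending !== null) {
        lastSent = now();
        send(pending);
        pending = null;
      }
    },
  };
}

let fakeTime = 0;
const sent = [];
const throttle = createAwarenessThrottle((s) => sent.push(s), 66, () => fakeTime);
throttle.update({ cursor: 1 }); // sent immediately
fakeTime = 10;
throttle.update({ cursor: 2 }); // too soon: held back
fakeTime = 20;
throttle.update({ cursor: 3 }); // replaces the held state
throttle.flush();               // sends the newest held state
console.log(sent.map((s) => s.cursor)); // [ 1, 3 ]
```

In a Yjs setup the send callback would typically wrap a call such as wsProvider.awareness.setLocalStateField, as used in the step above.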
5. Bind to a rich-text editor
Connect Yjs to TipTap (built on ProseMirror) with collaboration extensions. Disable the built-in history plugin -- Yjs handles undo/redo. [src4]
import { Editor } from '@tiptap/core';
import StarterKit from '@tiptap/starter-kit';
import Collaboration from '@tiptap/extension-collaboration';
import CollaborationCursor from '@tiptap/extension-collaboration-cursor';
const editor = new Editor({
  extensions: [
    StarterKit.configure({ history: false }),
    Collaboration.configure({ document: ydoc }),
    CollaborationCursor.configure({
      provider: wsProvider,
      user: { name: 'Alice', color: '#30bced' },
    }),
  ],
});
Verify: Type formatted text (bold, italic) in one tab. Confirm formatting appears correctly in the other tab.
6. Deploy the WebSocket server
Run the y-websocket server in production with persistence enabled. [src4]
npm install y-websocket
HOST=0.0.0.0 PORT=1234 YPERSISTENCE=./yjs-docs npx y-websocket
Verify: curl -s -o /dev/null -w "%{http_code}" http://localhost:1234/ --> expected: 200
7. Add server-side persistence and snapshots
Store periodic full-document snapshots for fast loading and disaster recovery. [src1]
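const Y = require('yjs');
const { LeveldbPersistence } = require('y-leveldb');

const persistence = new LeveldbPersistence('./yjs-storage');

// saveSnapshotToS3 is a placeholder for your own upload helper; it is not part of Yjs or y-leveldb.
setInterval(async () => {
  const docNames = await persistence.getAllDocNames();
  for (const docName of docNames) {
    const ydoc = await persistence.getYDoc(docName);
    const snapshot = Y.encodeStateAsUpdate(ydoc); // full document state as one binary update
    await saveSnapshotToS3(docName, Buffer.from(snapshot));
    ydoc.destroy(); // release the in-memory copy
  }
}, 5 * 60 * 1000); // every 5 minutes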
Verify: Stop and restart the server. Reopen the document and confirm all content is preserved.
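The reason snapshots speed up loading is that only the operations recorded after the last snapshot need to be replayed. A toy model, assuming the document is a plain string and each operation appends one character; SnapshottingLog is a hypothetical name and the snapshot interval of 3 is arbitrary.

```javascript
// Toy snapshot store: state is a plain string, ops are appended characters.
// SnapshottingLog is an illustrative name, not part of any library.
class SnapshottingLog {
  constructor(snapshotEvery = 3) {
    this.snapshotEvery = snapshotEvery;
    this.snapshot = { state: '', opCount: 0 }; // last full snapshot
    this.tail = [];                            // ops recorded after that snapshot
  }

  append(op) {
    this.tail.push(op);
    if (this.tail.length >= this.snapshotEvery) {
      // Fold the tail into a new snapshot and start a fresh tail.
      this.snapshot = {
        state: this.load().state,
        opCount: this.snapshot.opCount + this.tail.length,
      };
      this.tail = [];
    }
  }

  // Rebuild current state: start from the snapshot, replay only the tail.
  load() {
    let state = this.snapshot.state;
    for (const op of this.tail) state += op;
    return { state, replayed: this.tail.length };
  }
}

const log = new SnapshottingLog(3);
for (const ch of 'hello') log.append(ch);
console.log(log.load()); // { state: 'hello', replayed: 2 }
```

Five operations were recorded, but loading replays only the two that arrived after the last snapshot.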
Code Examples
JavaScript: Minimal CRDT Collaboration with Yjs
// Input: Two users editing the same Y.Text concurrently
// Output: Both edits merged without conflict, converged state
import * as Y from 'yjs';
const doc1 = new Y.Doc();
const doc2 = new Y.Doc();
doc1.getText('shared').insert(0, 'Hello ');
doc2.getText('shared').insert(0, 'World');
const sv1 = Y.encodeStateVector(doc1);
const sv2 = Y.encodeStateVector(doc2);
const update1to2 = Y.encodeStateAsUpdate(doc1, sv2);
const update2to1 = Y.encodeStateAsUpdate(doc2, sv1);
Y.applyUpdate(doc1, update2to1);
Y.applyUpdate(doc2, update1to2);
console.log(doc1.getText('shared').toString() === doc2.getText('shared').toString()); // true
JavaScript: WebSocket Sync Server (Node.js)
// Input: Multiple WebSocket clients connecting per document room
// Output: Real-time sync of Y.Doc state across all connected clients
const http = require('http');
const WebSocket = require('ws');
const { setupWSConnection } = require('y-websocket/bin/utils');
const server = http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('y-websocket server');
});
const wss = new WebSocket.Server({ server });
wss.on('connection', (ws, req) => {
  const roomName = req.url.slice(1).split('?')[0];
  setupWSConnection(ws, req, { docName: roomName });
});
server.listen(1234);
Python: OT-style Server with Operation Logging
# Input: Concurrent text operations from multiple clients
# Output: Transformed operations that maintain document consistency
from dataclasses import dataclass
from typing import Literal


@dataclass
class TextOp:
    op_type: Literal["insert", "delete"]
    position: int
    text: str = ""
    length: int = 0
    client_id: str = ""
    revision: int = 0


def transform(op1: TextOp, op2: TextOp) -> TextOp:
    """Transform op1 against op2 (server-side OT)."""
    if op1.op_type == "insert" and op2.op_type == "insert":
        if op1.position > op2.position or (
            op1.position == op2.position and op1.client_id > op2.client_id
        ):
            return TextOp("insert", op1.position + len(op2.text),
                          op1.text, client_id=op1.client_id)
        return op1
    # ... additional cases for delete/insert, delete/delete combinations
    return op1


op_alice = TextOp("insert", 5, "Hello", client_id="alice", revision=1)
op_bob = TextOp("insert", 3, "World", client_id="bob", revision=1)
op_alice_prime = transform(op_alice, op_bob)
print(f"Alice's op transformed: position {op_alice_prime.position}")  # 10
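A quick way to sanity-check a transform function is to apply the two operations in both orders and confirm the results match. The sketch below ports only the insert/insert rule from the Python example to JavaScript and checks it on a tie (both users insert at the same position). transformInsert and applyInsert are hypothetical helper names, and delete cases are not covered.

```javascript
// Insert/insert transform, mirroring the tie-break rule of the Python example:
// op1 shifts right if it sits after op2, or at the same spot with a greater clientId.
function transformInsert(op1, op2) {
  const shifts =
    op1.position > op2.position ||
    (op1.position === op2.position && op1.clientId > op2.clientId);
  return shifts ? { ...op1, position: op1.position + op2.text.length } : op1;
}

function applyInsert(doc, op) {
  return doc.slice(0, op.position) + op.text + doc.slice(op.position);
}

const base = 'abcdefgh';
const alice = { position: 2, text: 'A', clientId: 'alice' };
const bob = { position: 2, text: 'B', clientId: 'bob' };

// Path 1: alice's op lands first, bob's op is transformed against it.
const viaAlice = applyInsert(applyInsert(base, alice), transformInsert(bob, alice));
// Path 2: bob's op lands first, alice's op is transformed against it.
const viaBob = applyInsert(applyInsert(base, bob), transformInsert(alice, bob));

console.log(viaAlice, viaBob); // abABcdefgh abABcdefgh
```

Without the clientId tie-break both operations would stay at position 2 and the two paths would produce different strings.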
Anti-Patterns
Wrong: Using simple last-write-wins for text content
// BAD -- overwrites entire document on each keystroke, losing concurrent edits
socket.on('document-update', (newContent) => {
  document.content = newContent; // last write wins = data loss
});
Correct: Using granular CRDT operations
// GOOD -- merges character-level operations, preserves all concurrent edits
const ydoc = new Y.Doc();
const ytext = ydoc.getText('content');
ydoc.on('update', (update, origin) => {
  // Skip updates that were just applied from the network, to avoid echoing them back
  if (origin !== 'remote') {
    socket.emit('yjs-update', update); // send binary diff, not full document
  }
});
socket.on('yjs-update', (update) => {
  Y.applyUpdate(ydoc, new Uint8Array(update), 'remote'); // merge, not overwrite
});
Wrong: Sending full document state on every edit
// BAD -- O(document_size) per keystroke, 100KB+ per keystroke
editor.on('change', () => {
  const fullDoc = editor.getContent();
  ws.send(JSON.stringify({ type: 'full-sync', data: fullDoc }));
});
Correct: Sending incremental updates only
// GOOD -- O(edit_size) per keystroke, typically 10-100 bytes
ydoc.on('update', (update, origin) => {
  // 'remote' is whatever origin you pass as the third argument of Y.applyUpdate
  if (origin !== 'remote') {
    ws.send(update); // binary diff: only the characters that changed
  }
});
Wrong: Using built-in editor undo with collaborative editing
// BAD -- built-in undo reverts OTHER users' changes, not just yours
const editor = new Editor({
  extensions: [StarterKit, Collaboration.configure({ document: ydoc })]
});
// User A types, User B types, User A hits Ctrl+Z -> undoes User B's edit!
Correct: Using Yjs undo manager scoped to local client
// GOOD -- undo only affects the local user's operations
import { UndoManager } from 'yjs';
const ytext = ydoc.getText('content');
const undoManager = new UndoManager(ytext, {
  // Track only transactions with a null origin (local edits);
  // apply remote updates with an explicit origin so they are not tracked
  trackedOrigins: new Set([null])
});
const editor = new Editor({
  extensions: [
    StarterKit.configure({ history: false }), // DISABLE built-in undo
    Collaboration.configure({ document: ydoc }),
  ],
});
Wrong: No reconnection handling
// BAD -- connection drops silently, user loses subsequent edits
const ws = new WebSocket('wss://server.example.com/doc/123');
ws.onclose = () => console.log('disconnected'); // logs and does nothing
Correct: Exponential backoff with offline queue
// GOOD -- queues edits locally and syncs on reconnection
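function connectWithRetry(url, ydoc, attempt = 0) {
  const ws = new WebSocket(url);
  ws.binaryType = 'arraybuffer'; // Yjs updates are binary
  ws.onopen = () => {
    attempt = 0; // reset backoff after a successful connection
    // The Y.Doc itself acts as the offline queue: it already holds every local edit
    const localState = Y.encodeStateAsUpdate(ydoc);
    ws.send(localState);
  };
  ws.onclose = () => {
    // 1s, 2s, 4s, ... capped at 30s
    const delay = Math.min(1000 * Math.pow(2, attempt), 30000);
    setTimeout(() => connectWithRetry(url, ydoc, attempt + 1), delay);
  };
}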
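When a server restarts, every client disconnects at the same moment, and a fixed backoff schedule makes them all retry at the same moments too. A common refinement is to randomize the delay. The sketch below shows the "full jitter" variant as a pure function; backoffDelay and its option names are hypothetical, and the random source is injectable for testing.

```javascript
// Exponential backoff with full jitter: pick a random delay in [0, ceiling),
// where the ceiling doubles per attempt up to a cap.
// backoffDelay and its option names are assumptions for illustration.
function backoffDelay(attempt, { baseMs = 1000, capMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(baseMs * 2 ** attempt, capMs);
  return Math.floor(random() * ceiling);
}

// With a fixed "random" value of 0.5 the schedule is easy to read:
const schedule = [0, 1, 2, 3, 10].map((attempt) =>
  backoffDelay(attempt, { random: () => 0.5 })
);
console.log(schedule); // [ 500, 1000, 2000, 4000, 15000 ]
```

In the reconnect function above, the delay line could call a helper like this instead of computing the capped power of two directly.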
Common Pitfalls
- CRDT memory bloat: Yjs stores tombstones for deleted characters. A document with 100K edits can consume 10-50x its visible text size in memory. Fix: keep ydoc.gc = true (the Yjs default) and periodically re-encode with Y.encodeStateAsUpdate(ydoc). [src2]
- Cursor jumps on high latency: When sync latency exceeds 200ms, cursor positions desync and jump. Fix: throttle awareness updates to 50-100ms intervals and use Y.createRelativePositionFromTypeIndex. [src4]
- Split-brain from missing server authority: In pure P2P mode, replicas can diverge during a network partition and produce surprising merges when they reconnect. Fix: designate one peer as authority or accept that the merged interleaving may not match either user's intent. [src2]
- Rich-text formatting conflicts: Overlapping bold/italic ranges from two users can produce incorrect formatting. Fix: use Peritext-style formatting marks attached to character positions. [src7]
- WebSocket message ordering: Load-balancing across multiple servers can deliver messages out of order. Fix: use sticky sessions (route by document ID) or include vector clocks. [src1]
- Snapshot load time: Replaying all operations from genesis is O(total_ops). Fix: store periodic snapshots and replay only operations after the last snapshot. [src1]
- Browser tab duplication: Each tab creates a separate Y.Doc and WebSocket connection, causing double presence. Fix: use BroadcastChannel API or SharedWorker. [src4]
- Undo across formatting boundaries: Undoing a bold operation modified by another user can corrupt formatting. Fix: use Yjs UndoManager with captureTransaction. [src7]
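The cursor-jump pitfall comes from storing a cursor as a numeric index, which shifts whenever someone else inserts text earlier in the document. Relative positions avoid this by anchoring the cursor to a character identity instead. The following is a toy illustration of that idea, not the Yjs implementation; toRelative, toAbsolute, and the id scheme are hypothetical.

```javascript
// Toy model: each character has a stable id; a cursor remembers the id to its left.
// toRelative / toAbsolute are illustrative helpers, not Yjs functions.
function toRelative(chars, index) {
  return { leftId: index > 0 ? chars[index - 1].id : null };
}

function toAbsolute(chars, rel) {
  if (rel.leftId === null) return 0;
  // If the anchor is missing, findIndex returns -1 and the cursor falls back to 0.
  return chars.findIndex((c) => c.id === rel.leftId) + 1;
}

const chars = [...'helo'].map((ch, i) => ({ id: `a${i}`, ch }));
const cursor = toRelative(chars, 3); // local cursor sits between "l" and "o"

// A remote user prepends "> " while our cursor is idle.
chars.splice(0, 0, { id: 'b0', ch: '>' }, { id: 'b1', ch: ' ' });

console.log(toAbsolute(chars, cursor)); // 5
```

A naive index of 3 would now sit between "h" and "e"; the anchored cursor still resolves to the gap between "l" and "o".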
Diagnostic Commands
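# Check WebSocket server health (expect "101 Switching Protocols")
# The key must be a base64-encoded 16-byte value; this one is the sample nonce from RFC 6455
curl -i -N -H "Connection: Upgrade" -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  http://localhost:1234/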
# Count established TCP connections on the sync server port (Linux)
ss -tn state established '( sport = :1234 )' | tail -n +2 | wc -l
# Check Yjs document size in bytes (Node.js)
node -e "const Y=require('yjs'); const doc=new Y.Doc(); console.log(Y.encodeStateAsUpdate(doc).byteLength)"
# Monitor WebSocket traffic in Chrome DevTools
# Network tab -> WS filter -> click connection -> Messages tab
Version History & Compatibility
| Library | Version | Status | Key Changes | Notes |
|---|---|---|---|---|
| Yjs | 13.6.x | Current (2024) | Improved GC, Sub-documents | Recommended for new projects |
| Yjs | 13.5.x | Stable | Y.XmlFragment improvements | |
| Automerge | 2.x | Current (2023) | Rust core, 10x faster than 1.x | Breaking API changes from 1.x |
| Automerge | 1.x | Legacy | Pure JS, slow on large docs | Migrate to 2.x |
| ShareDB | 4.x | Current | OT with JSON documents | Mature but less active |
| y-websocket | 2.x | Current | Binary protocol, awareness | Drop-in server for Yjs |
When to Use / When Not to Use
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Multiple users edit the same document simultaneously | Single-user editing with simple save/load | Standard REST API with optimistic locking |
| Offline editing with later sync is required | Always-online with reliable network | Server-authoritative OT (ShareDB) |
| Rich-text collaborative editing (Google Docs clone) | Chat or messaging (append-only) | WebSocket pub/sub (Socket.IO rooms) |
| Peer-to-peer collaboration without central server | You already have and require a central server | OT with server transformation |
| Canvas/whiteboard with object-level edits (Figma clone) | Simple form data with field-level edits | Property-level last-write-wins |
| Need to support 100+ concurrent editors per document | 2-3 users with low-frequency edits | Mutex lock or turn-based editing |
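Property-level last-write-wins, listed in the table above, is simple enough to sketch in a few lines. This version assumes each property carries a timestamp and a client id, with the client id as tie-breaker; a server-ordered system would use the server's arrival order instead. mergeLww and the field names are hypothetical.

```javascript
// Property-level last-write-wins merge.
// Each property stores { value, ts, clientId }; the higher (ts, clientId) pair wins.
// mergeLww and the field names are assumptions for illustration.
function mergeLww(a, b) {
  const out = { ...a };
  for (const [key, incoming] of Object.entries(b)) {
    const current = out[key];
    const incomingWins =
      !current ||
      incoming.ts > current.ts ||
      (incoming.ts === current.ts && incoming.clientId > current.clientId);
    if (incomingWins) out[key] = incoming;
  }
  return out;
}

// Two users edit different properties of the same rectangle concurrently.
const rectA = {
  x: { value: 10, ts: 1, clientId: 'alice' },
  fill: { value: 'red', ts: 3, clientId: 'alice' },
};
const rectB = {
  x: { value: 40, ts: 2, clientId: 'bob' },
  fill: { value: 'blue', ts: 1, clientId: 'bob' },
};

const merged = mergeLww(rectA, rectB);
console.log(merged.x.value, merged.fill.value); // 40 red
```

The merge gives the same result in either argument order, but unlike a text CRDT it discards the losing value entirely, which is acceptable for properties such as position or color and unacceptable for prose.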
Important Caveats
- CRDT performance has improved dramatically (Yjs processes 6M ops/sec in benchmarks) but memory overhead remains: expect 2-5x document size for metadata in Yjs, more for Automerge 1.x
- Google Docs uses OT, not CRDTs -- but Google has a massive engineering team maintaining their custom OT implementation. For most teams, CRDTs (Yjs) are more practical to implement correctly
- Figma does NOT use "true CRDTs" -- they use CRDT-inspired property-level last-write-wins with a central server, which is simpler but does not support offline editing or P2P
- The "OT vs CRDT" debate is largely settled for new projects: use CRDTs unless you have specific reasons not to. Joseph Gentle (ex-Google Wave): "Everything OT can do, CRDTs can do. The reverse is not true."
- Rich-text CRDT is an unsolved problem at the academic level -- Peritext (2022) is the most promising approach but not yet a drop-in solution in mainstream libraries
- End-to-end encryption is possible with CRDTs (server never sees plaintext) but not with OT (server must understand operations to transform them)
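The end-to-end encryption point can be illustrated with Node's built-in crypto module: clients encrypt each binary update before sending it, and the server relays opaque bytes. This is a minimal sketch assuming AES-256-GCM and a document key that clients have already shared out of band; encryptUpdate, decryptUpdate, and the message layout are hypothetical, and key distribution is out of scope.

```javascript
const crypto = require('node:crypto');

// Encrypt one binary update. Layout (an assumption): iv (12 bytes) | tag (16 bytes) | ciphertext.
function encryptUpdate(key, update) {
  const iv = crypto.randomBytes(12); // fresh nonce for every message
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const body = Buffer.concat([cipher.update(update), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]);
}

// Decrypt and authenticate; throws if the message was tampered with.
function decryptUpdate(key, message) {
  const iv = message.subarray(0, 12);
  const tag = message.subarray(12, 28);
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(message.subarray(28)), decipher.final()]);
}

const docKey = crypto.randomBytes(32);          // shared between clients out of band (assumed)
const update = Buffer.from([1, 2, 3, 4]);       // stand-in for a binary CRDT update
const relayed = encryptUpdate(docKey, update);  // all the server ever stores or forwards

console.log(decryptUpdate(docKey, relayed).equals(update)); // true
```

This works for CRDTs because merging happens on the clients after decryption. The trade-off is that the server can no longer compact or inspect document state.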