Real-Time Collaboration System Design (Google Docs Clone)

Type: Software Reference | Confidence: 0.90 | Sources: 7 | Verified: 2026-02-23 | Freshness: 2026-02-23

TL;DR

Constraints

Quick Reference

| Component | Role | Technology Options | Scaling Strategy |
| --- | --- | --- | --- |
| Conflict Resolution Engine | Merges concurrent edits without data loss | Yjs (CRDT), Automerge (CRDT), ShareDB (OT), Google OT | Stateless -- runs on each client + server |
| WebSocket Gateway | Persistent bidirectional connection for real-time sync | y-websocket, Socket.IO, ws (Node), Gorilla (Go) | Horizontal scale with sticky sessions per document |
| Document Router | Routes clients to correct server for their document | Consistent hashing on document ID | Hash ring with virtual nodes |
| Presence Service | Tracks who is online, cursor positions, selections | Liveblocks Presence, Phoenix Presence, custom broadcast | Ephemeral state -- no persistence needed |
| Awareness Protocol | Broadcasts cursor/selection state to peers | Yjs Awareness, custom WebSocket messages | Piggyback on sync connection; throttle to 10-15 fps |
| Operation Log / WAL | Ordered history of all edits for replay and recovery | Append-only log (Kafka, Redis Streams, PostgreSQL) | Partition by document ID |
| Snapshot Store | Periodic full-document snapshots for fast load | S3, PostgreSQL JSONB, Redis | Snapshot every N ops or T seconds |
| Auth + Access Control | Per-document permissions (view/edit/comment) | JWT tokens validated at WebSocket handshake | Stateless token validation at gateway |
| Offline Queue | Buffers edits when disconnected, replays on reconnect | IndexedDB (browser), SQLite (mobile) | Local-first -- syncs on reconnection |
| Rich-Text Formatting Layer | Handles bold, italic, links without merge conflicts | Peritext (CRDT), ProseMirror + Yjs, TipTap + Yjs | Integrated with conflict resolution engine |
| Garbage Collector | Compacts tombstones and reclaims memory | Yjs GC, Automerge compaction, custom sweep | Run during low-activity periods |
| CDN / Edge Cache | Serves static assets; caches read-only snapshots | Cloudflare, CloudFront, Fastly | Cache invalidation on document update |
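
The Document Router row pairs consistent hashing on the document ID with a hash ring of virtual nodes. The sketch below shows one way that could look. The class name `HashRing`, the FNV-1a hash, the server IDs, and the default of 100 virtual nodes are all illustrative assumptions, not something the sources prescribe.

```javascript
// Hypothetical sketch of a consistent-hash ring with virtual nodes.
// All names and defaults here are illustrative assumptions.

// 32-bit FNV-1a string hash; any stable, well-distributed hash would do.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

class HashRing {
  constructor(servers, virtualNodes = 100) {
    this.virtualNodes = virtualNodes;
    this.ring = []; // sorted array of { point, server }
    for (const server of servers) this.add(server);
  }

  // Place `virtualNodes` points on the ring for one server.
  add(server) {
    for (let v = 0; v < this.virtualNodes; v++) {
      this.ring.push({ point: fnv1a(`${server}#${v}`), server });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  // Drop every point that belongs to the server.
  remove(server) {
    this.ring = this.ring.filter((node) => node.server !== server);
  }

  // Owner = first point at or after hash(docId), wrapping around the ring.
  lookup(docId) {
    if (this.ring.length === 0) return null;
    const h = fnv1a(docId);
    let lo = 0;
    let hi = this.ring.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.ring[mid].point < h) lo = mid + 1;
      else hi = mid;
    }
    return this.ring[lo % this.ring.length].server;
  }
}

const ring = new HashRing(['ws-1', 'ws-2', 'ws-3']);
console.log(ring.lookup('doc-123')); // same server on every call
```

Removing a server only reassigns the documents that server owned. Documents owned by the remaining servers stay where they are, which keeps reconnect storms small during scale-in.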

Decision Tree

START
|-- Need offline editing + peer-to-peer sync?
|   |-- YES --> Use CRDT (Yjs or Automerge)
|   +-- NO
|       |-- Need rich-text collaborative editing?
|       |   |-- YES
|       |   |   |-- Want managed solution?
|       |   |   |   |-- YES --> Liveblocks or TipTap Cloud
|       |   |   |   +-- NO --> Yjs + TipTap/ProseMirror + y-websocket
|       |   +-- NO (plain text or structured data)
|       |       |-- Central server acceptable?
|       |       |   |-- YES
|       |       |   |   |-- < 1K concurrent users per doc?
|       |       |   |   |   |-- YES --> OT (ShareDB) or Yjs -- both work
|       |       |   |   |   +-- NO --> Yjs (better memory profile at scale)
|       |       |   +-- NO --> Yjs with y-webrtc provider
|       +-- Canvas/graphics editing?
|           |-- YES --> Property-level LWW (Figma model) [src1]
|           +-- NO --> Yjs with appropriate shared type (Y.Map, Y.Array)

Step-by-Step Guide

1. Choose conflict resolution strategy

Decide between OT (Operational Transformation) and CRDT (Conflict-free Replicated Data Type). CRDTs converge without a central server; OT requires server coordination but has a simpler mental model for centralized architectures. [src3]

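Verify: Match your requirements against the decision tree above. If offline or P2P editing is required, choose a CRDT: OT depends on a central server to order and transform operations, so it does not fit peers that sync without one.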

2. Set up the CRDT document model

Initialize Yjs shared types that mirror your document structure. Yjs provides Y.Text, Y.Map, Y.Array, and Y.XmlFragment. [src4]

import * as Y from 'yjs';
import { WebsocketProvider } from 'y-websocket';
import { IndexeddbPersistence } from 'y-indexeddb';

const ydoc = new Y.Doc();
const ytext = ydoc.getText('document-body');
const ymeta = ydoc.getMap('document-meta');
ymeta.set('title', 'Untitled Document');
ymeta.set('lastModified', Date.now());

Verify: console.log(ydoc.getText('document-body').toString()) --> expected: ""

3. Connect the WebSocket sync provider

Wire up the network transport layer. The y-websocket provider handles connection management, reconnection, and binary sync protocol automatically. [src4]

const wsProvider = new WebsocketProvider(
  'wss://your-server.example.com',
  'document-room-id',
  ydoc
);
wsProvider.on('status', ({ status }) => {
  console.log(`Sync status: ${status}`);
});
const indexeddbProvider = new IndexeddbPersistence('document-room-id', ydoc);
indexeddbProvider.on('synced', () => console.log('Loaded from local cache'));

Verify: Open two browser tabs with the same room ID. Type in one, confirm text appears in the other within 50-100ms.

4. Implement awareness (cursors and presence)

Awareness broadcasts ephemeral state (cursor position, user name, color) to all connected peers. Unlike document state, awareness data is not persisted. [src4]

wsProvider.awareness.setLocalStateField('user', {
  name: 'Alice', color: '#30bced', cursor: null
});
wsProvider.awareness.on('change', ({ added, updated, removed }) => {
  const states = wsProvider.awareness.getStates();
  states.forEach((state, clientId) => {
    if (clientId !== ydoc.clientID) {
      console.log(`User ${state.user?.name} at ${state.user?.cursor}`);
    }
  });
});

Verify: Open two tabs, confirm each shows the other user's cursor position and name.
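
The Quick Reference suggests throttling awareness traffic to 10-15 fps. The sketch below shows one way to rate-limit cursor updates before they reach the awareness API. The helper name `createCursorThrottle`, the 12 fps default, and the injectable `now` clock are assumptions made for illustration.

```javascript
// Hypothetical throttle for cursor broadcasts; not part of Yjs.
// `send` is whatever publishes the cursor, `now` is injectable for testing.
function createCursorThrottle(send, fps = 12, now = Date.now) {
  const minGapMs = 1000 / fps;
  let lastSentAt = -Infinity;
  let pending = null; // most recent cursor that was held back

  return {
    // Call on every cursor move; sends at most `fps` times per second.
    update(cursor) {
      const t = now();
      if (t - lastSentAt >= minGapMs) {
        lastSentAt = t;
        pending = null;
        send(cursor);
      } else {
        pending = cursor; // keep only the latest position
      }
    },
    // Call from a timer so the final resting position is not lost.
    flush() {
      if (pending !== null) {
        const cursor = pending;
        pending = null;
        lastSentAt = now();
        send(cursor);
      }
    },
  };
}

// Usage with the awareness API from this step (wsProvider assumed in scope):
// const throttle = createCursorThrottle((cursor) =>
//   wsProvider.awareness.setLocalStateField('user', {
//     name: 'Alice', color: '#30bced', cursor,
//   }));
// onCursorMove((pos) => throttle.update(pos)); // onCursorMove is hypothetical
// setInterval(throttle.flush, 100);
```

Positions held back between sends are overwritten, so peers always receive the most recent cursor rather than a backlog.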

5. Bind to a rich-text editor

Connect Yjs to TipTap (built on ProseMirror) with collaboration extensions. Disable the built-in history plugin -- Yjs handles undo/redo. [src4]

import { Editor } from '@tiptap/core';
import StarterKit from '@tiptap/starter-kit';
import Collaboration from '@tiptap/extension-collaboration';
import CollaborationCursor from '@tiptap/extension-collaboration-cursor';

const editor = new Editor({
  extensions: [
    StarterKit.configure({ history: false }),
    Collaboration.configure({ document: ydoc }),
    CollaborationCursor.configure({
      provider: wsProvider,
      user: { name: 'Alice', color: '#30bced' },
    }),
  ],
});

Verify: Type formatted text (bold, italic) in one tab. Confirm formatting appears correctly in the other tab.

6. Deploy the WebSocket server

Run the y-websocket server in production with persistence enabled. [src4]

npm install y-websocket
HOST=0.0.0.0 PORT=1234 YPERSISTENCE=./yjs-docs npx y-websocket

Verify: curl -s -o /dev/null -w "%{http_code}" http://localhost:1234/ --> expected: 200

7. Add server-side persistence and snapshots

Store periodic full-document snapshots for fast loading and disaster recovery. [src1]

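const Y = require('yjs');
const { LeveldbPersistence } = require('y-leveldb');
const persistence = new LeveldbPersistence('./yjs-storage');

// saveSnapshotToS3(docName, buffer) is an application-provided uploader
// (not part of Yjs or y-leveldb); swap in any durable object store.
setInterval(async () => {
  try {
    const docNames = await persistence.getAllDocNames();
    for (const docName of docNames) {
      const ydoc = await persistence.getYDoc(docName);
      // Full document state encoded as a single binary update
      const snapshot = Y.encodeStateAsUpdate(ydoc);
      await saveSnapshotToS3(docName, Buffer.from(snapshot));
      ydoc.destroy(); // release memory held by the loaded doc
    }
  } catch (err) {
    console.error('Snapshot run failed:', err);
  }
}, 5 * 60 * 1000); // every 5 minutes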

Verify: Stop and restart the server. Reopen the document and confirm all content is preserved.
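
The Quick Reference describes the snapshot trigger as "every N ops or T seconds", while the loop above uses time only. A combined trigger could be sketched as follows. The helper name `createSnapshotPolicy` and the defaults of 500 ops and 5 minutes are assumptions, not values taken from the sources.

```javascript
// Hypothetical policy object: snapshot when either threshold is reached.
function createSnapshotPolicy({
  maxOps = 500,              // N: assumed default
  maxAgeMs = 5 * 60 * 1000,  // T: assumed default
  startedAt = Date.now(),
} = {}) {
  let opsSinceSnapshot = 0;
  let lastSnapshotAt = startedAt;

  return {
    // Call once per applied update.
    recordOp() {
      opsSinceSnapshot += 1;
    },
    // True when there is something new and a threshold has been crossed.
    shouldSnapshot(now = Date.now()) {
      if (opsSinceSnapshot === 0) return false; // nothing changed
      return (
        opsSinceSnapshot >= maxOps || now - lastSnapshotAt >= maxAgeMs
      );
    },
    // Call after the snapshot has been stored successfully.
    markSnapshot(now = Date.now()) {
      opsSinceSnapshot = 0;
      lastSnapshotAt = now;
    },
  };
}

// Usage idea (ydoc assumed in scope):
// const policy = createSnapshotPolicy();
// ydoc.on('update', () => policy.recordOp());
// then check policy.shouldSnapshot() on a short timer.
```

Skipping documents with zero new operations avoids rewriting identical snapshots for idle documents.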

Code Examples

JavaScript: Minimal CRDT Collaboration with Yjs

// Input:  Two users editing the same Y.Text concurrently
// Output: Both edits merged without conflict, converged state

import * as Y from 'yjs';

const doc1 = new Y.Doc();
const doc2 = new Y.Doc();

doc1.getText('shared').insert(0, 'Hello ');
doc2.getText('shared').insert(0, 'World');

const sv1 = Y.encodeStateVector(doc1);
const sv2 = Y.encodeStateVector(doc2);
const update1to2 = Y.encodeStateAsUpdate(doc1, sv2);
const update2to1 = Y.encodeStateAsUpdate(doc2, sv1);
Y.applyUpdate(doc1, update2to1);
Y.applyUpdate(doc2, update1to2);

console.log(doc1.getText('shared').toString() === doc2.getText('shared').toString()); // true
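
The exchange above relies on state vectors: each side reports what it has already seen, and the other side sends only what is missing. The toy model below illustrates that idea without any dependencies. It is a deliberately simplified assumption about how such a protocol can work, not a description of Yjs internals, and `ToyReplica` is a made-up name.

```javascript
// Toy model of state-vector sync. Each op is { client, clock, payload }.
// This is an illustration only; Yjs's real encoding differs.
class ToyReplica {
  constructor(client) {
    this.client = client;
    this.ops = [];
  }

  // Map of client -> next expected clock for that client.
  stateVector() {
    const sv = {};
    for (const op of this.ops) {
      sv[op.client] = Math.max(sv[op.client] ?? 0, op.clock + 1);
    }
    return sv;
  }

  // Record a local edit with this replica's next clock value.
  localOp(payload) {
    const clock = this.stateVector()[this.client] ?? 0;
    this.ops.push({ client: this.client, clock, payload });
  }

  // Ops the remote side has not seen yet, given its state vector.
  diffFor(remoteSv) {
    return this.ops.filter((op) => op.clock >= (remoteSv[op.client] ?? 0));
  }

  // Accept ops in clock order; duplicates are ignored (idempotent).
  // A real implementation would buffer out-of-order ops instead of skipping.
  apply(ops) {
    const sv = this.stateVector();
    for (const op of ops) {
      const next = sv[op.client] ?? 0;
      if (op.clock === next) {
        this.ops.push(op);
        sv[op.client] = next + 1;
      }
    }
  }
}

const a = new ToyReplica('a');
const b = new ToyReplica('b');
a.localOp('Hello ');
b.localOp('World');

const aToB = a.diffFor(b.stateVector());
const bToA = b.diffFor(a.stateVector());
a.apply(bToA);
b.apply(aToB);
console.log(a.ops.length === b.ops.length); // true
```

The model covers only the "who is missing what" bookkeeping. Deciding where concurrent inserts land in the text is the part a CRDT such as Yjs adds on top.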

JavaScript: WebSocket Sync Server (Node.js)

// Input:  Multiple WebSocket clients connecting per document room
// Output: Real-time sync of Y.Doc state across all connected clients

const http = require('http');
const WebSocket = require('ws');
const { setupWSConnection } = require('y-websocket/bin/utils');

const server = http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('y-websocket server');
});
const wss = new WebSocket.Server({ server });
wss.on('connection', (ws, req) => {
  const roomName = req.url.slice(1).split('?')[0];
  setupWSConnection(ws, req, { docName: roomName });
});
server.listen(1234);

Python: OT-style Server with Operation Logging

# Input:  Concurrent text operations from multiple clients
# Output: Transformed operations that maintain document consistency

from dataclasses import dataclass
from typing import Literal

@dataclass
class TextOp:
    op_type: Literal["insert", "delete"]
    position: int
    text: str = ""
    length: int = 0
    client_id: str = ""
    revision: int = 0

def transform(op1: TextOp, op2: TextOp) -> TextOp:
    """Transform op1 against op2 (server-side OT)."""
    if op1.op_type == "insert" and op2.op_type == "insert":
        if op1.position > op2.position or (
            op1.position == op2.position and op1.client_id > op2.client_id
        ):
            return TextOp("insert", op1.position + len(op2.text),
                          op1.text, client_id=op1.client_id)
        return op1
    # ... additional cases for delete/insert, delete/delete combinations
    return op1

op_alice = TextOp("insert", 5, "Hello", client_id="alice", revision=1)
op_bob = TextOp("insert", 3, "World", client_id="bob", revision=1)
op_alice_prime = transform(op_alice, op_bob)
print(f"Alice's op transformed: position {op_alice_prime.position}")  # 10

Anti-Patterns

Wrong: Using simple last-write-wins for text content

// BAD -- overwrites entire document on each keystroke, losing concurrent edits
socket.on('document-update', (newContent) => {
  document.content = newContent; // last write wins = data loss
});

Correct: Using granular CRDT operations

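// GOOD -- merges character-level operations, preserves all concurrent edits
const ydoc = new Y.Doc();
const ytext = ydoc.getText('content');
ydoc.on('update', (update, origin) => {
  // Skip updates that arrived from the network to avoid echoing them back
  if (origin !== 'remote') {
    socket.emit('yjs-update', update); // send binary diff, not full document
  }
});
socket.on('yjs-update', (update) => {
  // Third argument tags the transaction origin so the handler above can filter it
  Y.applyUpdate(ydoc, new Uint8Array(update), 'remote'); // merge, not overwrite
});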

Wrong: Sending full document state on every edit

// BAD -- O(document_size) per keystroke, 100KB+ per keystroke
editor.on('change', () => {
  const fullDoc = editor.getContent();
  ws.send(JSON.stringify({ type: 'full-sync', data: fullDoc }));
});

Correct: Sending incremental updates only

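// GOOD -- O(edit_size) per keystroke, typically 10-100 bytes
// Assumes remote updates are applied with Y.applyUpdate(ydoc, update, 'remote'),
// so that origin === 'remote' identifies changes that came from the network.
ydoc.on('update', (update, origin) => {
  if (origin !== 'remote') {
    ws.send(update); // binary diff: only the characters that changed
  }
});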

Wrong: Using built-in editor undo with collaborative editing

// BAD -- built-in undo reverts OTHER users' changes, not just yours
const editor = new Editor({
  extensions: [StarterKit, Collaboration.configure({ document: ydoc })]
});
// User A types, User B types, User A hits Ctrl+Z -> undoes User B's edit!

Correct: Using Yjs undo manager scoped to local client

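// GOOD -- undo only affects the local user's operations
import { UndoManager } from 'yjs';
// Standalone manager for a shared type edited outside the editor binding.
// Providers such as y-websocket apply remote updates with the provider as
// the transaction origin, so tracking only `null` skips remote changes.
const undoManager = new UndoManager(ytext, {
  trackedOrigins: new Set([null]) // only track local changes
});
// Inside TipTap, the Collaboration extension supplies Yjs-backed undo/redo,
// so the editor only needs the built-in history turned off.
const editor = new Editor({
  extensions: [
    StarterKit.configure({ history: false }), // DISABLE built-in undo
    Collaboration.configure({ document: ydoc }),
  ],
});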

Wrong: No reconnection handling

// BAD -- connection drops silently, user loses subsequent edits
const ws = new WebSocket('wss://server.example.com/doc/123');
ws.onclose = () => console.log('disconnected'); // logs and does nothing

Correct: Exponential backoff with offline queue

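// GOOD -- queues edits locally and syncs on reconnection
// The Y.Doc keeps accepting edits while offline, so it acts as the queue.
// Assumes a custom server that accepts raw Yjs updates as binary frames
// (y-websocket clients get reconnection from WebsocketProvider instead).
function connectWithRetry(url, ydoc, attempt = 0) {
  const ws = new WebSocket(url);
  ws.binaryType = 'arraybuffer';
  ws.onopen = () => {
    attempt = 0; // reset backoff after a successful connection
    const localState = Y.encodeStateAsUpdate(ydoc);
    ws.send(localState); // includes edits made while disconnected
  };
  ws.onmessage = (event) => {
    Y.applyUpdate(ydoc, new Uint8Array(event.data), 'remote');
  };
  ws.onclose = () => {
    // 1s, 2s, 4s, ... capped at 30s
    const delay = Math.min(1000 * Math.pow(2, attempt), 30000);
    setTimeout(() => connectWithRetry(url, ydoc, attempt + 1), delay);
  };
}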
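
With a fixed exponential schedule, every client that was dropped at the same moment retries at the same moment. Adding random jitter is a common way to spread the retries out. The sketch below uses an "equal jitter" variant. The function name `backoffDelay` and its option names are assumptions, while the 1 s base and 30 s cap match the values used above.

```javascript
// Hypothetical helper: exponential backoff with equal jitter.
// Returns a delay between half of the exponential ceiling and the ceiling.
function backoffDelay(
  attempt,
  { baseMs = 1000, capMs = 30000, random = Math.random } = {}
) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  const half = ceiling / 2;
  return Math.round(half + random() * half);
}

// Example schedule for the first few attempts (values vary per run).
for (let attempt = 0; attempt < 5; attempt++) {
  console.log(`attempt ${attempt}: retry in ${backoffDelay(attempt)} ms`);
}
```

To use it, replace the `delay` computation in the reconnect function with `backoffDelay(attempt)`.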

Common Pitfalls

Diagnostic Commands

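# Check WebSocket server health (expect "HTTP/1.1 101 Switching Protocols")
# Sec-WebSocket-Key must be a base64-encoded 16-byte value; this one is the RFC 6455 sample nonce
curl -i -N -H "Connection: Upgrade" -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  http://localhost:1234/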

# Count established TCP connections on the WebSocket port (Linux)
ss -tn state established '( sport = :1234 )' | tail -n +2 | wc -l

# Check Yjs document size in bytes (Node.js)
node -e "const Y=require('yjs'); const doc=new Y.Doc(); console.log(Y.encodeStateAsUpdate(doc).byteLength)"

# Monitor WebSocket traffic in Chrome DevTools
# Network tab -> WS filter -> click connection -> Messages tab

Version History & Compatibility

| Library | Version | Status | Key Changes | Notes |
| --- | --- | --- | --- | --- |
| Yjs | 13.6.x | Current (2024) | Improved GC, Sub-documents | Recommended for new projects |
| Yjs | 13.5.x | Stable | Y.XmlFragment improvements | |
| Automerge | 2.x | Current (2023) | Rust core, 10x faster than 1.x | Breaking API changes from 1.x |
| Automerge | 1.x | Legacy | Pure JS, slow on large docs | Migrate to 2.x |
| ShareDB | 4.x | Current | OT with JSON documents | Mature but less active |
| y-websocket | 2.x | Current | Binary protocol, awareness | Drop-in server for Yjs |

When to Use / When Not to Use

| Use When | Don't Use When | Use Instead |
| --- | --- | --- |
| Multiple users edit the same document simultaneously | Single-user editing with simple save/load | Standard REST API with optimistic locking |
| Offline editing with later sync is required | Always-online with reliable network | Server-authoritative OT (ShareDB) |
| Rich-text collaborative editing (Google Docs clone) | Chat or messaging (append-only) | WebSocket pub/sub (Socket.IO rooms) |
| Peer-to-peer collaboration without central server | You already have and require a central server | OT with server transformation |
| Canvas/whiteboard with object-level edits (Figma clone) | Simple form data with field-level edits | Property-level last-write-wins |
| Need to support 100+ concurrent editors per document | 2-3 users with low-frequency edits | Mutex lock or turn-based editing |
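
Property-level last-write-wins, listed above for simple field-level data and in the decision tree for canvas editing, can be sketched in a few lines. The version below orders writes by a `(ts, clientId)` pair, which is an assumption made to keep the example self-contained. A server-authoritative system could use the server's arrival order instead. The function names are illustrative.

```javascript
// Hypothetical property-level LWW map.
// Each property stores { value, ts, clientId }.

// True when write `a` should replace write `b`.
function wins(a, b) {
  return a.ts > b.ts || (a.ts === b.ts && a.clientId > b.clientId);
}

// Apply one local write to a state object.
function lwwSet(state, key, value, ts, clientId) {
  const incoming = { value, ts, clientId };
  if (!state[key] || wins(incoming, state[key])) state[key] = incoming;
  return state;
}

// Merge two replicas property by property; order of arguments is irrelevant.
function lwwMerge(a, b) {
  const out = { ...a };
  for (const [key, entry] of Object.entries(b)) {
    if (!out[key] || wins(entry, out[key])) out[key] = entry;
  }
  return out;
}

// Two users edit different properties of the same shape concurrently.
const alice = lwwSet({}, 'fill', 'red', 1, 'alice');
const bob = lwwSet({}, 'x', 120, 1, 'bob');
const merged = lwwMerge(alice, bob);
console.log(merged.fill.value, merged.x.value); // red 120
```

Edits to different properties never conflict. Only two writes to the same property compete, and the later one replaces the earlier one entirely, which is acceptable for values like position or color but not for text.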

Important Caveats
