Use `aws s3api put-object`, or presigned URLs for direct client-to-storage uploads that bypass your servers.

| Component | Role | Technology Options | Scaling Strategy |
|---|---|---|---|
| API Gateway | Route requests, auth, rate limiting | Nginx, Kong, AWS API Gateway | Horizontal + auto-scaling |
| Metadata Service | File/folder CRUD, permissions, versions | Custom service (Go/Java) | Horizontal with DB sharding |
| Metadata Database | Store file metadata, hierarchy, ACLs | PostgreSQL, MySQL, DynamoDB, CockroachDB | Shard by user_id or org_id |
| Blob Storage | Persist raw file chunks | AWS S3, GCS, MinIO, Ceph, SeaweedFS | Virtually unlimited (managed) or add nodes (self-hosted) |
| Chunk Service | Split files into chunks, compute hashes | Custom service (Go/Rust) | Horizontal, stateless workers |
| Deduplication Index | Map content hashes to stored chunks | Redis, DynamoDB, Cassandra | Partition by hash prefix |
| Upload/Download Service | Presigned URL generation, multipart orchestration | Custom + cloud SDK | Horizontal, stateless |
| Sync Service | Detect changes, push notifications to devices | WebSocket server + message queue | Horizontal + connection affinity |
| Message Queue | Async processing (thumbnails, virus scan, indexing) | Kafka, SQS, RabbitMQ | Partition by file_id or user_id |
| Notification Service | Push file change events to clients | WebSocket, SSE, long polling | Horizontal with pub/sub backend |
| CDN | Cache frequently accessed files at edge | CloudFront, Cloudflare, Fastly | Global PoPs, cache by object key |
| Thumbnail/Preview Service | Generate previews for images, docs, video | Lambda/Cloud Functions, dedicated workers | Horizontal, event-driven |
| Virus Scanner | Scan uploads before making files available | ClamAV, cloud-based (S3 Malware Protection) | Queue-based, horizontal workers |
| Encryption Service | Key management, envelope encryption | AWS KMS, HashiCorp Vault | Managed service or HA cluster |
START
|-- Need real-time sync across devices?
| |-- YES --> Full Dropbox-like architecture (sync service + notification service + chunking)
| +-- NO --> Simpler S3-like store-and-retrieve architecture
|
|-- Storage semantics?
| |-- Object (flat, immutable) --> Object storage (S3-compatible API, content-addressable)
| |-- File (hierarchical, mutable) --> File storage with metadata DB + blob backend
| +-- Block (raw volumes) --> Block storage (EBS-like, out of scope for this unit)
|
|-- Expected file sizes?
| |-- Mostly <10MB --> No chunking needed, direct upload to blob store
| |-- Mixed (10MB-5GB) --> Fixed-size chunking (4-8MB chunks), multipart upload
| +-- Large files (>5GB) --> Content-defined chunking for dedup + parallel upload
|
|-- Durability requirement?
| |-- 99.999999999% (11 nines, S3-class) --> Erasure coding (Reed-Solomon 10+4) across AZs
| |-- 99.9999% (6 nines) --> 3x replication across data centers
| +-- 99.99% (4 nines) --> 2x replication, single region
|
|-- Deduplication priority?
| |-- High (storage cost critical) --> Content-defined chunking + per-chunk dedup index
| |-- Moderate --> Fixed-size chunking + file-level dedup (hash full file)
| +-- Low (simplicity first) --> No dedup, store every upload as-is
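The file-size branch of the tree above can be encoded as a small helper that picks an upload strategy. A minimal sketch using the thresholds from the tree (the function name and return values are illustrative):

```python
CHUNK_THRESHOLD = 10 * 1024 * 1024    # 10MB: below this, upload in one shot
LARGE_FILE_THRESHOLD = 5 * 1024**3    # 5GB: above this, prefer CDC

def choose_upload_strategy(size_bytes: int) -> str:
    """Map a file size onto the chunking strategy from the decision tree."""
    if size_bytes < CHUNK_THRESHOLD:
        return "direct"           # single PUT to the blob store
    if size_bytes <= LARGE_FILE_THRESHOLD:
        return "fixed-chunks"     # 4-8MB fixed chunks, multipart upload
    return "cdc"                  # content-defined chunking + parallel upload
```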
Design your metadata schema to support files, folders, versions, and sharing. Keep metadata separate from blob storage. [src3]
-- Core metadata tables
CREATE TABLE users (
user_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL,
storage_quota_bytes BIGINT DEFAULT 10737418240, -- 10GB default
storage_used_bytes BIGINT DEFAULT 0,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE files (
file_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(user_id),
parent_id UUID REFERENCES files(file_id), -- NULL = root
name VARCHAR(1024) NOT NULL,
is_folder BOOLEAN DEFAULT FALSE,
mime_type VARCHAR(255),
size_bytes BIGINT,
checksum VARCHAR(64), -- SHA-256 of full file
version INTEGER DEFAULT 1,
status VARCHAR(20) DEFAULT 'active',
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now(),
UNIQUE (user_id, parent_id, name)
);
CREATE TABLE file_chunks (
chunk_id VARCHAR(64) PRIMARY KEY, -- SHA-256 of chunk content
size_bytes INTEGER NOT NULL,
ref_count INTEGER DEFAULT 1,
storage_key VARCHAR(512) NOT NULL,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE file_chunk_map (
file_id UUID REFERENCES files(file_id),
chunk_id VARCHAR(64) REFERENCES file_chunks(chunk_id),
chunk_index INTEGER NOT NULL,
PRIMARY KEY (file_id, chunk_index)
);
Verify: SELECT count(*) FROM information_schema.tables WHERE table_name IN ('users','files','file_chunks','file_chunk_map'); → expected: 4
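The `ref_count` column on `file_chunks` drives garbage collection: a deduplicated chunk may be shared by many files and can only be deleted from blob storage when no manifest references it. An in-memory sketch of that bookkeeping (class and method names are illustrative; the real counters live in the metadata DB):

```python
class ChunkIndex:
    """In-memory model of the file_chunks ref_count bookkeeping."""
    def __init__(self):
        self.ref_count = {}   # chunk_hash -> number of referencing files

    def add_file(self, chunk_hashes):
        """Register a new file's chunks, incrementing shared refs."""
        for h in chunk_hashes:
            self.ref_count[h] = self.ref_count.get(h, 0) + 1

    def remove_file(self, chunk_hashes):
        """Decrement refs; return chunks now safe to delete from storage."""
        orphaned = []
        for h in chunk_hashes:
            self.ref_count[h] -= 1
            if self.ref_count[h] == 0:
                del self.ref_count[h]
                orphaned.append(h)
        return orphaned

idx = ChunkIndex()
idx.add_file(["a", "b"])
idx.add_file(["b", "c"])                        # chunk "b" shared by two files
assert idx.remove_file(["a", "b"]) == ["a"]     # "b" still referenced
assert idx.remove_file(["b", "c"]) == ["b", "c"]
```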
Split files into chunks on the client side, compute SHA-256 hashes, and upload only new chunks. This enables deduplication and resumable uploads. [src1]
# chunk_upload.py -- Client-side chunked upload with dedup
import hashlib
import requests
CHUNK_SIZE = 4 * 1024 * 1024 # 4MB chunks
API_BASE = "https://api.filestorage.example.com"
def upload_file(filepath: str, token: str) -> dict:
"""Upload a file in chunks with deduplication."""
chunks = []
headers = {"Authorization": f"Bearer {token}"}
with open(filepath, "rb") as f:
index = 0
while True:
data = f.read(CHUNK_SIZE)
if not data:
break
chunk_hash = hashlib.sha256(data).hexdigest()
chunks.append({"index": index, "hash": chunk_hash, "size": len(data)})
# Check if chunk already exists (dedup)
check = requests.head(
f"{API_BASE}/chunks/{chunk_hash}",
headers=headers
)
if check.status_code == 404:
# Get presigned upload URL
resp = requests.post(
f"{API_BASE}/chunks/upload-url",
json={"chunk_hash": chunk_hash, "size": len(data)},
headers=headers
)
presigned_url = resp.json()["upload_url"]
                # Upload directly to blob store; fail fast if the PUT is rejected
                requests.put(presigned_url, data=data).raise_for_status()
index += 1
# Register the file with its chunk manifest
file_hash = hashlib.sha256(
"".join(c["hash"] for c in chunks).encode()
).hexdigest()
resp = requests.post(
f"{API_BASE}/files",
json={
"name": filepath.split("/")[-1],
"checksum": file_hash,
"chunks": chunks,
},
headers=headers,
)
return resp.json()
Verify: Upload a 20MB file → should produce 5 chunks of 4MB each, server returns 201 with file_id
Use presigned URLs so clients upload/download directly to/from blob storage, bypassing your servers. This is critical for scalability. [src4] [src5]
// presigned-urls.js -- Generate S3 presigned URLs (AWS SDK v3)
import { S3Client, PutObjectCommand, GetObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";
const s3 = new S3Client({ region: "us-east-1" });
const BUCKET = "user-files-prod";
export async function getUploadUrl(chunkHash, sizeBytes) {
const command = new PutObjectCommand({
Bucket: BUCKET,
Key: `chunks/${chunkHash}`,
ContentLength: sizeBytes,
ServerSideEncryption: "aws:kms",
});
return getSignedUrl(s3, command, { expiresIn: 600 }); // 10 min
}
export async function getDownloadUrl(fileKey, filename) {
const command = new GetObjectCommand({
Bucket: BUCKET,
Key: fileKey,
    ResponseContentDisposition: `attachment; filename="${filename}"`,
});
return getSignedUrl(s3, command, { expiresIn: 3600 }); // 60 min
}
Verify: Call getUploadUrl("abc123", 4194304) → returns URL starting with https://user-files-prod.s3.us-east-1.amazonaws.com/...
For Dropbox-like sync, implement a notification system where clients maintain a cursor (version vector) and receive push updates on file changes. [src3]
Client Sync Protocol:
1. Client connects via WebSocket to /sync endpoint
2. Client sends its last known cursor (monotonic version number)
3. Server streams all changes since that cursor:
{action: "update", file_id: "...", version: 42, chunks: [...]}
{action: "delete", file_id: "...", version: 43}
4. Client applies changes locally, updates cursor
5. Server pushes new changes in real-time as they occur
6. On conflict: last-writer-wins + save conflicted copy
Verify: Two clients connected → modify file on Client A → Client B receives change event within 2 seconds
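Steps 3-4 of the protocol above can be sketched client-side as a function that applies a change stream to local state and advances the cursor (message shapes follow the examples in the protocol; the function name is illustrative):

```python
def apply_changes(local_files: dict, cursor: int, changes: list[dict]) -> int:
    """Apply server changes newer than the cursor; return the new cursor."""
    for change in changes:
        if change["version"] <= cursor:
            continue                      # already applied, skip replays
        if change["action"] == "update":
            local_files[change["file_id"]] = change["chunks"]
        elif change["action"] == "delete":
            local_files.pop(change["file_id"], None)
        cursor = change["version"]
    return cursor

files = {}
cursor = apply_changes(files, 0, [
    {"action": "update", "file_id": "f1", "version": 42, "chunks": ["a"]},
    {"action": "delete", "file_id": "f1", "version": 43},
])
assert cursor == 43 and files == {}       # update then delete, cursor advanced
```

Because the cursor only moves forward and stale messages are skipped, the server can safely re-send changes after a reconnect.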
Choose between replication (simpler, faster reads) and erasure coding (cheaper storage, higher durability) based on your scale and budget. [src6]
Durability Comparison:

| | 3x Replication | Reed-Solomon (10,4) |
|---|---|---|
| Storage overhead | 200% | 40% |
| Read latency | Lowest | Slightly higher (decode) |
| Write latency | Low | Moderate (encode) |
| Repair bandwidth | Copy 1 replica | Read 10 chunks to reconstruct |
| Durability (3 AZs) | ~99.9999% | ~99.999999999% |
| Min nodes to survive | 1 of 3 | 10 of 14 |
| Best for | Hot data | Warm/cold data |
Verify: Calculate: with 1PB raw data, 3x replication = 3PB total; erasure coding (10,4) = 1.4PB total → 53% storage savings
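The overhead arithmetic behind that comparison is worth making explicit; a quick sketch (the helper name is illustrative):

```python
def storage_required(raw: float, data_shards: int, parity_shards: int) -> float:
    """Total storage needed under Reed-Solomon (data, parity) erasure coding."""
    return raw * (data_shards + parity_shards) / data_shards

raw_tb = 1000.0                              # 1PB of raw data, in TB
replicated = raw_tb * 3                      # 3x replication -> 3000 TB
erasure = storage_required(raw_tb, 10, 4)    # Reed-Solomon (10,4) -> 1400 TB
savings = 1 - erasure / replicated
assert erasure == 1400.0
assert round(savings, 2) == 0.53             # ~53% less raw storage
```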
For systems that prioritize storage efficiency, use content-defined chunking (CDC) instead of fixed-size chunks. CDC produces stable chunk boundaries even when bytes are inserted or deleted. [src7]
# cdc_chunking.py -- Content-defined chunking with a simplified rolling hash
# (an illustrative stand-in for a production Rabin/Gear fingerprint; the
# byte-at-a-time loop is slow and would use buffered reads in practice)
import hashlib

MIN_CHUNK = 2 * 1024 * 1024   # 2MB minimum chunk size
MAX_CHUNK = 8 * 1024 * 1024   # 8MB maximum chunk size
MASK = (1 << 22) - 1          # 22-bit mask: boundary roughly every 4MB of content

def chunk_file_cdc(filepath: str) -> list[dict]:
    """Split a file into chunks whose boundaries depend on content,
    so inserting or deleting bytes only shifts boundaries near the edit."""
    chunks = []
    with open(filepath, "rb") as f:
        buf = bytearray()
        fingerprint = 0
        while True:
            byte = f.read(1)
            if not byte:
                if buf:  # flush the final partial chunk
                    h = hashlib.sha256(bytes(buf)).hexdigest()
                    chunks.append({"hash": h, "size": len(buf)})
                break
            buf.append(byte[0])
            if len(buf) < MIN_CHUNK:
                continue  # never cut a chunk below the minimum size
            fingerprint = ((fingerprint << 1) + byte[0]) & 0xFFFFFFFF
            if (fingerprint & MASK) == 0 or len(buf) >= MAX_CHUNK:
                h = hashlib.sha256(bytes(buf)).hexdigest()
                chunks.append({"hash": h, "size": len(buf)})
                buf = bytearray()
                fingerprint = 0
    return chunks
Verify: Chunk a 50MB file → expect 10-15 chunks; modify 1KB in the middle → only 1-2 chunks change
# Input: filepath (str), api_base (str), token (str)
# Output: dict with file_id and upload status
import hashlib
import requests
from requests.adapters import HTTPAdapter, Retry
CHUNK_SIZE = 4 * 1024 * 1024 # 4MB
session = requests.Session()
session.mount("https://", HTTPAdapter(
max_retries=Retry(total=3, backoff_factor=0.5,
status_forcelist=[500, 502, 503])
))
def upload_chunked(filepath: str, api_base: str, token: str) -> dict:
headers = {"Authorization": f"Bearer {token}"}
chunks = []
with open(filepath, "rb") as f:
idx = 0
while data := f.read(CHUNK_SIZE):
chunk_hash = hashlib.sha256(data).hexdigest()
if session.head(f"{api_base}/chunks/{chunk_hash}",
headers=headers).status_code == 404:
url_resp = session.post(f"{api_base}/chunks/upload-url",
json={"hash": chunk_hash, "size": len(data)},
headers=headers)
session.put(url_resp.json()["url"], data=data,
headers={"Content-Type": "application/octet-stream"})
chunks.append({"index": idx, "hash": chunk_hash, "size": len(data)})
idx += 1
return session.post(f"{api_base}/files",
json={"name": filepath.rsplit("/", 1)[-1], "chunks": chunks},
headers=headers).json()
// Input: objectKey (string), bucket (string)
// Output: { uploadUrl, downloadUrl } with presigned URLs
import { S3Client, PutObjectCommand, GetObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";
const s3 = new S3Client({ region: process.env.AWS_REGION || "us-east-1" });
export async function generatePresignedUrls(objectKey, bucket) {
const uploadUrl = await getSignedUrl(s3,
new PutObjectCommand({ Bucket: bucket, Key: objectKey,
ServerSideEncryption: "aws:kms" }),
{ expiresIn: 600 }
);
const downloadUrl = await getSignedUrl(s3,
new GetObjectCommand({ Bucket: bucket, Key: objectKey }),
{ expiresIn: 3600 }
);
return { uploadUrl, downloadUrl };
}
# BAD -- File content passes through your server, creating a bottleneck
@app.route("/upload", methods=["POST"])
def upload():
file_data = request.files["file"].read() # Entire file in server RAM
s3.put_object(Bucket="files", Key=key, Body=file_data)
return {"status": "ok"}
# GOOD -- Server only generates URL; client uploads directly to S3
@app.route("/upload-url", methods=["POST"])
def get_upload_url():
key = f"uploads/{uuid4()}/{request.json['filename']}"
url = s3.generate_presigned_url("put_object",
Params={"Bucket": "files", "Key": key}, ExpiresIn=600)
return {"upload_url": url, "key": key}
# BAD -- Databases are not designed for large binary storage
cursor.execute(
"INSERT INTO files (name, content) VALUES (%s, %s)",
(filename, file_bytes) # Blows up DB size, kills query performance
)
# GOOD -- Separate concerns: metadata in DB, blobs in S3
s3.upload_fileobj(file_obj, "files-bucket", object_key)
cursor.execute(
"INSERT INTO files (name, storage_key, size) VALUES (%s, %s, %s)",
(filename, object_key, file_size)
)
# BAD -- Re-uploading 500MB file for a 1KB change
def sync_file(filepath):
with open(filepath, "rb") as f:
s3.upload_fileobj(f, "bucket", filepath) # Full re-upload every time
# GOOD -- Only upload chunks whose hashes changed
def sync_file(filepath, previous_chunks):
current_chunks = chunk_file(filepath)
for chunk in current_chunks:
if chunk["hash"] not in {c["hash"] for c in previous_chunks}:
upload_chunk(chunk) # Only new/changed chunks
update_manifest(filepath, current_chunks)
# BAD -- One region failure = total data loss
s3.create_bucket(Bucket="files")  # single region (defaults to us-east-1;
# note: passing LocationConstraint="us-east-1" explicitly is rejected by S3)
# No cross-region replication, no versioning
# GOOD -- Enable versioning + cross-region replication
s3.put_bucket_versioning(Bucket="files",
VersioningConfiguration={"Status": "Enabled"})
s3.put_bucket_replication(Bucket="files", ReplicationConfiguration={
    "Role": "arn:aws:iam::ACCOUNT_ID:role/replication",  # placeholder account
    "Rules": [{"Status": "Enabled", "Prefix": "",
               "Destination": {"Bucket": "arn:aws:s3:::files-replica-eu"}}]
})
Shard the metadata database by user_id or org_id; use read replicas for listing operations. [src1] Garbage-collect chunks once their ref_count drops to 0. [src3]

# Check S3 bucket replication status
aws s3api get-bucket-replication --bucket my-files-bucket
# Verify multipart uploads in progress (orphaned uploads waste storage)
aws s3api list-multipart-uploads --bucket my-files-bucket
# Check MinIO cluster health (self-hosted)
mc admin info myminio
# Verify data integrity with checksums
aws s3api head-object --bucket my-files-bucket --key chunks/abc123 --checksum-mode ENABLED
# Monitor storage usage per prefix
aws s3api list-objects-v2 --bucket my-files-bucket --prefix chunks/ \
--query "sum(Contents[].Size)" --output text
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Storing user-uploaded files (documents, images, videos) | Storing structured/relational data needing queries | Relational database (PostgreSQL, MySQL) |
| Need >99.999% durability for customer data | Need sub-millisecond latency for small key-value lookups | Redis, DynamoDB, or in-memory cache |
| Multi-device file sync (Dropbox-like) | Simple static asset serving (JS, CSS, images) | CDN with origin bucket |
| Files range from KB to multi-GB | All data is <1KB (metadata-sized) | Key-value store (DynamoDB, Redis) |
| Need deduplication across users/files | Write-once archival with no dedup needs | S3 Glacier / cold storage directly |
| Compliance requires versioning and audit trails | Ephemeral/temporary data (cache, sessions) | Redis with TTL or temporary S3 with lifecycle rules |