Use `aws s3api put-object`, or presigned URLs for direct client-to-storage uploads that bypass your servers.

| Component | Role | Technology Options | Scaling Strategy |
|---|---|---|---|
| API Gateway | Route requests, auth, rate limiting | Nginx, Kong, AWS API Gateway | Horizontal + auto-scaling |
| Metadata Service | File/folder CRUD, permissions, versions | Custom service (Go/Java) | Horizontal with DB sharding |
| Metadata Database | Store file metadata, hierarchy, ACLs | PostgreSQL, MySQL, DynamoDB, CockroachDB | Shard by user_id or org_id |
| Blob Storage | Persist raw file chunks | AWS S3, GCS, MinIO, Ceph, SeaweedFS | Virtually unlimited (managed) or add nodes (self-hosted) |
| Chunk Service | Split files into chunks, compute hashes | Custom service (Go/Rust) | Horizontal, stateless workers |
| Deduplication Index | Map content hashes to stored chunks | Redis, DynamoDB, Cassandra | Partition by hash prefix |
| Upload/Download Service | Presigned URL generation, multipart orchestration | Custom + cloud SDK | Horizontal, stateless |
| Sync Service | Detect changes, push notifications to devices | WebSocket server + message queue | Horizontal + connection affinity |
| Message Queue | Async processing (thumbnails, virus scan, indexing) | Kafka, SQS, RabbitMQ | Partition by file_id or user_id |
| Notification Service | Push file change events to clients | WebSocket, SSE, long polling | Horizontal with pub/sub backend |
| CDN | Cache frequently accessed files at edge | CloudFront, Cloudflare, Fastly | Global PoPs, cache by object key |
| Thumbnail/Preview Service | Generate previews for images, docs, video | Lambda/Cloud Functions, dedicated workers | Horizontal, event-driven |
| Virus Scanner | Scan uploads before making files available | ClamAV, cloud-based (S3 Malware Protection) | Queue-based, horizontal workers |
| Encryption Service | Key management, envelope encryption | AWS KMS, HashiCorp Vault | Managed service or HA cluster |
START
|-- Need real-time sync across devices?
| |-- YES --> Full Dropbox-like architecture (sync service + notification service + chunking)
| +-- NO --> Simpler S3-like store-and-retrieve architecture
|
|-- Storage semantics?
| |-- Object (flat, immutable) --> Object storage (S3-compatible API, content-addressable)
| |-- File (hierarchical, mutable) --> File storage with metadata DB + blob backend
| +-- Block (raw volumes) --> Block storage (EBS-like, out of scope for this unit)
|
|-- Expected file sizes?
| |-- Mostly <10MB --> No chunking needed, direct upload to blob store
| |-- Mixed (10MB-5GB) --> Fixed-size chunking (4-8MB chunks), multipart upload
| +-- Large files (>5GB) --> Content-defined chunking for dedup + parallel upload
|
|-- Durability requirement?
| |-- 99.999999999% (11 nines, S3-class) --> Erasure coding (Reed-Solomon 10+4) across AZs
| |-- 99.9999% (6 nines) --> 3x replication across data centers
| +-- 99.99% (4 nines) --> 2x replication, single region
|
|-- Deduplication priority?
| |-- High (storage cost critical) --> Content-defined chunking + per-chunk dedup index
| |-- Moderate --> Fixed-size chunking + file-level dedup (hash full file)
| +-- Low (simplicity first) --> No dedup, store every upload as-is
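The file-size branch of the tree above can be encoded as a small helper that picks an upload strategy. A minimal sketch using the thresholds from the tree (the function name and return values are illustrative):

```python
CHUNK_THRESHOLD = 10 * 1024 * 1024    # 10MB: below this, upload in one shot
LARGE_FILE_THRESHOLD = 5 * 1024**3    # 5GB: above this, prefer CDC

def choose_upload_strategy(size_bytes: int) -> str:
    """Map a file size onto the chunking strategy from the decision tree."""
    if size_bytes < CHUNK_THRESHOLD:
        return "direct"           # single PUT to the blob store
    if size_bytes <= LARGE_FILE_THRESHOLD:
        return "fixed-chunks"     # 4-8MB fixed chunks, multipart upload
    return "cdc"                  # content-defined chunking + parallel upload
```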
Design your metadata schema to support files, folders, versions, and sharing. Keep metadata separate from blob storage. [src3]
-- Core metadata tables
CREATE TABLE users (
user_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL,
storage_quota_bytes BIGINT DEFAULT 10737418240, -- 10GB default
storage_used_bytes BIGINT DEFAULT 0,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE files (
file_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(user_id),
parent_id UUID REFERENCES files(file_id), -- NULL = root
name VARCHAR(1024) NOT NULL,
is_folder BOOLEAN DEFAULT FALSE,
mime_type VARCHAR(255),
size_bytes BIGINT,
checksum VARCHAR(64), -- SHA-256 of full file
version INTEGER DEFAULT 1,
status VARCHAR(20) DEFAULT 'active',
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now(),
UNIQUE (user_id, parent_id, name)
);
CREATE TABLE file_chunks (
chunk_id VARCHAR(64) PRIMARY KEY, -- SHA-256 of chunk content
size_bytes INTEGER NOT NULL,
ref_count INTEGER DEFAULT 1,
storage_key VARCHAR(512) NOT NULL,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE file_chunk_map (
file_id UUID REFERENCES files(file_id),
chunk_id VARCHAR(64) REFERENCES file_chunks(chunk_id),
chunk_index INTEGER NOT NULL,
PRIMARY KEY (file_id, chunk_index)
);
Verify: SELECT count(*) FROM information_schema.tables WHERE table_name IN ('users','files','file_chunks','file_chunk_map'); → expected: 4
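The `ref_count` column on `file_chunks` drives garbage collection: a deduplicated chunk may be shared by many files and can only be deleted from blob storage when no manifest references it. An in-memory sketch of that bookkeeping (class and method names are illustrative; the real counters live in the metadata DB):

```python
class ChunkIndex:
    """In-memory model of the file_chunks ref_count bookkeeping."""
    def __init__(self):
        self.ref_count = {}   # chunk_hash -> number of referencing files

    def add_file(self, chunk_hashes):
        """Register a new file's chunks, incrementing shared refs."""
        for h in chunk_hashes:
            self.ref_count[h] = self.ref_count.get(h, 0) + 1

    def remove_file(self, chunk_hashes):
        """Decrement refs; return chunks now safe to delete from storage."""
        orphaned = []
        for h in chunk_hashes:
            self.ref_count[h] -= 1
            if self.ref_count[h] == 0:
                del self.ref_count[h]
                orphaned.append(h)
        return orphaned

idx = ChunkIndex()
idx.add_file(["a", "b"])
idx.add_file(["b", "c"])                        # chunk "b" shared by two files
assert idx.remove_file(["a", "b"]) == ["a"]     # "b" still referenced
assert idx.remove_file(["b", "c"]) == ["b", "c"]
```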
Split files into chunks on the client side, compute SHA-256 hashes, and upload only new chunks. This enables deduplication and resumable uploads. [src1]
# chunk_upload.py -- Client-side chunked upload with dedup
import hashlib
import requests
CHUNK_SIZE = 4 * 1024 * 1024 # 4MB chunks
API_BASE = "https://api.filestorage.example.com"
def upload_file(filepath: str, token: str) -> dict:
"""Upload a file in chunks with deduplication."""
chunks = []
headers = {"Authorization": f"Bearer {token}"}
with open(filepath, "rb") as f:
index = 0
while True:
data = f.read(CHUNK_SIZE)
if not data:
break
chunk_hash = hashlib.sha256(data).hexdigest()
chunks.append({"index": index, "hash": chunk_hash, "size": len(data)})
# Check if chunk already exists (dedup)
check = requests.head(
f"{API_BASE}/chunks/{chunk_hash}",
headers=headers
)
if check.status_code == 404:
# Get presigned upload URL
resp = requests.post(
f"{API_BASE}/chunks/upload-url",
json={"chunk_hash": chunk_hash, "size": len(data)},
headers=headers
)
presigned_url = resp.json()["upload_url"]
                # Upload directly to blob store; fail fast if the PUT is rejected
                requests.put(presigned_url, data=data).raise_for_status()
index += 1
# Register the file with its chunk manifest
file_hash = hashlib.sha256(
"".join(c["hash"] for c in chunks).encode()
).hexdigest()
resp = requests.post(
f"{API_BASE}/files",
json={
"name": filepath.split("/")[-1],
"checksum": file_hash,
"chunks": chunks,
},
headers=headers,
)
return resp.json()
Verify: Upload a 20MB file → should produce 5 chunks of 4MB each, server returns 201 with file_id
Use presigned URLs so clients upload/download directly to/from blob storage, bypassing your servers. This is critical for scalability. [src4] [src5]
// presigned-urls.js -- Generate S3 presigned URLs (AWS SDK v3)
import { S3Client, PutObjectCommand, GetObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";
const s3 = new S3Client({ region: "us-east-1" });
const BUCKET = "user-files-prod";
export async function getUploadUrl(chunkHash, sizeBytes) {
const command = new PutObjectCommand({
Bucket: BUCKET,
Key: `chunks/${chunkHash}`,
ContentLength: sizeBytes,
ServerSideEncryption: "aws:kms",
});
return getSignedUrl(s3, command, { expiresIn: 600 }); // 10 min
}
export async function getDownloadUrl(fileKey, filename) {
const command = new GetObjectCommand({
Bucket: BUCKET,
Key: fileKey,
    ResponseContentDisposition: `attachment; filename="${filename}"`,
});
return getSignedUrl(s3, command, { expiresIn: 3600 }); // 60 min
}
Verify: Call getUploadUrl("abc123", 4194304) → returns URL starting with https://user-files-prod.s3.us-east-1.amazonaws.com/...
For Dropbox-like sync, implement a notification system where clients maintain a cursor (version vector) and receive push updates on file changes. [src3]
Client Sync Protocol:
1. Client connects via WebSocket to /sync endpoint
2. Client sends its last known cursor (monotonic version number)
3. Server streams all changes since that cursor:
{action: "update", file_id: "...", version: 42, chunks: [...]}
{action: "delete", file_id: "...", version: 43}
4. Client applies changes locally, updates cursor
5. Server pushes new changes in real-time as they occur
6. On conflict: last-writer-wins + save conflicted copy
Verify: Two clients connected → modify file on Client A → Client B receives change event within 2 seconds
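Steps 3-4 of the protocol above can be sketched client-side as a function that applies a change stream to local state and advances the cursor (message shapes follow the examples in the protocol; the function name is illustrative):

```python
def apply_changes(local_files: dict, cursor: int, changes: list[dict]) -> int:
    """Apply server changes newer than the cursor; return the new cursor."""
    for change in changes:
        if change["version"] <= cursor:
            continue                      # already applied, skip replays
        if change["action"] == "update":
            local_files[change["file_id"]] = change["chunks"]
        elif change["action"] == "delete":
            local_files.pop(change["file_id"], None)
        cursor = change["version"]
    return cursor

files = {}
cursor = apply_changes(files, 0, [
    {"action": "update", "file_id": "f1", "version": 42, "chunks": ["a"]},
    {"action": "delete", "file_id": "f1", "version": 43},
])
assert cursor == 43 and files == {}       # update then delete, cursor advanced
```

Because the cursor only moves forward and stale messages are skipped, the server can safely re-send changes after a reconnect.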
Choose between replication (simpler, faster reads) and erasure coding (cheaper storage, higher durability) based on your scale and budget. [src6]
Durability Comparison:

| | 3x Replication | Reed-Solomon (10,4) |
|---|---|---|
| Storage overhead | 200% | 40% |
| Read latency | Lowest | Slightly higher (decode) |
| Write latency | Low | Moderate (encode) |
| Repair bandwidth | Copy 1 replica | Read 10 chunks to reconstruct |
| Durability (3 AZs) | ~99.9999% | ~99.999999999% |
| Min nodes to survive | 1 of 3 | 10 of 14 |
| Best for | Hot data | Warm/cold data |
Verify: Calculate: with 1PB raw data, 3x replication = 3PB total; erasure coding (10,4) = 1.4PB total → 53% storage savings
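The overhead arithmetic behind that comparison is worth making explicit; a quick sketch (the helper name is illustrative):

```python
def storage_required(raw: float, data_shards: int, parity_shards: int) -> float:
    """Total storage needed under Reed-Solomon (data, parity) erasure coding."""
    return raw * (data_shards + parity_shards) / data_shards

raw_tb = 1000.0                              # 1PB of raw data, in TB
replicated = raw_tb * 3                      # 3x replication -> 3000 TB
erasure = storage_required(raw_tb, 10, 4)    # Reed-Solomon (10,4) -> 1400 TB
savings = 1 - erasure / replicated
assert erasure == 1400.0
assert round(savings, 2) == 0.53             # ~53% less raw storage
```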
For systems that prioritize storage efficiency, use content-defined chunking (CDC) instead of fixed-size chunks. CDC produces stable chunk boundaries even when bytes are inserted or deleted. [src7]
# cdc_chunking.py -- Content-defined chunking with a simplified rolling hash
# (an illustrative stand-in for a production Rabin/Gear fingerprint; the
# byte-at-a-time loop is slow and would use buffered reads in practice)
import hashlib

MIN_CHUNK = 2 * 1024 * 1024   # 2MB minimum chunk size
MAX_CHUNK = 8 * 1024 * 1024   # 8MB maximum chunk size
MASK = (1 << 22) - 1          # 22-bit mask: boundary roughly every 4MB of content

def chunk_file_cdc(filepath: str) -> list[dict]:
    """Split a file into chunks whose boundaries depend on content,
    so inserting or deleting bytes only shifts boundaries near the edit."""
    chunks = []
    with open(filepath, "rb") as f:
        buf = bytearray()
        fingerprint = 0
        while True:
            byte = f.read(1)
            if not byte:
                if buf:  # flush the final partial chunk
                    h = hashlib.sha256(bytes(buf)).hexdigest()
                    chunks.append({"hash": h, "size": len(buf)})
                break
            buf.append(byte[0])
            if len(buf) < MIN_CHUNK:
                continue  # never cut a chunk below the minimum size
            fingerprint = ((fingerprint << 1) + byte[0]) & 0xFFFFFFFF
            if (fingerprint & MASK) == 0 or len(buf) >= MAX_CHUNK:
                h = hashlib.sha256(bytes(buf)).hexdigest()
                chunks.append({"hash": h, "size": len(buf)})
                buf = bytearray()
                fingerprint = 0
    return chunks
Verify: Chunk a 50MB file → expect 10-15 chunks; modify 1KB in the middle → only 1-2 chunks change
# Input: filepath (str), api_base (str), token (str)
# Output: dict with file_id and upload status
import hashlib
import requests
from requests.adapters import HTTPAdapter, Retry
CHUNK_SIZE = 4 * 1024 * 1024 # 4MB
session = requests.Session()
session.mount("https://", HTTPAdapter(
max_retries=Retry(total=3, backoff_factor=0.5,
status_forcelist=[500, 502, 503])
))
def upload_chunked(filepath: str, api_base: str, token: str) -> dict:
headers = {"Authorization": f"Bearer {token}"}
chunks = []
with open(filepath, "rb") as f:
idx = 0
while data := f.read(CHUNK_SIZE):
chunk_hash = hashlib.sha256(data).hexdigest()
if session.head(f"{api_base}/chunks/{chunk_hash}",
headers=headers).status_code == 404:
url_resp = session.post(f"{api_base}/chunks/upload-url",
json={"hash": chunk_hash, "size": len(data)},
headers=headers)
session.put(url_resp.json()["url"], data=data,
headers={"Content-Type": "application/octet-stream"})
chunks.append({"index": idx, "hash": chunk_hash, "size": len(data)})
idx += 1
return session.post(f"{api_base}/files",
json={"name": filepath.rsplit("/", 1)[-1], "chunks": chunks},
headers=headers).json()
// Input: objectKey (string), bucket (string)
// Output: { uploadUrl, downloadUrl } with presigned URLs
import { S3Client, PutObjectCommand, GetObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";
const s3 = new S3Client({ region: process.env.AWS_REGION || "us-east-1" });
export async function generatePresignedUrls(objectKey, bucket) {
const uploadUrl = await getSignedUrl(s3,
new PutObjectCommand({ Bucket: bucket, Key: objectKey,
ServerSideEncryption: "aws:kms" }),
{ expiresIn: 600 }
);
const downloadUrl = await getSignedUrl(s3,
new GetObjectCommand({ Bucket: bucket, Key: objectKey }),
{ expiresIn: 3600 }
);
return { uploadUrl, downloadUrl };
}
# BAD -- File content passes through your server, creating a bottleneck
@app.route("/upload", methods=["POST"])
def upload():
file_data = request.files["file"].read() # Entire file in server RAM
s3.put_object(Bucket="files", Key=key, Body=file_data)
return {"status": "ok"}
# GOOD -- Server only generates URL; client uploads directly to S3
@app.route("/upload-url", methods=["POST"])
def get_upload_url():
key = f"uploads/{uuid4()}/{request.json['filename']}"
url = s3.generate_presigned_url("put_object",
Params={"Bucket": "files", "Key": key}, ExpiresIn=600)
return {"upload_url": url, "key": key}
# BAD -- Databases are not designed for large binary storage
cursor.execute(
"INSERT INTO files (name, content) VALUES (%s, %s)",
(filename, file_bytes) # Blows up DB size, kills query performance
)
# GOOD -- Separate concerns: metadata in DB, blobs in S3
s3.upload_fileobj(file_obj, "files-bucket", object_key)
cursor.execute(
"INSERT INTO files (name, storage_key, size) VALUES (%s, %s, %s)",
(filename, object_key, file_size)
)
# BAD -- Re-uploading 500MB file for a 1KB change
def sync_file(filepath):
with open(filepath, "rb") as f:
s3.upload_fileobj(f, "bucket", filepath) # Full re-upload every time
# GOOD -- Only upload chunks whose hashes changed
def sync_file(filepath, previous_chunks):
current_chunks = chunk_file(filepath)
for chunk in current_chunks:
if chunk["hash"] not in {c["hash"] for c in previous_chunks}:
upload_chunk(chunk) # Only new/changed chunks
update_manifest(filepath, current_chunks)
# BAD -- One region failure = total data loss
s3.create_bucket(Bucket="files")  # single region (defaults to us-east-1;
# note: passing LocationConstraint="us-east-1" explicitly is rejected by S3)
# No cross-region replication, no versioning
# GOOD -- Enable versioning + cross-region replication
s3.put_bucket_versioning(Bucket="files",
VersioningConfiguration={"Status": "Enabled"})
s3.put_bucket_replication(Bucket="files", ReplicationConfiguration={
    "Role": "arn:aws:iam::ACCOUNT_ID:role/replication",  # placeholder account
    "Rules": [{"Status": "Enabled", "Prefix": "",
               "Destination": {"Bucket": "arn:aws:s3:::files-replica-eu"}}]
})
Shard the metadata database by user_id or org_id; use read replicas for listing operations. [src1] Garbage-collect chunks once their ref_count drops to 0. [src3]

# Check S3 bucket replication status
aws s3api get-bucket-replication --bucket my-files-bucket
# Verify multipart uploads in progress (orphaned uploads waste storage)
aws s3api list-multipart-uploads --bucket my-files-bucket
# Check MinIO cluster health (self-hosted)
mc admin info myminio
# Verify data integrity with checksums
aws s3api head-object --bucket my-files-bucket --key chunks/abc123 --checksum-mode ENABLED
# Monitor storage usage per prefix
aws s3api list-objects-v2 --bucket my-files-bucket --prefix chunks/ \
--query "sum(Contents[].Size)" --output text
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Storing user-uploaded files (documents, images, videos) | Storing structured/relational data needing queries | Relational database (PostgreSQL, MySQL) |
| Need >99.999% durability for customer data | Need sub-millisecond latency for small key-value lookups | Redis, DynamoDB, or in-memory cache |
| Multi-device file sync (Dropbox-like) | Simple static asset serving (JS, CSS, images) | CDN with origin bucket |
| Files range from KB to multi-GB | All data is <1KB (metadata-sized) | Key-value store (DynamoDB, Redis) |
| Need deduplication across users/files | Write-once archival with no dedup needs | S3 Glacier / cold storage directly |
| Compliance requires versioning and audit trails | Ephemeral/temporary data (cache, sessions) | Redis with TTL or temporary S3 with lifecycle rules |