LPUSH user:{id}:timeline {tweet_id} (Redis list per user for O(1) feed reads)

| Component | Role | Technology Options | Scaling Strategy |
|---|---|---|---|
| API Gateway | Rate limiting, auth, routing | Nginx, Kong, AWS API Gateway | Horizontal scaling behind LB |
| Post Service | Create/store posts and metadata | Python/Go + PostgreSQL | Shard by user_id, primary-replica replication |
| Fan-Out Service | Distribute posts to follower timelines | Go/Java workers + Kafka consumers | Partition Kafka by user_id, scale consumers horizontally |
| Timeline Service | Serve precomputed home feeds | Node.js/Go + Redis | Redis Cluster with consistent hashing |
| Social Graph Service | Manage follow/unfollow relationships | Go/Java + Graph DB or Redis | Adjacency list in Redis, Neo4j for deep queries |
| User Service | Profiles, auth, preferences | Any framework + PostgreSQL | Read replicas, cache hot profiles |
| Media Service | Upload, process, serve images/video | Python/Go + S3/GCS + CDN | Pre-signed uploads, CDN edge caching |
| Search Service | Full-text search on posts | Elasticsearch, Apache Solr | Sharded index, scatter-gather queries |
| Notification Service | Push/email/in-app alerts on engagement | Go/Python + Kafka + FCM/APNs | Async via message queue, batch delivery |
| Ranking Service | ML-based feed personalization | Python + TensorFlow/PyTorch | Feature store + model serving (TF Serving, Triton) |
| CDN | Static asset and media delivery | CloudFront, Cloudflare, Fastly | Edge PoPs, cache invalidation via purge API |
| Message Queue | Async communication between services | Apache Kafka, RabbitMQ, SQS | Partitioned topics, consumer groups |
| Cache Layer | Reduce DB load for hot data | Redis, Memcached | Redis Cluster, LRU eviction, ~800 items/timeline |
| Object Storage | Persistent media file storage | AWS S3, Google Cloud Storage | Lifecycle policies, cross-region replication |
START
|-- Expected DAU?
| |-- <10K users
| | --> Simple pull-based feed: query DB at read time, paginate with cursor
| |-- 10K - 1M users
| | --> Fan-out-on-write for all users
| | --> Precompute timelines into Redis on every post
| |-- 1M - 100M users
| | |-- Any user with >500K followers?
| | | |-- YES --> Hybrid: fan-out-on-write for normal, fan-out-on-read for celebrities
| | | |-- NO --> Pure fan-out-on-write with Redis Cluster
| | |-- Feed type?
| | | |-- Chronological --> Sort by timestamp in Redis sorted set
| | | |-- Ranked --> Add Ranking Service with ML model
| | | |-- Hybrid --> Chronological base + lightweight re-ranking
| |-- >100M users
| | --> Full hybrid fan-out + ML ranking pipeline
| | --> Multi-region deployment with geo-sharding
| | --> Edge caching for feed responses
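For the <10K-user branch above, the pull-based feed is just a query at read time. A minimal sketch of building that query with a keyset cursor (the function name and `%s` placeholder style are assumptions; table names follow the schema defined in this section):

```python
# Sketch of the pull-based feed query for the small-scale branch:
# join posts to the follow graph and keyset-paginate on created_at.
def pull_feed_query(user_id, limit=20, cursor=None):
    sql = (
        "SELECT p.* FROM posts p "
        "JOIN follows f ON f.followee_id = p.author_id "
        "WHERE f.follower_id = %s"
    )
    params = [user_id]
    if cursor is not None:
        # keyset cursor (created_at of the last post seen), not OFFSET --
        # OFFSET gets slower and less consistent as users scroll
        sql += " AND p.created_at < %s"
        params.append(cursor)
    sql += " ORDER BY p.created_at DESC LIMIT %s"
    params.append(limit)
    return sql, params
```

Keyset pagination keeps page reads O(limit) regardless of scroll depth, which is why the decision tree recommends it even at small scale.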
Design the core entities: Users, Posts, Follows, Likes, Comments. Define REST or gRPC endpoints for create-post, get-timeline, follow/unfollow. [src1]
CREATE TABLE users (
user_id BIGINT PRIMARY KEY,
username VARCHAR(64) UNIQUE NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE posts (
post_id BIGINT PRIMARY KEY,
author_id BIGINT NOT NULL REFERENCES users(user_id),
content TEXT NOT NULL,
media_url TEXT,
created_at TIMESTAMPTZ DEFAULT NOW(),
like_count INT DEFAULT 0
);
CREATE TABLE follows (
follower_id BIGINT NOT NULL,
followee_id BIGINT NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW(),
PRIMARY KEY (follower_id, followee_id)
);
Verify: SELECT COUNT(*) FROM information_schema.tables WHERE table_name IN ('users','posts','follows'); --> expected: 3
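The endpoint surface called out above can be sketched as a route table (paths and handler names are illustrative assumptions, not a fixed contract):

```python
# Hypothetical REST surface for the core entities; handler names map to the
# services in the component table.
API_ROUTES = {
    ("POST",   "/api/v1/posts"):               "create_post",        # Post Service
    ("GET",    "/api/v1/timeline"):            "get_home_timeline",  # Timeline Service
    ("POST",   "/api/v1/users/{id}/follow"):   "follow_user",        # Social Graph Service
    ("DELETE", "/api/v1/users/{id}/follow"):   "unfollow_user",      # Social Graph Service
    ("POST",   "/api/v1/posts/{id}/like"):     "like_post",
    ("POST",   "/api/v1/posts/{id}/comments"): "add_comment",
}
```

Follow/unfollow as POST/DELETE on the same path keeps the relationship resource idempotent; gRPC would model the same operations as unary RPCs.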
When a user creates a post, write it to the Posts table, then publish an event to Kafka for async fan-out. [src2]
import json

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers=["kafka:9092"])

def create_post(author_id, content, media_url=None):
    post = db.insert_post(author_id, content, media_url)
    # key by author_id so one author's posts land on one partition, in order
    producer.send(
        "post-created",
        key=str(author_id).encode(),
        value=json.dumps({"post_id": post["post_id"], "author_id": author_id}).encode(),
    )
    return post
Verify: kafka-console-consumer --bootstrap-server kafka:9092 --topic post-created --from-beginning --> expected: JSON with post_id
Kafka consumers read post-created events, look up followers, and push post_id into each follower's Redis timeline. Skip fan-out for celebrities. [src3]
from kafka import KafkaConsumer

CELEBRITY_THRESHOLD = 500_000

consumer = KafkaConsumer(
    "post-created",
    bootstrap_servers=["kafka:9092"],
    value_deserializer=lambda v: json.loads(v),
)

for message in consumer:
    event = message.value
    followers = get_followers(event["author_id"])
    if len(followers) > CELEBRITY_THRESHOLD:
        continue  # handled at read time
    pipe = redis.pipeline(transaction=False)
    for fid in followers:
        pipe.lpush(f"user:{fid}:timeline", event["post_id"])
        pipe.ltrim(f"user:{fid}:timeline", 0, 799)
    pipe.execute()
Verify: redis-cli LRANGE user:42:timeline 0 9 --> expected: list with new post_id
Read precomputed Redis list, hydrate post objects, merge celebrity tweets at serve time. [src2] [src3]
def get_home_timeline(user_id, cursor=0, limit=20):
    post_ids = redis.lrange(f"user:{user_id}:timeline", cursor, cursor + limit - 1)
    for celeb_id in get_followed_celebrities(user_id):
        # celebrity posts were never fanned out; fetch their recent post IDs now
        celeb_post_ids = db.query_recent_post_ids(celeb_id, limit)
        post_ids.extend(celeb_post_ids)
    posts = hydrate_posts(list(dict.fromkeys(post_ids)))  # dedupe, keep order
    posts.sort(key=lambda p: p["created_at"], reverse=True)
    return posts[:limit]
Verify: curl "/api/v1/timeline?user_id=42&limit=10" --> expected: JSON array of posts
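The read path above assumes a `get_followed_celebrities` helper. A minimal sketch of the underlying filter, assuming follower counts are available from the Social Graph Service (the helper name and data shape are assumptions):

```python
# Hypothetical helper: filter a user's followees down to celebrity accounts.
# follower_counts maps user_id -> follower count, e.g. from the Social Graph
# Service or a cached counter.
CELEBRITY_THRESHOLD = 500_000

def filter_celebrities(followee_ids, follower_counts):
    return [uid for uid in followee_ids
            if follower_counts.get(uid, 0) > CELEBRITY_THRESHOLD]
```

In practice the celebrity set per user changes rarely, so it can be cached aggressively and only invalidated on follow/unfollow.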
Insert a ranking layer scoring posts by engagement probability using a lightweight ML model. [src4] [src5]
def rank_feed(user_id, candidates):
    features = extract_features(user_id, candidates)
    scores = ranking_model.predict(features)
    for post, score in zip(candidates, scores):
        post["rank_score"] = float(score)
    candidates.sort(key=lambda p: p["rank_score"], reverse=True)
    return candidates
Verify: Compare ranked vs chronological output -- ranked feed should prioritize higher-engagement posts
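`extract_features` and `ranking_model` are left abstract above. As an illustrative stand-in, here is a hand-rolled logistic score over a few engagement features; the feature names and weights are made up, where a real system would learn them offline:

```python
import math

# Toy weights -- illustrative only; a trained model replaces this in production.
FEATURE_WEIGHTS = {"author_affinity": 1.2, "like_count": 0.05, "age_hours": -0.1}

def score_post(features):
    """Logistic score in (0, 1): higher means higher predicted engagement.

    Missing features default to 0, so the score degrades gracefully.
    """
    z = sum(w * features.get(name, 0.0) for name, w in FEATURE_WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))
```

The negative weight on `age_hours` is what lets a ranked feed still decay toward recency, matching the hybrid branch in the decision tree.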
Set up pre-signed uploads to S3, configure CDN for delivery, cache feed API responses at the edge. [src1] [src6]
def generate_upload_url(user_id, filename):
    key = f"media/{user_id}/{filename}"
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "feed-media", "Key": key},
        ExpiresIn=3600,
    )
Verify: Upload test image via pre-signed URL, fetch via CDN --> expected: 200 OK
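Edge caching of feed API responses, mentioned above, mostly comes down to the response headers the CDN honors. A sketch, with TTL values as assumptions to tune per product:

```python
def feed_cache_headers(personalized=True):
    """Cache-Control for feed responses served through the CDN (TTLs assumed)."""
    if personalized:
        # per-user feeds: only the client may cache, and only briefly
        return {"Cache-Control": "private, max-age=5"}
    # shared feeds (e.g. trending): the CDN may cache and serve slightly
    # stale content while revalidating in the background
    return {"Cache-Control": "public, s-maxage=30, stale-while-revalidate=60"}
```

Even a 5-second private TTL absorbs rapid refreshes, while `s-maxage` lets the CDN shield the Timeline Service for shared, non-personalized responses.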
# Input: user_id (int), cursor (int), limit (int)
# Output: list of post dicts sorted by rank score
import redis

r = redis.Redis(host="redis-cluster", port=6379, decode_responses=True)

def generate_feed(user_id, cursor=0, limit=20):
    """Hybrid feed: precomputed timeline + celebrity merge + ranking."""
    tl_key = f"user:{user_id}:timeline"
    # over-fetch 2x limit so ranking has candidates; LRANGE end is inclusive
    post_ids = r.lrange(tl_key, cursor, cursor + limit * 2 - 1)
    celeb_followees = get_followed_celebrities(user_id)
    for cid in celeb_followees:
        celeb_posts = r.lrange(f"user:{cid}:posts", 0, limit - 1)
        post_ids.extend(celeb_posts)
    post_ids = list(dict.fromkeys(post_ids))  # dedupe, preserve order
    posts = batch_hydrate(post_ids)
    ranked = rank_feed(user_id, posts)
    return ranked[:limit]  # candidates already start at cursor
# Input: author_id (int), post_id (int), follower_ids (list[int])
# Output: None (updates Redis timelines)
import redis

r = redis.Redis(host="redis-cluster", port=6379, decode_responses=True)

MAX_TIMELINE_SIZE = 800
TIMELINE_TTL = 7 * 24 * 3600  # 7 days

def fan_out_to_timelines(author_id, post_id, follower_ids):
    pipe = r.pipeline(transaction=False)
    for fid in follower_ids:
        key = f"user:{fid}:timeline"
        pipe.lpush(key, post_id)
        pipe.ltrim(key, 0, MAX_TIMELINE_SIZE - 1)
        pipe.expire(key, TIMELINE_TTL)
    pipe.execute()
# BAD -- celebrity with 10M followers causes 10M Redis writes per tweet
def fan_out(author_id, post_id):
    followers = get_all_followers(author_id)  # could be 10M+
    for fid in followers:
        redis.lpush(f"user:{fid}:timeline", post_id)
# Result: single tweet takes minutes, OOM risk
# GOOD -- skip fan-out for high-follower accounts, merge at read time
CELEBRITY_THRESHOLD = 500_000

def fan_out(author_id, post_id):
    followers = get_all_followers(author_id)
    if len(followers) > CELEBRITY_THRESHOLD:
        return  # handled at read time
    pipe = redis.pipeline(transaction=False)
    for fid in followers:
        pipe.lpush(f"user:{fid}:timeline", post_id)
        pipe.ltrim(f"user:{fid}:timeline", 0, 799)
    pipe.execute()
# BAD -- blobs in PostgreSQL cause table bloat and slow queries
def save_post(content, image_bytes):
    db.execute("INSERT INTO posts (content, image_data) VALUES (%s, %s)",
               (content, image_bytes))  # 5MB image in a row!

# GOOD -- database stores only the URL reference
from uuid import uuid4

def save_post(content, image_file):
    key = f"media/{uuid4()}.jpg"
    s3.upload_fileobj(image_file, "feed-media", key)
    cdn_url = f"https://cdn.example.com/{key}"
    db.execute("INSERT INTO posts (content, media_url) VALUES (%s, %s)",
               (content, cdn_url))
# BAD -- Redis memory grows without bound
def add_to_timeline(user_id, post_id):
    redis.lpush(f"user:{user_id}:timeline", post_id)
    # No LTRIM, no TTL -- grows forever, exhausts Redis memory

# GOOD -- bounded memory usage per user
def add_to_timeline(user_id, post_id):
    key = f"user:{user_id}:timeline"
    pipe = redis.pipeline()
    pipe.lpush(key, post_id)
    pipe.ltrim(key, 0, 799)
    pipe.expire(key, 7 * 24 * 3600)
    pipe.execute()
# BAD -- user waits for fan-out to complete
@app.post("/api/posts")
def create_post(request):
    post = db.insert_post(request.data)
    for fid in get_followers(post.author_id):  # blocks for seconds
        redis.lpush(f"user:{fid}:timeline", post.id)
    return {"status": "ok"}  # user waited 5+ seconds

# GOOD -- return immediately, fan-out asynchronously
@app.post("/api/posts")
def create_post(request):
    post = db.insert_post(request.data)
    kafka.send("post-created", {"post_id": post.id, "author_id": post.author_id})
    return {"status": "ok"}  # instant response
Operational mitigations:
- Use a dedicated topic or random partitioning for high-follower accounts. [src3]
- Rate-limit rebuilds and use a circuit breaker. [src2]
- Query the primary for celebrity posts, or cache them with a short TTL. [src3]
- Use cursor-based pagination (last_seen_post_id) instead of offset. [src1]
- Use Redis INCR for counters, with periodic DB reconciliation. [src6]
- Warm caches on deploy and use single-flight cache population. [src7]
- Use projection queries or field selection in hydration. [src1]

| Use When | Don't Use When | Use Instead |
|---|---|---|
| Building a follow-graph-based feed (Twitter, Instagram, LinkedIn) | Content is purely algorithmic with no follow graph (TikTok For You) | Recommendation engine with collaborative filtering |
| DAU > 10K and feed latency matters (<200ms) | Small team project or MVP with <1K users | Simple SQL query with pagination |
| Posts are short-lived and time-sensitive (news, social updates) | Content is long-form and rarely updated (blog, wiki) | CMS with traditional caching |
| Real-time or near-real-time delivery is required | Batch-processed feeds updated hourly are acceptable | Batch ETL pipeline with scheduled materialization |
| Multiple content types (text, images, video, polls) | Single content type with simple ordering | Standard list API with sorting |