Ride-Sharing Platform System Design (Uber Clone)
How do I design a ride-sharing platform (Uber clone)?
TL;DR
- Bottom line: A ride-sharing platform requires five core subsystems: real-time location tracking (WebSocket + Kafka), geospatial driver matching (H3 hexagonal index), dynamic surge pricing, trip management (state machine), and payment processing (async, idempotent).
- Key tool/command: the h3-js library for hexagonal geospatial indexing; it converts lat/lng to cell IDs so nearby drivers can be found with O(1) cell lookups.
- Watch out for: Storing driver locations in a relational DB. At scale, you need an in-memory geospatial index (Redis GEO or a custom H3 grid) updated every 3-4 seconds per driver.
- Works with: Any cloud provider (AWS/GCP/Azure), any language stack; core patterns are language-agnostic.
Constraints
- GPS accuracy is roughly 3-10 m under open sky and degrades further in urban canyons, so always map-snap coordinates to the road network before matching or ETA calculation
- Driver location updates at scale (>10K concurrent drivers) must go through a message queue (Kafka/Pulsar), not direct DB writes
- Surge pricing multipliers are regulated in multiple jurisdictions (NYC caps at 2x during emergencies, EU has transparency requirements) — always implement configurable caps
- Payment charges must be idempotent — network retries must not double-charge riders; use idempotency keys
- WebSocket-based real-time connections require sticky sessions or a connection registry service — stateless horizontal scaling will silently drop connections
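The connection-registry constraint can be illustrated with a minimal in-memory sketch. The class name `ConnectionRegistry`, the node IDs, and the 30-second expiry are assumptions made for illustration; a real registry would more likely live in a shared store (for example Redis) so that every gateway node can consult it.

```python
import time
from typing import Dict, Optional, Tuple


class ConnectionRegistry:
    """Maps each driver to the gateway node that holds its WebSocket."""

    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl_seconds = ttl_seconds
        # driver_id -> (node_id, timestamp of the last heartbeat)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def register(self, driver_id: str, node_id: str,
                 now: Optional[float] = None) -> None:
        """Record (or refresh) which node owns the driver's connection."""
        now = time.time() if now is None else now
        self._entries[driver_id] = (node_id, now)

    def unregister(self, driver_id: str) -> None:
        """Forget a driver, e.g. on a clean disconnect."""
        self._entries.pop(driver_id, None)

    def lookup(self, driver_id: str,
               now: Optional[float] = None) -> Optional[str]:
        """Return the owning node, or None if unknown or expired."""
        now = time.time() if now is None else now
        entry = self._entries.get(driver_id)
        if entry is None:
            return None
        node_id, last_seen = entry
        if now - last_seen > self.ttl_seconds:
            # No heartbeat within the TTL: treat the connection as dead.
            del self._entries[driver_id]
            return None
        return node_id
```

A service that needs to push a trip offer would call `lookup(driver_id)` first and forward the message to the returned node, instead of relying on the load balancer to pick the right one.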
Quick Reference
| Component | Role | Technology Options | Scaling Strategy |
|---|---|---|---|
| API Gateway | Rate limiting, auth, request routing | Kong, AWS API Gateway, Envoy | Horizontal + edge caching |
| Rider Service | Ride requests, fare estimates, trip history | Node.js, Go, Java Spring | Stateless horizontal pods |
| Driver Service | Driver onboarding, availability, earnings | Go, Java, Kotlin | Stateless horizontal pods |
| Location Service | Ingest GPS pings, maintain driver positions | Go + Redis GEO, custom H3 grid | Sharded by geo-region |
| Matching Service | Pair riders with nearest available drivers | Go, Rust, Java | Sharded by H3 cell region |
| Trip Service | Trip state machine (requested → matched → in_progress → completed) | Node.js, Go | Event-sourced with Kafka |
| Pricing Service | Fare calculation, surge pricing, promotions | Python, Go | Stateless; reads supply/demand from cache |
| Payment Service | Charge riders, pay drivers, handle refunds | Java, Node.js + Stripe/Braintree | Async processing with idempotency |
| Notification Service | Push notifications, SMS, email | Node.js, Go + Firebase/APNs/SNS | Fan-out via message queue |
| ETA Service | Estimated time of arrival, route optimization | Python, C++ + OSRM/Valhalla | Precomputed graph + ML model |
| WebSocket Gateway | Persistent connections for real-time updates | Node.js (Socket.io), Go (gorilla/websocket) | Sticky sessions + connection registry |
| Message Queue | Async event bus for all services | Apache Kafka, Apache Pulsar | Partitioned by driver_id or trip_id |
| Geospatial Index | Fast nearest-neighbor driver lookup | Redis GEO, H3 in-memory grid, PostGIS | Sharded by H3 resolution-3 cells |
| Analytics Pipeline | Trip data, driver metrics, business intelligence | Kafka → Spark/Flink → data warehouse | Batch + real-time (Lambda architecture) |
Decision Tree
START
├── Expected scale?
│   ├── <1K concurrent users (MVP)
│   │   ├── Use monolith with PostGIS for location queries
│   │   ├── Simple distance-based matching (SQL query)
│   │   └── Fixed pricing; skip surge entirely
│   ├── 1K-100K concurrent users (city-level)
│   │   ├── Split into 5-8 microservices
│   │   ├── Redis GEO for driver locations
│   │   ├── Kafka for event streaming
│   │   └── Basic supply/demand surge pricing
│   ├── 100K-1M concurrent users (regional)
│   │   ├── Full microservices (12+ services)
│   │   ├── H3 hexagonal index, sharded by region
│   │   ├── Dedicated matching service with scoring algorithm
│   │   └── ML-based surge pricing + ETA prediction
│   └── >1M concurrent users (global, Uber-scale)
│       ├── Geo-sharded infrastructure (multi-region)
│       ├── Custom geospatial engine (RingPop for consistent hashing)
│       ├── Real-time ML pipeline for matching/pricing/ETA
│       └── CQRS + event sourcing for trip state
├── Matching priority?
│   ├── Lowest wait time → Nearest-driver with availability check
│   ├── Cost optimization → Factor in driver heading direction + ETA
│   └── Quality → Weighted scoring: distance (40%) + rating (30%) + acceptance rate (30%)
└── Pricing model?
    ├── Fixed fare → Precomputed zone-to-zone matrix
    ├── Metered → Distance (GPS trace) + time (wall clock) + base fare
    └── Dynamic surge → Supply/demand ratio per H3 cell, updated every 30-60s
Step-by-Step Guide
1. Define the data model and core entities
Design your database schema around four core entities: Users (riders + drivers), Vehicles, Trips, and Payments. Use PostgreSQL for transactional data and Redis for real-time state. [src2]
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    role VARCHAR(10) NOT NULL CHECK (role IN ('rider', 'driver')),
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    phone VARCHAR(20) UNIQUE NOT NULL,
    rating DECIMAL(3,2) DEFAULT 5.00,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE trips (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    rider_id UUID REFERENCES users(id),
    driver_id UUID REFERENCES users(id),
    status VARCHAR(20) NOT NULL DEFAULT 'requested',
    pickup_lat DOUBLE PRECISION NOT NULL,
    pickup_lng DOUBLE PRECISION NOT NULL,
    dropoff_lat DOUBLE PRECISION NOT NULL,
    dropoff_lng DOUBLE PRECISION NOT NULL,
    fare_cents INT,
    surge_multiplier DECIMAL(3,2) DEFAULT 1.00,
    requested_at TIMESTAMPTZ DEFAULT NOW(),
    completed_at TIMESTAMPTZ
);
Verify: SELECT count(*) FROM information_schema.tables WHERE table_name IN ('users', 'trips'); → expected: 2
2. Implement the location ingestion pipeline
Drivers send GPS pings every 3-4 seconds over WebSocket. These flow through Kafka to the Location Service, which updates an in-memory geospatial index for real-time matching. [src3]
Driver App --[WebSocket]--> WebSocket Gateway
    --[produce]--> Kafka (topic: driver.location)
    --[consume]--> Location Service
        ├── Hot path: Update Redis GEO / H3 in-memory index
        └── Cold path: Write to TimescaleDB for trip trace history
Verify: Send a test location ping and confirm Redis GEOSEARCH returns the driver within the expected radius.
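As a rough illustration of the hot path, the sketch below applies one decoded location message to an in-memory, cell-keyed index and removes the driver from its previous cell when it crosses a boundary. The names `LocationIndex`, `handle_ping`, and `grid_cell` are hypothetical, and `grid_cell` is a simple square-grid stand-in so the example runs without extra dependencies; in the design described here it would be replaced by an H3 call such as `h3.latlng_to_cell(lat, lng, 9)`. The Kafka consumer loop itself is omitted.

```python
import json
import math
from collections import defaultdict
from typing import Callable, Dict, Set


def grid_cell(lat: float, lng: float) -> str:
    """Stand-in for an H3 cell ID: square tiles of 0.01 degrees (NOT H3)."""
    return f"{math.floor(lat * 100)}:{math.floor(lng * 100)}"


class LocationIndex:
    """In-memory index of which drivers are currently in which cell."""

    def __init__(self, to_cell: Callable[[float, float], str] = grid_cell):
        self.to_cell = to_cell
        self.cell_to_drivers: Dict[str, Set[str]] = defaultdict(set)
        self.driver_to_cell: Dict[str, str] = {}

    def handle_ping(self, driver_id: str, raw_message: str) -> str:
        """Apply one location message (JSON with lat/lng) to the index."""
        ping = json.loads(raw_message)
        new_cell = self.to_cell(ping["lat"], ping["lng"])
        old_cell = self.driver_to_cell.get(driver_id)
        if old_cell is not None and old_cell != new_cell:
            # Driver crossed a cell boundary: drop the stale membership.
            self.cell_to_drivers[old_cell].discard(driver_id)
        self.cell_to_drivers[new_cell].add(driver_id)
        self.driver_to_cell[driver_id] = new_cell
        return new_cell
```

Keeping a reverse map from driver to cell is what makes the boundary-crossing cleanup a constant-time operation per ping.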
3. Build the geospatial matching engine
Use Uber's H3 hexagonal grid to partition the map. Convert the pickup location to an H3 cell ID, then search that cell and its k-ring neighbors for available drivers. [src1]
import h3

def find_nearby_drivers(pickup_lat, pickup_lng, driver_index, k=1):
    # driver_index maps an H3 cell ID to the list of drivers currently in it
    pickup_cell = h3.latlng_to_cell(pickup_lat, pickup_lng, 9)
    # grid_disk returns the pickup cell plus every cell within k steps of it
    search_cells = h3.grid_disk(pickup_cell, k)
    nearby_drivers = []
    for cell in search_cells:
        if cell in driver_index:
            nearby_drivers.extend(driver_index[cell])
    return nearby_drivers
Verify: h3.latlng_to_cell(40.7128, -74.0060, 9) returns a valid 15-character hex string.
4. Implement the ride matching algorithm
Score candidate drivers using a weighted function of distance, rating, acceptance rate, and heading. Send the offer to the top-ranked driver; cascade to the next candidate on decline or timeout. [src2]
def score_driver(driver, pickup_lat, pickup_lng):
    # haversine() is assumed to return the great-circle distance in km
    dist = haversine(pickup_lat, pickup_lng, driver['lat'], driver['lng'])
    distance_score = max(0, (5.0 - dist) / 5.0) * 40  # 0 points beyond 5 km
    rating_score = (driver['rating'] / 5.0) * 30
    accept_score = driver['acceptance_rate'] * 20      # acceptance_rate in [0, 1]
    heading_bonus = 10 if driver.get('heading_toward') else 0
    return distance_score + rating_score + accept_score + heading_bonus
Verify: A driver 0.1km away with 4.8 rating and 92% acceptance scores > 80.
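The scoring and cascade described in this step could be wired together roughly as follows. This is a self-contained sketch under stated assumptions: `haversine_km` is one possible implementation of the distance helper, and `dispatch` with its `offer` callback (returning True when the driver accepts) is a hypothetical interface; timeouts are left to the callback.

```python
import math
from typing import Callable, Dict, List, Optional


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def score_driver(driver: Dict, pickup_lat: float, pickup_lng: float) -> float:
    """Weights: distance 40, rating 30, acceptance 20, heading 10."""
    dist = haversine_km(pickup_lat, pickup_lng, driver["lat"], driver["lng"])
    distance_score = max(0.0, (5.0 - dist) / 5.0) * 40
    rating_score = (driver["rating"] / 5.0) * 30
    accept_score = driver["acceptance_rate"] * 20
    heading_bonus = 10 if driver.get("heading_toward") else 0
    return distance_score + rating_score + accept_score + heading_bonus


def dispatch(candidates: List[Dict], pickup_lat: float, pickup_lng: float,
             offer: Callable[[Dict], bool], top_n: int = 5) -> Optional[str]:
    """Rank once, then offer the trip to the top_n drivers in order."""
    ranked = sorted(candidates,
                    key=lambda d: score_driver(d, pickup_lat, pickup_lng),
                    reverse=True)
    for driver in ranked[:top_n]:
        if offer(driver):  # hypothetical callback: True means accepted
            return driver["id"]
    return None  # every shortlisted driver declined or timed out
```

Ranking once and walking the shortlist avoids re-running the full match after each decline.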
5. Build the surge pricing engine
Calculate surge multipliers per H3 cell by comparing demand to supply. Update every 30-60 seconds. [src6]
def calculate_surge(cell_id, request_count, available_drivers,
                    base_threshold=0.7, max_multiplier=3.0):
    if available_drivers == 0:
        return max_multiplier
    ratio = request_count / available_drivers
    if ratio <= base_threshold:
        return 1.0
    surge = 1.0 + (ratio - base_threshold) * 1.5
    return min(round(surge, 2), max_multiplier)
Verify: calculate_surge("cell", 10, 20) → 1.0; calculate_surge("cell", 50, 10) → 3.0
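Because the Constraints section calls for configurable caps, one way to layer them on top of the surge formula is a per-jurisdiction lookup. The table `JURISDICTION_CAPS`, its keys, and its values are placeholders invented for this sketch, not legal figures; real caps would come from configuration reviewed against local rules.

```python
from typing import Dict

# Hypothetical cap table: keys and values are placeholders, not legal figures.
JURISDICTION_CAPS: Dict[str, float] = {
    "default": 3.0,
    "example-capped-city": 2.0,
}


def calculate_surge(request_count: int, available_drivers: int,
                    base_threshold: float = 0.7, slope: float = 1.5,
                    max_multiplier: float = 3.0) -> float:
    """Linear surge above a demand/supply threshold, clamped to a maximum."""
    if available_drivers == 0:
        return max_multiplier
    ratio = request_count / available_drivers
    if ratio <= base_threshold:
        return 1.0
    surge = 1.0 + (ratio - base_threshold) * slope
    return min(round(surge, 2), max_multiplier)


def capped_surge(request_count: int, available_drivers: int,
                 jurisdiction: str,
                 caps: Dict[str, float] = JURISDICTION_CAPS) -> float:
    """Apply the cap configured for the jurisdiction (or the default cap)."""
    cap = caps.get(jurisdiction, caps["default"])
    return calculate_surge(request_count, available_drivers, max_multiplier=cap)
```

Passing the cap in as `max_multiplier` keeps the pricing formula unchanged while letting operations adjust limits without a code deploy.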
6. Implement the trip state machine
Model each trip as a finite state machine with event sourcing via Kafka. [src4]
REQUESTED ──[driver accepts]──> MATCHED
MATCHED ──[driver en route to pickup]──> DRIVER_ARRIVING
DRIVER_ARRIVING ──[rider picked up]──> IN_PROGRESS
IN_PROGRESS ──[arrived at destination]──> COMPLETED
COMPLETED ──[payment processed]──> PAID
Verify: No valid transition exists from COMPLETED back to IN_PROGRESS.
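The transition diagram maps naturally onto a lookup table. The following is a minimal sketch: the state names come from the diagram, while the event identifiers, `apply_event`, and `replay` are hypothetical names chosen for illustration. Cancellation paths and Kafka wiring are out of scope here.

```python
from enum import Enum
from typing import Dict, Iterable, Tuple


class TripState(str, Enum):
    REQUESTED = "requested"
    MATCHED = "matched"
    DRIVER_ARRIVING = "driver_arriving"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    PAID = "paid"


class InvalidTransition(Exception):
    """Raised when an event is not allowed in the current state."""


# Event names are illustrative; only the states come from the diagram.
TRANSITIONS: Dict[Tuple[TripState, str], TripState] = {
    (TripState.REQUESTED, "driver_accepted"): TripState.MATCHED,
    (TripState.MATCHED, "driver_en_route"): TripState.DRIVER_ARRIVING,
    (TripState.DRIVER_ARRIVING, "rider_picked_up"): TripState.IN_PROGRESS,
    (TripState.IN_PROGRESS, "arrived_at_destination"): TripState.COMPLETED,
    (TripState.COMPLETED, "payment_processed"): TripState.PAID,
}


def apply_event(state: TripState, event: str) -> TripState:
    """Return the next state, or raise if the transition is not defined."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise InvalidTransition(
            f"{event!r} not allowed in state {state.value}") from None


def replay(events: Iterable[str]) -> TripState:
    """Rebuild the current state by folding over the event log."""
    state = TripState.REQUESTED
    for event in events:
        state = apply_event(state, event)
    return state
```

With event sourcing, `replay` is the core idea: the stored events are the source of truth and the current state is derived from them.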
7. Set up the payment and billing pipeline
Process payments asynchronously after trip completion. Use idempotency keys to prevent double charges. [src5]
Trip Completed Event (Kafka) --> Payment Service
    1. Calculate final fare: (base + distance_km * per_km + duration_min * per_min) * surge
    2. Capture pre-authorized amount (Stripe/Braintree)
    3. If capture fails: retry with exponential backoff (max 3 attempts)
    4. Emit PaymentCompleted event
Verify: Two identical capture requests with same idempotency key → only one charge.
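The fare calculation, retry policy, and idempotency check from this step might look like the sketch below. Everything named here (`final_fare_cents`, `PaymentProcessor`, the `capture` callback, and the use of `ConnectionError` as the retryable failure) is an assumption for illustration. The in-process dictionary only deduplicates within one worker; in practice the same key would also be passed to the payment provider so that it can deduplicate on its side.

```python
import time
from typing import Callable, Dict


def final_fare_cents(base_cents: int, distance_km: float, per_km_cents: int,
                     duration_min: float, per_min_cents: int,
                     surge: float) -> int:
    """Surge multiplies the whole metered subtotal, not just one component."""
    subtotal = base_cents + distance_km * per_km_cents + duration_min * per_min_cents
    return round(subtotal * surge)


class PaymentProcessor:
    """Retries transient capture failures and never charges a key twice."""

    def __init__(self, capture: Callable[[str, int], str],
                 max_attempts: int = 3, base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep):
        self._capture = capture          # hypothetical provider call
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self._sleep = sleep
        self._results: Dict[str, str] = {}  # idempotency_key -> charge id

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        if idempotency_key in self._results:
            # Duplicate request: return the earlier result, do not re-charge.
            return self._results[idempotency_key]
        for attempt in range(self.max_attempts):
            try:
                charge_id = self._capture(idempotency_key, amount_cents)
            except ConnectionError:
                if attempt == self.max_attempts - 1:
                    raise  # out of attempts: surface the failure
                self._sleep(self.base_delay * 2 ** attempt)  # 0.5s, 1s, ...
                continue
            self._results[idempotency_key] = charge_id
            return charge_id
        raise RuntimeError("max_attempts must be at least 1")
```

Using a key derived from the trip ID (for example `trip-<id>`) means a redelivered Kafka message maps to the same charge.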
Code Examples
Python: Geospatial Driver Matching with H3
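# Input: rider pickup coordinates; driver positions live in Redis
# Output: nearby driver IDs, nearest first
import h3
import redis

r = redis.Redis(host='localhost', port=6379, db=0)

def update_driver_location(driver_id: str, lat: float, lng: float):
    # Redis GEO expects (longitude, latitude, member) order
    r.geoadd("drivers:geo", (lng, lat, driver_id))
    cell = h3.latlng_to_cell(lat, lng, 9)
    # Remove the driver from its previous cell set so it is not listed twice
    old_cell = r.get(f"drivers:cell:{driver_id}")
    if old_cell is not None and old_cell.decode() != cell:
        r.srem(f"drivers:h3:{old_cell.decode()}", driver_id)
    r.sadd(f"drivers:h3:{cell}", driver_id)
    # The per-driver key expires after 30s without a fresh ping
    r.set(f"drivers:cell:{driver_id}", cell, ex=30)

def match_rider(pickup_lat: float, pickup_lng: float, radius_km: float = 3.0):
    results = r.geosearch(
        "drivers:geo", longitude=pickup_lng, latitude=pickup_lat,
        radius=radius_km, unit="km", sort="ASC", count=10
    )
    return [driver_id.decode() for driver_id in results]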
JavaScript/Node.js: WebSocket Driver Connection Handler
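// Input: WebSocket connection from driver app
// Output: location updates forwarded to Kafka
const { Kafka } = require('kafkajs');
const WebSocket = require('ws');

const kafka = new Kafka({ brokers: ['kafka:9092'] });
const producer = kafka.producer();

async function main() {
  // A kafkajs producer must be connected before send() is called
  await producer.connect();
  const wss = new WebSocket.Server({ port: 8080 });

  wss.on('connection', (ws, req) => {
    // NOTE: trusting a client-supplied header is for illustration only;
    // derive the driver ID from an authenticated session in production
    const driverId = req.headers['x-driver-id'];
    ws.on('message', async (data) => {
      try {
        const { lat, lng, timestamp } = JSON.parse(data);
        await producer.send({
          topic: 'driver.location',
          messages: [{ key: driverId, value: JSON.stringify({ lat, lng, timestamp }) }]
        });
      } catch (err) {
        // Malformed JSON or a failed send must not crash the gateway
        console.error('dropping location ping', err);
      }
    });
  });
}

main().catch(console.error);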
Go: Surge Pricing Calculator
// Input: demand (request count), supply (driver count)
// Output: surge multiplier (float64)
package pricing

import "math"

const (
	BaseThreshold = 0.7
	MaxMultiplier = 3.0
	SurgeSlope    = 1.5
)

func CalculateSurge(requests, drivers int) float64 {
	if drivers == 0 {
		return MaxMultiplier
	}
	ratio := float64(requests) / float64(drivers)
	if ratio <= BaseThreshold {
		return 1.0
	}
	surge := 1.0 + (ratio-BaseThreshold)*SurgeSlope
	return math.Min(math.Round(surge*100)/100, MaxMultiplier)
}
Anti-Patterns
Wrong: Polling all drivers in the database for every ride request
-- BAD: full table scan on every ride request, O(n) for n drivers
SELECT id, lat, lng,
ST_Distance(location, ST_MakePoint(-74.006, 40.7128)) AS dist
FROM drivers
WHERE is_available = true
ORDER BY dist ASC
LIMIT 10;
-- At 100K+ active drivers this query takes 500ms+ per request
Correct: Use in-memory geospatial index with H3 partitioning
# GOOD: O(1) cell lookup + O(k) for k neighbors, typically <5ms
pickup_cell = h3.latlng_to_cell(40.7128, -74.006, 9)
search_cells = h3.grid_disk(pickup_cell, 1)  # ~7 cells
candidates = []
for cell in search_cells:
    candidates.extend(driver_index.get(cell, []))
Wrong: Synchronous payment processing blocking trip completion
# BAD: rider waits for payment before seeing "trip complete"
def complete_trip(trip_id):
    trip = db.get_trip(trip_id)
    trip.status = 'completed'
    charge = stripe.charges.create(amount=trip.fare)  # 2-5s blocking call
    if charge.status != 'succeeded':
        trip.status = 'payment_failed'  # rider stuck
    db.save(trip)
Correct: Async payment via event queue with retry
# GOOD: trip completes instantly, payment processed asynchronously
def complete_trip(trip_id):
    trip = db.get_trip(trip_id)
    trip.status = 'completed'
    db.save(trip)
    kafka.produce('trip.completed', {
        'trip_id': trip_id, 'fare': trip.fare,
        'idempotency_key': f"trip-{trip_id}"
    })
Wrong: Single-point surge pricing for an entire city
# BAD: one surge multiplier for the whole city
total_requests = get_city_wide_requests()
total_drivers = get_city_wide_drivers()
city_surge = total_requests / total_drivers # meaningless average
Correct: Per-cell surge pricing using H3 hexagonal grid
# GOOD: granular surge per H3 cell captures local imbalance
for cell_id in active_cells:
    requests = get_cell_requests(cell_id, window_minutes=5)
    drivers = get_cell_drivers(cell_id)
    surge = calculate_surge(cell_id, requests, drivers)
    cache.set(f"surge:{cell_id}", surge, ttl=60)
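The per-cell loop relies on a windowed request count such as `get_cell_requests(cell_id, window_minutes=5)`, which the example leaves undefined. One possible backing structure is a sliding-window counter per cell. The class `CellRequestCounter` below is a hypothetical sketch that assumes timestamps arrive in non-decreasing order per cell; a production system would more likely keep these counts in a stream processor or a shared cache.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class CellRequestCounter:
    """Counts ride requests per cell over a trailing time window."""

    def __init__(self, window_seconds: float = 300.0):
        self.window_seconds = window_seconds
        # cell_id -> timestamps of recent requests, oldest first
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, cell_id: str, ts: float) -> None:
        """Register one ride request in the given cell at time ts."""
        self._events[cell_id].append(ts)

    def count(self, cell_id: str, now: float) -> int:
        """Number of requests in the window (now - window_seconds, now]."""
        q = self._events[cell_id]
        cutoff = now - self.window_seconds
        while q and q[0] <= cutoff:
            q.popleft()  # evict requests that fell out of the window
        return len(q)
```

Eviction happens lazily on read, so each recorded request is appended once and removed at most once.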
Common Pitfalls
- Stale driver locations: Drivers that close the app remain in the index as "available." Fix: Set a TTL on driver location entries (30s) and require periodic heartbeats. [src3]
- GPS drift in tunnels/garages: GPS signal loss causes drivers to "teleport" when signal returns, triggering false surge spikes. Fix: Apply a Kalman filter or a speed-based sanity check that discards any ping implying a speed above 200 km/h since the previous one. [src1]
- Hot-spot matching storms: A concert ending creates 10K requests in one H3 cell. Fix: Expand the search radius dynamically and implement a request queue with ETA-based prioritization. [src4]
- Payment authorization expiry: Pre-authorized holds expire, typically after about 7 days depending on the processor and card network. Fix: Capture within 24 hours; for disputes, create a new charge. [src5]
- Cascading driver rejection: Re-running full matching on each decline wastes time. Fix: Pre-rank the top 5 candidates and cascade through with 10-15s timeouts. [src2]
- Inconsistent trip state: Rider app shows "matched" but driver shows "cancelled." Fix: Use a single source of truth (Trip Service with event sourcing). [src5]
- ETA inaccuracy from Euclidean distance: Straight-line distance / avg speed ignores road networks. Fix: Use a road-graph routing engine (OSRM, Valhalla). [src4]
- WebSocket connection leaks: Failed connections exhaust server file descriptors. Fix: Implement ping/pong health checks every 30s and automatic cleanup. [src5]
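The speed-based sanity check suggested for GPS drift can be sketched as a small filter over consecutive pings. The function names and the `Ping` tuple layout are assumptions for illustration; the 200 km/h threshold is the one given in the pitfall list.

```python
import math
from typing import Optional, Tuple

# (latitude, longitude, unix timestamp in seconds); layout is an assumption
Ping = Tuple[float, float, float]


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def is_plausible(prev: Optional[Ping], new: Ping,
                 max_speed_kmh: float = 200.0) -> bool:
    """Reject pings that would require an impossible speed since the last one."""
    if prev is None:
        return True  # nothing to compare against yet
    dt_seconds = new[2] - prev[2]
    if dt_seconds <= 0:
        return False  # duplicate or out-of-order timestamp
    dist_km = haversine_km(prev[0], prev[1], new[0], new[1])
    speed_kmh = dist_km / (dt_seconds / 3600.0)
    return speed_kmh <= max_speed_kmh
```

A rejected ping is simply skipped; the previous accepted position stays in the index until a plausible update arrives or the entry's TTL expires.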
Diagnostic Commands
# Check Redis GEO driver count in radius
redis-cli GEOSEARCH drivers:geo FROMLONLAT -74.006 40.7128 BYRADIUS 3 km COUNT 100 ASC
# Monitor Kafka consumer lag for location topic
kafka-consumer-groups.sh --bootstrap-server kafka:9092 --describe --group location-consumer
# Count established TCP connections on the WebSocket port (8080 in the example above)
ss -Htn state established '( sport = :8080 )' | wc -l
# Check H3 cell for a coordinate
python3 -c "import h3; print(h3.latlng_to_cell(40.7128, -74.006, 9))"
# Monitor trip state transitions (Kafka); the console consumer starts at the latest offset by default
kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic trip.events
# Check surge pricing for a cell
redis-cli GET surge:891f1a80537ffff
When to Use / When Not to Use
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Building a two-sided marketplace connecting riders with drivers in real-time | Building a food delivery platform with restaurant prep time | Food delivery system design (different matching + batching) |
| Need to handle >1K concurrent ride requests with <3s matching latency | Simple point-to-point scheduled shuttle service | Queue-based booking system |
| Implementing dynamic pricing based on real-time supply/demand | Fixed-route public transit scheduling | Transit scheduling system (GTFS-based) |
| Require real-time driver tracking and ETA updates for riders | Package/freight logistics with multi-day delivery windows | Logistics/fleet management system design |
| Supporting multiple vehicle types (economy, premium, XL) in one platform | Peer-to-peer car rental (no real-time matching needed) | Marketplace platform design (Airbnb-style) |
Important Caveats
- The H3 resolution level matters significantly: resolution 9 (~174 m edge) is suitable for urban areas, but rural matching may need resolution 7 (~1.2 km edge, ~5.2 km² per cell) to find enough drivers
- Ride-hailing pricing and operations are under active regulatory scrutiny: California AB5 (driver classification), the EU Digital Services Act (platform transparency), and NYC TLC rules all impose constraints, so always build in jurisdiction-specific caps and transparency mechanisms
- Driver matching at Uber-scale (>1M concurrent) uses ML ranking models trained on acceptance/completion data — the weighted scoring shown here is a solid starting point but production systems evolve toward learned models
- WebSocket at scale is expensive: at >100K concurrent drivers, consider a fleet of gateway nodes coordinated by consistent hashing (e.g., RingPop) or a messaging layer (e.g., NATS) rather than a monolithic WebSocket server
- Payment processing varies dramatically by country — design the payment service with pluggable provider adapters from day one