Ride-Sharing Platform System Design (Uber Clone)

How do I design a ride-sharing platform (Uber clone)?

TL;DR

Bottom line: A ride-sharing platform requires five core subsystems: real-time location tracking (WebSocket + Kafka), geospatial driver matching (H3 hexagonal index), dynamic surge pricing, trip management (state machine), and payment processing (async, idempotent).
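Key tool/command: the H3 library (h3-py in the Python examples below, h3-js for JavaScript) for hexagonal geospatial indexing; it converts lat/lng to cell IDs, so a nearby-driver search becomes an O(1) cell lookup plus a scan of a few neighboring cells.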
Watch out for: Storing driver locations in a relational DB — at scale, you need an in-memory geospatial index (Redis GEO or custom H3 grid) updated every 3-4 seconds per driver.
Works with: Any cloud provider (AWS/GCP/Azure), any language stack; core patterns are language-agnostic.
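A quick back-of-envelope check shows why per-driver pings overwhelm a relational write path. The sketch below assumes one ping per online driver per interval and a hypothetical payload size of about 200 bytes; both are illustrative inputs, not measured figures.

```python
# Back-of-envelope load on the location ingestion path.
# Assumptions (hypothetical): every online driver sends one ping per
# interval, and each ping is about `payload_bytes` on the wire.

def location_write_load(online_drivers: int, ping_interval_s: float,
                        payload_bytes: int = 200) -> dict:
    """Return pings per second and approximate ingress in MB/s."""
    pings_per_s = online_drivers / ping_interval_s
    mb_per_s = pings_per_s * payload_bytes / 1_000_000
    return {"pings_per_s": pings_per_s, "ingress_mb_per_s": mb_per_s}


if __name__ == "__main__":
    for drivers in (10_000, 100_000, 1_000_000):
        load = location_write_load(drivers, ping_interval_s=4)
        print(f"{drivers:>9,} drivers -> {load['pings_per_s']:>9,.0f} pings/s, "
              f"~{load['ingress_mb_per_s']:.1f} MB/s")
```

At a 4 s interval, 100K online drivers already mean 25,000 index updates per second and 1M drivers mean 250,000, which is the load the in-memory index and the Kafka buffer are there to absorb.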

Constraints

GPS error grows from a few meters in open sky to tens of meters in urban canyons; always map-snap coordinates to the road network before matching or ETA calculation
Driver location updates at scale (>10K concurrent drivers) must go through a message queue (Kafka/Pulsar), not direct DB writes
Surge pricing multipliers are regulated in multiple jurisdictions (NYC caps at 2x during emergencies, EU has transparency requirements) — always implement configurable caps
Payment charges must be idempotent — network retries must not double-charge riders; use idempotency keys
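WebSocket connections are long-lived and stateful, so they require sticky sessions or a connection registry service; with plain stateless load balancing, a push meant for a driver can reach a node that does not hold that driver's socket and be silently lost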
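The connection-registry option in the last constraint can be sketched as a map from driver ID to the gateway node currently holding that driver's socket, refreshed by heartbeats so entries for vanished connections age out. This is an in-memory illustration with assumed names (`ConnectionRegistry`, node IDs such as `gw-1`); a production registry would live in a shared store so every service instance sees the same view.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ConnectionRegistry:
    """Hypothetical in-memory registry: driver_id -> (gateway node, last heartbeat).

    Gateways call register() on connect and heartbeat() on each pong; services
    that need to push to a driver call lookup() and forward to that node.
    """

    def __init__(self, ttl_s: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def register(self, driver_id: str, node_id: str) -> None:
        # A reconnect to a different gateway overwrites the old owner.
        self._entries[driver_id] = (node_id, self.clock())

    def heartbeat(self, driver_id: str) -> None:
        entry = self._entries.get(driver_id)
        if entry is not None:
            self._entries[driver_id] = (entry[0], self.clock())

    def unregister(self, driver_id: str, node_id: str) -> None:
        # Remove only if this node still owns the entry, so a late disconnect
        # from the old gateway cannot erase a newer registration.
        entry = self._entries.get(driver_id)
        if entry is not None and entry[0] == node_id:
            del self._entries[driver_id]

    def lookup(self, driver_id: str) -> Optional[str]:
        """Return the owning node, or None if unknown or the heartbeat is stale."""
        entry = self._entries.get(driver_id)
        if entry is None:
            return None
        node_id, last_seen = entry
        if self.clock() - last_seen > self.ttl_s:
            del self._entries[driver_id]  # socket likely died without a clean close
            return None
        return node_id
```

The ownership check in `unregister` matters because a driver on a flaky network may reconnect to a new gateway before the old gateway notices the dead socket.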

Quick Reference

Component	Role	Technology Options	Scaling Strategy
API Gateway	Rate limiting, auth, request routing	Kong, AWS API Gateway, Envoy	Horizontal + edge caching
Rider Service	Ride requests, fare estimates, trip history	Node.js, Go, Java Spring	Stateless horizontal pods
Driver Service	Driver onboarding, availability, earnings	Go, Java, Kotlin	Stateless horizontal pods
Location Service	Ingest GPS pings, maintain driver positions	Go + Redis GEO, custom H3 grid	Sharded by geo-region
Matching Service	Pair riders with nearest available drivers	Go, Rust, Java	Sharded by H3 cell region
Trip Service	Trip state machine (requested → matched → in_progress → completed)	Node.js, Go	Event-sourced with Kafka
Pricing Service	Fare calculation, surge pricing, promotions	Python, Go	Stateless; reads supply/demand from cache
Payment Service	Charge riders, pay drivers, handle refunds	Java, Node.js + Stripe/Braintree	Async processing with idempotency
Notification Service	Push notifications, SMS, email	Node.js, Go + Firebase/APNs/SNS	Fan-out via message queue
ETA Service	Estimated time of arrival, route optimization	Python, C++ + OSRM/Valhalla	Precomputed graph + ML model
WebSocket Gateway	Persistent connections for real-time updates	Node.js (Socket.io), Go (gorilla/websocket)	Sticky sessions + connection registry
Message Queue	Async event bus for all services	Apache Kafka, Apache Pulsar	Partitioned by driver_id or trip_id
Geospatial Index	Fast nearest-neighbor driver lookup	Redis GEO, H3 in-memory grid, PostGIS	Sharded by H3 resolution-3 cells
Analytics Pipeline	Trip data, driver metrics, business intelligence	Kafka → Spark/Flink → data warehouse	Batch + real-time (Lambda architecture)

Decision Tree

START
├── Expected scale?
│   ├── <1K concurrent users (MVP)
│   │   ├── Use monolith with PostGIS for location queries
│   │   ├── Simple distance-based matching (SQL query)
│   │   └── Fixed pricing — skip surge entirely
│   ├── 1K-100K concurrent users (city-level)
│   │   ├── Split into 5-8 microservices
│   │   ├── Redis GEO for driver locations
│   │   ├── Kafka for event streaming
│   │   └── Basic supply/demand surge pricing
│   ├── 100K-1M concurrent users (regional)
│   │   ├── Full microservices (12+ services)
│   │   ├── H3 hexagonal index, sharded by region
│   │   ├── Dedicated matching service with scoring algorithm
│   │   └── ML-based surge pricing + ETA prediction
│   └── >1M concurrent users (global, Uber-scale)
│       ├── Geo-sharded infrastructure (multi-region)
│       ├── Custom geospatial engine, sharded via consistent hashing (e.g., Ringpop)
│       ├── Real-time ML pipeline for matching/pricing/ETA
│       └── CQRS + event sourcing for trip state
├── Matching priority?
│   ├── Lowest wait time → Nearest-driver with availability check
│   ├── Cost optimization → Factor in driver heading direction + ETA
│   └── Quality → Weighted scoring: distance (40%) + rating (30%) + acceptance rate (20%) + heading bonus (10%)
└── Pricing model?
    ├── Fixed fare → Precomputed zone-to-zone matrix
    ├── Metered → Distance (GPS trace) + time (wall clock) + base fare
    └── Dynamic surge → Supply/demand ratio per H3 cell, updated every 30-60s

Step-by-Step Guide

1. Define the data model and core entities

Design your database schema around four core entities: Users (riders + drivers), Vehicles, Trips, and Payments; the schema below shows users and trips. Use PostgreSQL for transactional data and Redis for real-time state. [src2]

-- gen_random_uuid() is built into PostgreSQL 13+; older versions need the pgcrypto extension
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    role VARCHAR(10) NOT NULL CHECK (role IN ('rider', 'driver')),
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    phone VARCHAR(20) UNIQUE NOT NULL,
    rating DECIMAL(3,2) DEFAULT 5.00,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE trips (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    rider_id UUID REFERENCES users(id),
    driver_id UUID REFERENCES users(id),
    status VARCHAR(20) NOT NULL DEFAULT 'requested',
    pickup_lat DOUBLE PRECISION NOT NULL,
    pickup_lng DOUBLE PRECISION NOT NULL,
    dropoff_lat DOUBLE PRECISION NOT NULL,
    dropoff_lng DOUBLE PRECISION NOT NULL,
    fare_cents INT,
    surge_multiplier DECIMAL(3,2) DEFAULT 1.00,
    requested_at TIMESTAMPTZ DEFAULT NOW(),
    completed_at TIMESTAMPTZ
);

Verify: SELECT count(*) FROM information_schema.tables WHERE table_name IN ('users', 'trips'); → expected: 2

2. Implement the location ingestion pipeline

Drivers send GPS pings every 3-4 seconds over WebSocket. These flow through Kafka to the Location Service, which updates an in-memory geospatial index for real-time matching. [src3]

Driver App --[WebSocket]--> WebSocket Gateway
    --[produce]--> Kafka (topic: driver.location)
    --[consume]--> Location Service
        ├── Hot path: Update Redis GEO / H3 in-memory index
        └── Cold path: Write to TimescaleDB for trip trace history

Verify: Send a test location ping and confirm Redis GEOSEARCH returns the driver within the expected radius.
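The hot path above has two details that are easy to miss: pings can arrive late or duplicated (client retries, reconnects), and drivers who close the app simply stop pinging. A minimal in-memory sketch, assuming each ping carries a `timestamp` in seconds and using a hypothetical `cell_of(lat, lng)` function where production code would call `h3.latlng_to_cell(lat, lng, 9)`; the 30 s TTL mirrors the stale-location fix under Common Pitfalls.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Set, Tuple


class DriverLocationIndex:
    """Hypothetical hot-path index: latest ping per driver plus cell -> driver IDs."""

    def __init__(self, cell_of: Callable[[float, float], str], ttl_s: float = 30.0):
        self.cell_of = cell_of  # e.g. lambda lat, lng: h3.latlng_to_cell(lat, lng, 9)
        self.ttl_s = ttl_s
        # driver_id -> (lat, lng, ping timestamp, cell)
        self.latest: Dict[str, Tuple[float, float, float, str]] = {}
        self.by_cell: DefaultDict[str, Set[str]] = defaultdict(set)

    def apply_ping(self, driver_id: str, lat: float, lng: float, ts: float) -> bool:
        """Apply one ping; return False if it is not newer than the stored one."""
        prev = self.latest.get(driver_id)
        if prev is not None and ts <= prev[2]:
            return False  # late or duplicate ping (retry, reconnect): ignore it
        cell = self.cell_of(lat, lng)
        if prev is not None and prev[3] != cell:
            self.by_cell[prev[3]].discard(driver_id)  # driver crossed a cell boundary
        self.by_cell[cell].add(driver_id)
        self.latest[driver_id] = (lat, lng, ts, cell)
        return True

    def evict_stale(self, now: float) -> List[str]:
        """Remove drivers whose last ping is older than ttl_s (app closed, signal lost)."""
        stale = [d for d, (_, _, ts, _) in self.latest.items() if now - ts > self.ttl_s]
        for driver_id in stale:
            cell = self.latest.pop(driver_id)[3]
            self.by_cell[cell].discard(driver_id)
        return stale
```

Running `evict_stale` on a short timer bounds how long a silent driver stays matchable; if client clocks are unreliable, stamp pings with the server's receive time instead.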

3. Build the geospatial matching engine

Use Uber's H3 hexagonal grid to partition the map. Convert the pickup location to an H3 cell ID, then search that cell and its k-ring neighbors for available drivers. [src1]

import h3

def find_nearby_drivers(pickup_lat, pickup_lng, driver_index, k=1):
    pickup_cell = h3.latlng_to_cell(pickup_lat, pickup_lng, 9)
    search_cells = h3.grid_disk(pickup_cell, k)
    nearby_drivers = []
    for cell in search_cells:
        if cell in driver_index:
            nearby_drivers.extend(driver_index[cell])
    return nearby_drivers

Verify: h3.latlng_to_cell(40.7128, -74.0060, 9) returns a valid 15-character hex string.
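A fixed `k=1` can come back empty in sparse areas or during demand spikes. One option, sketched here on the assumption that the caller passes in the ring function (in production, h3-py v4's `h3.grid_disk`), is to widen the ring until enough candidates are found or a maximum radius is reached. `min_drivers` and `max_k` are illustrative defaults, not tuned values.

```python
from typing import Callable, Iterable, List, Mapping, Set, Tuple


def find_drivers_expanding(
    pickup_cell: str,
    driver_index: Mapping[str, Iterable[str]],
    grid_disk: Callable[[str, int], Iterable[str]],  # e.g. h3.grid_disk
    min_drivers: int = 3,
    max_k: int = 4,
) -> Tuple[List[str], int]:
    """Widen the search ring until min_drivers are found or k reaches max_k.

    Returns (driver_ids, k_used). Cells already scanned at a smaller k are
    skipped, so each cell is read at most once.
    """
    seen: Set[str] = set()
    found: List[str] = []
    for k in range(max_k + 1):
        for cell in grid_disk(pickup_cell, k):
            if cell not in seen:
                seen.add(cell)
                found.extend(driver_index.get(cell, ()))
        if len(found) >= min_drivers:
            return found, k
    return found, max_k
```

Returning `k_used` lets the caller log how far it had to search, which is a cheap signal of local supply shortage.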

4. Implement the ride matching algorithm

Score candidate drivers using a weighted function of distance, ETA, rating, and acceptance rate. Send the offer to the top-ranked driver; cascade on decline/timeout. [src2]

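from math import asin, cos, radians, sin, sqrt

def haversine(lat1, lng1, lat2, lng2):  # great-circle distance in km
    dlat, dlng = radians(lat2 - lat1), radians(lng2 - lng1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def score_driver(driver, pickup_lat, pickup_lng):
    dist = haversine(pickup_lat, pickup_lng, driver['lat'], driver['lng'])
    distance_score = max(0, (5.0 - dist) / 5.0) * 40  # 0 points at 5 km or farther
    rating_score = (driver['rating'] / 5.0) * 30
    accept_score = driver['acceptance_rate'] * 20      # acceptance_rate as a 0-1 fraction
    heading_bonus = 10 if driver.get('heading_toward') else 0
    return distance_score + rating_score + accept_score + heading_bonus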

Verify: A driver 0.1km away with 4.8 rating and 92% acceptance scores > 80.
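The cascade on decline or timeout can be sketched as a loop over a pre-ranked shortlist, so a decline costs one more offer rather than a fresh matching pass. `send_offer` is a hypothetical callable that pushes the offer and returns `True` (accepted), `False` (declined), or `None` (no answer within the timeout); the shortlist size of 5 and the 12 s timeout follow the 5 candidates and 10-15 s suggested under Common Pitfalls.

```python
from typing import Any, Callable, Dict, Optional, Sequence

Driver = Dict[str, Any]  # needs an "id" key plus the fields the score function reads


def dispatch(
    candidates: Sequence[Driver],
    score: Callable[[Driver], float],
    send_offer: Callable[[str, float], Optional[bool]],  # hypothetical: push offer, wait
    shortlist_size: int = 5,
    offer_timeout_s: float = 12.0,
) -> Optional[str]:
    """Offer the trip to the top-scored drivers in turn; return the first who accepts."""
    ranked = sorted(candidates, key=score, reverse=True)[:shortlist_size]
    for driver in ranked:
        answer = send_offer(driver["id"], offer_timeout_s)
        if answer is True:
            return driver["id"]
        # False = declined, None = no answer before the timeout: try the next driver
    return None  # shortlist exhausted: caller can widen the search and re-rank
```

With the scoring function above, a call might look like `dispatch(nearby, lambda d: score_driver(d, pickup_lat, pickup_lng), send_offer)`.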

5. Build the surge pricing engine

Calculate surge multipliers per H3 cell by comparing demand to supply. Update every 30-60 seconds. [src6]

def calculate_surge(cell_id, request_count, available_drivers,
                    base_threshold=0.7, max_multiplier=3.0):
    if available_drivers == 0:
        return max_multiplier
    ratio = request_count / available_drivers
    if ratio <= base_threshold:
        return 1.0
    surge = 1.0 + (ratio - base_threshold) * 1.5
    return min(round(surge, 2), max_multiplier)

Verify: calculate_surge("cell", 10, 20) → 1.0; calculate_surge("cell", 50, 10) → 3.0
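The `request_count` input is typically a count over a recent window (the per-cell example under Anti-Patterns uses 5 minutes), and the result still needs the configurable cap required by the Constraints section. A minimal sketch of both, where the class name, window length, and cap values are assumptions for illustration rather than regulatory figures:

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque, Mapping


class CellDemandCounter:
    """Hypothetical sliding-window count of ride requests per H3 cell."""

    def __init__(self, window_s: float = 300.0):
        self.window_s = window_s
        self._events: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def record(self, cell_id: str, ts: float) -> None:
        # Assumes requests for a cell are recorded in timestamp order.
        self._events[cell_id].append(ts)

    def count(self, cell_id: str, now: float) -> int:
        q = self._events[cell_id]
        while q and now - q[0] > self.window_s:
            q.popleft()  # request fell out of the window
        return len(q)


def apply_cap(multiplier: float, jurisdiction: str,
              caps: Mapping[str, float], default_cap: float = 3.0) -> float:
    """Clamp a computed surge multiplier to the configured cap for its jurisdiction."""
    return min(multiplier, caps.get(jurisdiction, default_cap))
```

One update tick per cell then becomes roughly `apply_cap(calculate_surge(cell, counter.count(cell, now), drivers), region, caps)`; keeping `caps` in configuration lets a jurisdiction's limit change without a code deploy.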

6. Implement the trip state machine

Model each trip as a finite state machine with event sourcing via Kafka. [src4]

REQUESTED ──[driver accepts]──> MATCHED
MATCHED ──[driver en route to pickup]──> DRIVER_ARRIVING
DRIVER_ARRIVING ──[rider picked up]──> IN_PROGRESS
IN_PROGRESS ──[arrived at destination]──> COMPLETED
COMPLETED ──[payment processed]──> PAID

Verify: No valid transition exists from COMPLETED back to IN_PROGRESS.
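The diagram translates directly into a transition table that rejects anything not listed, and event sourcing then reduces to replaying a trip's events through that table. The state names follow the diagram; the event names are illustrative, and the `CANCELLED` state with its edges is an assumption added here because trips can be cancelled before pickup (the pitfall about one app showing "cancelled" implies such a state).

```python
from enum import Enum
from typing import List


class TripState(Enum):
    REQUESTED = "requested"
    MATCHED = "matched"
    DRIVER_ARRIVING = "driver_arriving"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    PAID = "paid"
    CANCELLED = "cancelled"  # assumption: not part of the diagram above


# (current state, event) -> next state; any pair not listed is illegal
TRANSITIONS = {
    (TripState.REQUESTED, "driver_accepted"): TripState.MATCHED,
    (TripState.MATCHED, "driver_en_route"): TripState.DRIVER_ARRIVING,
    (TripState.DRIVER_ARRIVING, "rider_picked_up"): TripState.IN_PROGRESS,
    (TripState.IN_PROGRESS, "arrived_at_destination"): TripState.COMPLETED,
    (TripState.COMPLETED, "payment_processed"): TripState.PAID,
    (TripState.REQUESTED, "cancel"): TripState.CANCELLED,
    (TripState.MATCHED, "cancel"): TripState.CANCELLED,
    (TripState.DRIVER_ARRIVING, "cancel"): TripState.CANCELLED,
}


class IllegalTransition(Exception):
    pass


def apply_event(state: TripState, event: str) -> TripState:
    """Return the next state, or raise if the event is not allowed in this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise IllegalTransition(f"{event!r} not allowed from {state.name}") from None


def replay(events: List[str]) -> TripState:
    """Event sourcing: the current state is a fold over the trip's event log."""
    state = TripState.REQUESTED
    for event in events:
        state = apply_event(state, event)
    return state
```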

7. Set up the payment and billing pipeline

Process payments asynchronously after trip completion. Use idempotency keys to prevent double charges. [src5]

Trip Completed Event (Kafka) --> Payment Service
  1. Calculate final fare: (base + distance_km * per_km + duration_min * per_min) * surge
  2. Capture pre-authorized amount (Stripe/Braintree)
  3. If capture fails: retry with exponential backoff (max 3 attempts), reusing the same idempotency key
  4. Emit PaymentCompleted event

Verify: Two identical capture requests with same idempotency key → only one charge.
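The pipeline above can be sketched with two pieces: a fare function in integer cents that applies surge to the whole metered fare, and a capture wrapper that records the result per idempotency key and retries transient failures with exponential backoff. `capture_fn` stands in for the processor call (Stripe, for example, accepts an `Idempotency-Key` header); the `results` dict stands in for a durable table, and the default rates are made-up example values.

```python
import time
from typing import Callable, Dict


class TransientPaymentError(Exception):
    """Raised by capture_fn for retryable failures (timeouts, 5xx responses)."""


def compute_fare_cents(distance_km: float, duration_min: float, surge: float,
                       base_cents: int = 250, per_km_cents: int = 120,
                       per_min_cents: int = 30) -> int:
    """(base + distance * per_km + duration * per_min) * surge, in whole cents.

    The default rates are hypothetical examples, not real pricing.
    """
    metered = base_cents + distance_km * per_km_cents + duration_min * per_min_cents
    return round(metered * surge)


def capture_once(idempotency_key: str, amount_cents: int,
                 capture_fn: Callable[[str, int], str],
                 results: Dict[str, str],
                 max_attempts: int = 3, base_delay_s: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Capture at most once per key; retry transient errors with exponential backoff.

    `results` stands in for a durable table with a unique constraint on the key;
    a plain dict is not safe across concurrent consumers.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    if idempotency_key in results:
        return results[idempotency_key]  # duplicate event: reuse the earlier result
    for attempt in range(max_attempts):
        try:
            # The same key goes to the processor, so a retry after a lost
            # response is deduplicated on its side as well.
            charge_id = capture_fn(idempotency_key, amount_cents)
            results[idempotency_key] = charge_id
            return charge_id
        except TransientPaymentError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay_s * 2 ** attempt)  # 0.5 s, then 1 s, ...
    raise RuntimeError("unreachable")  # the loop always returns or re-raises
```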

Code Examples

Python: Geospatial Driver Matching with H3

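# Input:  driver GPS pings (update_driver_location), rider pickup coordinates (match_rider)
# Output: up to 10 nearby, recently seen driver IDs, nearest first
# Requires redis-py >= 4.0, Redis server >= 6.2 (GEOSEARCH), and h3-py >= 4.0

import h3
import redis

r = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)
LOCATION_TTL_S = 30  # a driver with no ping for this long is treated as offline

def update_driver_location(driver_id: str, lat: float, lng: float):
    cell = h3.latlng_to_cell(lat, lng, 9)
    prev_cell = r.get(f"drivers:cell:{driver_id}")
    pipe = r.pipeline()
    pipe.geoadd("drivers:geo", (lng, lat, driver_id))  # Redis GEO takes lng before lat
    if prev_cell and prev_cell != cell:
        pipe.srem(f"drivers:h3:{prev_cell}", driver_id)  # leave the old cell's set
    pipe.sadd(f"drivers:h3:{cell}", driver_id)
    pipe.set(f"drivers:cell:{driver_id}", cell, ex=LOCATION_TTL_S)  # heartbeat key
    pipe.execute()

def match_rider(pickup_lat: float, pickup_lng: float, radius_km: float = 3.0):
    results = r.geosearch(
        "drivers:geo", longitude=pickup_lng, latitude=pickup_lat,
        radius=radius_km, unit="km", sort="ASC", count=10
    )
    # GEO members never expire on their own, so skip drivers whose heartbeat key lapsed
    return [driver_id for driver_id in results if r.exists(f"drivers:cell:{driver_id}")]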

JavaScript/Node.js: WebSocket Driver Connection Handler

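// Input:  WebSocket connection from driver app
// Output: location updates forwarded to Kafka, keyed by driver ID

const { Kafka } = require('kafkajs');
const WebSocket = require('ws');

const kafka = new Kafka({ clientId: 'ws-gateway', brokers: ['kafka:9092'] });
const producer = kafka.producer();

async function main() {
  await producer.connect(); // kafkajs needs an explicit connect before send()
  const wss = new WebSocket.Server({ port: 8080 });

  wss.on('connection', (ws, req) => {
    // In production, take the driver ID from an authenticated token, not a client-set header
    const driverId = req.headers['x-driver-id'];
    if (!driverId) {
      ws.close(1008, 'missing driver id'); // 1008 = policy violation
      return;
    }
    ws.on('message', async (data) => {
      try {
        const { lat, lng, timestamp } = JSON.parse(data.toString());
        await producer.send({
          topic: 'driver.location',
          // Keying by driverId sends each driver's pings to one partition, preserving their order
          messages: [{ key: driverId, value: JSON.stringify({ lat, lng, timestamp }) }],
        });
      } catch (err) {
        // Malformed JSON or a Kafka error: drop this ping rather than crash the gateway
        console.error(`dropping ping from ${driverId}: ${err.message}`);
      }
    });
  });
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});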

Go: Surge Pricing Calculator

// Input:  demand (request count), supply (driver count)
// Output: surge multiplier (float64)

package pricing

import "math"

const (
    BaseThreshold = 0.7
    MaxMultiplier = 3.0
    SurgeSlope    = 1.5
)

func CalculateSurge(requests, drivers int) float64 {
    if drivers == 0 {
        return MaxMultiplier
    }
    ratio := float64(requests) / float64(drivers)
    if ratio <= BaseThreshold {
        return 1.0
    }
    surge := 1.0 + (ratio-BaseThreshold)*SurgeSlope
    return math.Min(math.Round(surge*100)/100, MaxMultiplier)
}

Anti-Patterns

Wrong: Polling all drivers in the database for every ride request

-- BAD: full table scan on every ride request, O(n) for n drivers
SELECT id, lat, lng,
       ST_Distance(location, ST_MakePoint(-74.006, 40.7128)) AS dist
FROM drivers
WHERE is_available = true
ORDER BY dist ASC
LIMIT 10;
-- At 100K+ active drivers this query takes 500ms+ per request

Correct: Use in-memory geospatial index with H3 partitioning

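# GOOD: O(1) lookup per cell over a small, fixed set of cells, typically <5ms
# driver_index: dict of H3 cell ID -> driver IDs, kept current by the Location Service
import h3

pickup_cell = h3.latlng_to_cell(40.7128, -74.006, 9)
search_cells = h3.grid_disk(pickup_cell, 1)  # 7 cells for k=1 (up to 3k(k+1)+1 in general)
candidates = []
for cell in search_cells:
    candidates.extend(driver_index.get(cell, []))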

Wrong: Synchronous payment processing blocking trip completion

# BAD: rider waits for payment before seeing "trip complete"
def complete_trip(trip_id):
    trip = db.get_trip(trip_id)
    trip.status = 'completed'
    charge = stripe.Charge.create(amount=trip.fare_cents, currency='usd')  # 2-5s blocking call
    if charge.status != 'succeeded':
        trip.status = 'payment_failed'  # rider stuck
    db.save(trip)

Correct: Async payment via event queue with retry

# GOOD: trip completes instantly, payment processed asynchronously
def complete_trip(trip_id):
    trip = db.get_trip(trip_id)
    trip.status = 'completed'
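    db.save(trip)
    # Production: publish via a transactional outbox so the save and the event
    # cannot diverge if the process crashes between them
    kafka.produce('trip.completed', {
        'trip_id': trip_id, 'fare_cents': trip.fare_cents,
        'idempotency_key': f"trip-{trip_id}"
    })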

Wrong: Single-point surge pricing for an entire city

# BAD: one surge multiplier for the whole city
total_requests = get_city_wide_requests()
total_drivers = get_city_wide_drivers()
city_surge = total_requests / total_drivers  # meaningless average

Correct: Per-cell surge pricing using H3 hexagonal grid

# GOOD: granular surge per H3 cell captures local imbalance
for cell_id in active_cells:
    requests = get_cell_requests(cell_id, window_minutes=5)
    drivers = get_cell_drivers(cell_id)
    surge = calculate_surge(cell_id, requests, drivers)
    cache.set(f"surge:{cell_id}", surge, ttl=60)

Common Pitfalls

Stale driver locations: Drivers that close the app remain in the index as "available." Fix: Set a TTL on driver location entries (30s) and require periodic heartbeats. [src3]
GPS drift in tunnels/garages: GPS signal loss causes drivers to "teleport" when signal returns, triggering false surge spikes. Fix: Apply a Kalman filter or speed-based sanity check — discard jumps > 200km/h. [src1]
Hot-spot matching storms: A concert ending creates 10K requests in one H3 cell. Fix: Expand the search radius dynamically and implement a request queue with ETA-based prioritization. [src4]
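Payment authorization expiry: Uncaptured card authorizations expire (Stripe cancels them after 7 days by default; other processors and card networks differ). Fix: Capture within 24 hours of trip completion; if a hold has already lapsed, for example after a fare dispute, create a new charge. [src5]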
Cascading driver rejection: Re-running full matching on each decline wastes time. Fix: Pre-rank the top 5 candidates and cascade through with 10-15s timeouts. [src2]
Inconsistent trip state: Rider app shows "matched" but driver shows "cancelled." Fix: Use a single source of truth (Trip Service with event sourcing). [src5]
ETA inaccuracy from Euclidean distance: Straight-line distance / avg speed ignores road networks. Fix: Use a road-graph routing engine (OSRM, Valhalla). [src4]
WebSocket connection leaks: Half-open connections (clients that disappear without a close frame) pile up and exhaust server file descriptors. Fix: Send ping/pong health checks every 30s and close sockets that miss a pong. [src5]
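The speed-based sanity check from the GPS-drift pitfall fits in a few lines: compute the great-circle distance between consecutive pings and reject the new one if the implied speed exceeds a threshold. This is a simple gate, not the Kalman filter also mentioned there; the 200 km/h threshold comes from the pitfall, and the function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt
from typing import Tuple

EARTH_RADIUS_KM = 6371.0
Ping = Tuple[float, float, float]  # (lat, lng, unix seconds)


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    dlat, dlng = radians(lat2 - lat1), radians(lng2 - lng1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def is_plausible_ping(prev: Ping, new: Ping, max_speed_kmh: float = 200.0) -> bool:
    """Reject a ping whose implied speed from the previous accepted ping is too high."""
    dt_h = (new[2] - prev[2]) / 3600.0
    if dt_h <= 0:
        return False  # duplicate or out-of-order timestamp
    speed_kmh = haversine_km(prev[0], prev[1], new[0], new[1]) / dt_h
    return speed_kmh <= max_speed_kmh
```

Compare against the last *accepted* ping rather than the last received one; otherwise a single bad fix makes the next good ping look like a jump back. After a long signal gap (a tunnel), the elapsed time is large, so a legitimate reappearance still passes.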

Diagnostic Commands

# List up to 100 drivers within 3 km of a point, nearest first (GEOSEARCH needs Redis 6.2+)
redis-cli GEOSEARCH drivers:geo FROMLONLAT -74.006 40.7128 BYRADIUS 3 km COUNT 100 ASC

# Monitor Kafka consumer lag for location topic
kafka-consumer-groups.sh --bootstrap-server kafka:9092 --describe --group location-consumer

# Count established TCP connections on the WebSocket port (8080 in the Node.js example)
ss -tn state established '( sport = :8080 )' | tail -n +2 | wc -l

# Check H3 cell for a coordinate
python3 -c "import h3; print(h3.latlng_to_cell(40.7128, -74.006, 9))"

# Monitor trip state transitions (Kafka)
kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic trip.events --from-latest

# Check surge pricing for a cell
redis-cli GET surge:891f1a80537ffff

When to Use / When Not to Use

Use When	Don't Use When	Use Instead
Building a two-sided marketplace connecting riders with drivers in real-time	Building a food delivery platform with restaurant prep time	Food delivery system design (different matching + batching)
Need to handle >1K concurrent ride requests with <3s matching latency	Simple point-to-point scheduled shuttle service	Queue-based booking system
Implementing dynamic pricing based on real-time supply/demand	Fixed-route public transit scheduling	Transit scheduling system (GTFS-based)
Require real-time driver tracking and ETA updates for riders	Package/freight logistics with multi-day delivery windows	Logistics/fleet management system design
Supporting multiple vehicle types (economy, premium, XL) in one platform	Peer-to-peer car rental (no real-time matching needed)	Marketplace platform design (Airbnb-style)

Important Caveats

The H3 resolution level matters significantly: resolution 9 (~174 m average edge, ~0.1 km² per cell) suits dense urban areas, but rural matching may need resolution 7 (~1.2 km edge, ~5.2 km² per cell) to find enough drivers
Surge pricing is under active regulatory scrutiny (for example, emergency price-gouging rules and NYC TLC oversight of for-hire vehicles), and laws such as California AB5 (worker classification) shape the driver side of the marketplace; always build in jurisdiction-specific caps and transparency mechanisms
Driver matching at Uber-scale (>1M concurrent) uses ML ranking models trained on acceptance/completion data — the weighted scoring shown here is a solid starting point but production systems evolve toward learned models
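WebSocket at scale is expensive: at >100K concurrent drivers, consider sharding connections across gateway nodes with consistent hashing (e.g., Ringpop) and routing pushes over a messaging layer such as NATS, rather than running a monolithic WebSocket server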
Payment processing varies dramatically by country — design the payment service with pluggable provider adapters from day one