In short: `SELECT create_distributed_table('orders', 'tenant_id');` (Citus) or `PARTITION BY HASH` (Vitess vschema).

| Strategy | How It Works | Distribution | Range Queries | Resharding Cost | Best For | Worst For |
|---|---|---|---|---|---|---|
| Hash-based | shard = hash(key) % N | Uniform (even) | Poor (scatter-gather) | High (rehash all data) | High-write workloads, random access by ID | Time-series, range scans |
| Consistent hashing | Hash ring with virtual nodes | Uniform | Poor | Low (only K/N keys move) | Elastic scaling, frequent shard add/remove | Range queries, ordered access |
| Range-based | shard = key_range(value) | Depends on data | Excellent (single shard) | Medium (split/merge ranges) | Time-series, ordered data, date partitions | Monotonic keys (hot last shard) |
| Geo-based | shard = region(location) | Uneven (population) | Good within region | Low (region reassignment) | Multi-region apps, data residency compliance | Global queries, small datasets |
| Directory-based | Lookup table maps key to shard | Configurable | Depends on mapping | Low (update directory) | Complex routing, tenant isolation | High throughput (directory = bottleneck) |
| Hash + range (compound) | Hash on tenant, range on time | Balanced | Good within tenant | Medium | Multi-tenant SaaS with time-series | Simple applications |
| Schema-based | Each tenant gets own schema | Isolated | Full SQL within tenant | Low (move schema) | Strong tenant isolation, compliance | Large tenant counts (>10K schemas) |
| Algorithmic (jump hash) | jump_consistent_hash(key, N) | Near-perfect | Poor | Very low (minimal movement) | Caching layers, stateless routing | Dynamic shard topology |
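The table's jump_consistent_hash row deserves a concrete sketch. Below is a Python port of Lamping and Veach's published algorithm; the port is illustrative, but the "minimal movement" property is inherent to the algorithm: when the bucket count grows from N to N+1, each key either stays put or moves to the new bucket.

```python
def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Map a 64-bit integer key to a bucket in [0, num_buckets).

    Stateless: no ring, no directory. Growing the bucket count moves
    only ~1/num_buckets of keys, each to the newly added bucket.
    """
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step (mask emulates C uint64 overflow)
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        # Jump forward; the factor is always >= 1.0, so j strictly increases
        j = int((b + 1) * (2**31 / ((key >> 33) + 1)))
    return b
```

Because routing is a pure function of (key, N), every stateless client computes the same answer with no coordination, which is why the table lists it under caching layers and stateless routing.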
```
START -- What is your primary access pattern?
|
+-- Single-entity lookup (user_id, order_id, tenant_id)?
|   +-- Need elastic scaling (frequently add/remove shards)?
|   |   +-- YES --> Consistent hashing (virtual nodes)
|   |   +-- NO --> Hash-based sharding (simple modulo or jump hash)
|   |
+-- Range scans (time-series, date ranges, sequential IDs)?
|   +-- Data is append-only (logs, events, IoT)?
|   |   +-- YES --> Range-based with time buckets (e.g., monthly shards)
|   |   +-- NO --> Range-based with dynamic range splitting
|   |
+-- Geographic queries (user location, data residency)?
|   +-- YES --> Geo-based sharding (shard per region/country)
|   |
+-- Multi-tenant SaaS with mixed access patterns?
|   +-- <1K tenants with strong isolation needs?
|   |   +-- YES --> Schema-based sharding (Citus schema-based)
|   +-- >1K tenants?
|       +-- YES --> Hash on tenant_id (co-locate all tenant data)
|
+-- Unknown or complex routing needs?
    +-- YES --> Directory-based (lookup table with flexibility)
    +-- DEFAULT --> Start with hash-based on most-queried entity ID
```
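The tree above can be encoded as a small helper for design reviews. A sketch only; the branch labels and return values are this document's vocabulary, not any library's API:

```python
def pick_sharding_strategy(access_pattern: str, *, elastic: bool = False,
                           append_only: bool = False, tenants: int = 0) -> str:
    """Walk the decision tree above and return a strategy label."""
    if access_pattern == "single_entity":
        # Elastic scaling favors consistent hashing; otherwise plain hash
        return "consistent_hashing" if elastic else "hash"
    if access_pattern == "range_scan":
        return "range_time_buckets" if append_only else "range_dynamic_split"
    if access_pattern == "geographic":
        return "geo"
    if access_pattern == "multi_tenant":
        # <1K tenants with isolation needs -> schema-based; else hash on tenant_id
        return "schema_based" if tenants < 1000 else "hash_on_tenant_id"
    # Unknown/complex routing -> directory; default -> hash on entity ID
    return "directory" if access_pattern == "unknown" else "hash"
```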
Analyze your query patterns to find the column that appears in the WHERE clause of 80%+ of your queries. The shard key must have high cardinality (thousands+ distinct values) and appear in most queries to avoid scatter-gather operations. [src7]
```sql
-- PostgreSQL: Find most common WHERE clause columns
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY calls DESC LIMIT 20;
```
Verify: The top 5 queries all include your candidate shard key in their WHERE clause.
A good shard key has high cardinality, high frequency in queries, and low monotonicity. Test distribution before committing. [src3]
```sql
-- Check cardinality: should be >10x your planned shard count
SELECT COUNT(DISTINCT tenant_id) AS cardinality FROM orders;

-- Check distribution: no single value should exceed 5% of total
SELECT tenant_id, COUNT(*) AS row_count,
       ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS pct
FROM orders GROUP BY tenant_id ORDER BY row_count DESC LIMIT 20;
```
Verify: cardinality > 10 * planned_shards and max(pct) < 5%.
Based on the decision tree, select hash, range, geo, or directory-based sharding. For most OLTP applications, hash-based on tenant_id or user_id is the safest default. [src6]
Verify: Your chosen strategy handles your top 5 queries without cross-shard operations.
Deploy using a proven sharding middleware or native distributed database rather than building custom sharding logic. [src1]
```sql
-- PostgreSQL + Citus (hash distribution)
SELECT create_distributed_table('orders', 'tenant_id');
SELECT create_distributed_table('order_items', 'tenant_id',
    colocate_with => 'orders');
```
Verify: `SELECT COUNT(*) FROM citus_shards;` shows data distributed across all nodes.
Monitor shard distribution, query routing, and hotspots. Set up alerts for shard imbalance exceeding 20%. [src5]
```sql
-- Citus: Check shard sizes and distribution
SELECT nodename, COUNT(*) AS shard_count,
       pg_size_pretty(SUM(shard_size)) AS total_size
FROM citus_shards GROUP BY nodename;
```
Verify: Shard sizes are within 20% of each other. No single shard handles >30% of total queries.
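The 20% threshold can be checked programmatically once shard sizes are fetched (e.g. with the query above). A minimal sketch, assuming a dict mapping node name to total bytes; the function names are illustrative:

```python
def shard_imbalance_pct(sizes: dict) -> float:
    """Largest deviation from the mean shard size, as a percent of the mean."""
    mean = sum(sizes.values()) / len(sizes)
    return max(abs(s - mean) for s in sizes.values()) / mean * 100

def is_imbalanced(sizes: dict, threshold_pct: float = 20.0) -> bool:
    """Alert condition: worst node deviates more than threshold_pct from mean."""
    return shard_imbalance_pct(sizes) > threshold_pct
```

Wire the boolean into your alerting system of choice; the metric is cheap enough to poll every few minutes.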
```python
# Input: List of shard nodes + data keys to route
# Output: Shard assignment for each key with minimal disruption on node changes
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    def __init__(self, nodes=None, virtual_nodes=150):
        self.ring = {}          # hash position -> physical node
        self.sorted_keys = []   # sorted hash positions (the ring)
        self.virtual_nodes = virtual_nodes
        for node in (nodes or []):
            self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Place `virtual_nodes` replicas of this node around the ring
        for i in range(self.virtual_nodes):
            h = self._hash(f"{node}:vn{i}")
            self.ring[h] = node
            self.sorted_keys.append(h)
        self.sorted_keys.sort()

    def get_node(self, key):
        # Route to the first virtual node clockwise from the key's position
        idx = bisect_right(self.sorted_keys, self._hash(key)) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]
```
Full script: python-consistent-hashing-ring-implementation.py (42 lines)
```json
// Vitess vschema -- defines sharding strategy for MySQL
{
  "sharded": true,
  "vindexes": { "hash_tenant": { "type": "hash" } },
  "tables": {
    "orders": {
      "column_vindexes": [{ "column": "tenant_id", "name": "hash_tenant" }]
    }
  }
}
```
Full config: vschema-configuration.json (31 lines)
```go
// Input: Shard key (tenant_id) + database connections
// Output: Routes queries to the correct shard based on hash
func (r *ShardRouter) GetShard(tenantID string) *sql.DB {
	h := fnv.New32a()
	h.Write([]byte(tenantID))
	idx := h.Sum32() % r.count
	return r.shards[idx]
}
```
Full script: go-application-level-shard-router.go (34 lines)
```sql
-- BAD -- Modulo on auto-increment creates a hot shard for recent data.
-- All new inserts go to shard = (max_id + 1) % N, hammering one shard.
INSERT INTO orders_shard_%s VALUES (...); -- shard = order_id % 4

-- GOOD -- Hash distributes evenly regardless of key ordering.
-- Adding a 5th shard moves only ~20% of data (1/N).
SELECT create_distributed_table('orders', 'tenant_id'); -- Citus handles hashing
```
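The resharding cost of naive modulo is easy to measure directly. A quick sketch (the MD5 choice and key names are arbitrary) showing how much data `hash(key) % N` relocates when growing from 4 to 5 shards:

```python
import hashlib

def bucket_mod(key: str, n: int) -> int:
    # Naive placement: shard = hash(key) % n
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = [f"order-{i}" for i in range(10_000)]

# Fraction of keys that relocate when growing from 4 to 5 shards
moved = sum(bucket_mod(k, 4) != bucket_mod(k, 5) for k in keys) / len(keys)
# Modulo moves ~80%: a key stays only when hash % 20 < 4 (same residue
# mod 4 and mod 5). Consistent/jump hashing cuts movement to ~1/N.
```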
```python
# BAD -- country_code has ~200 values. With 16 shards, some shards
# get USA (40% of data) while others get Liechtenstein (0.001%).
shard = hash(row["country_code"]) % 16  # massive skew

# GOOD -- Shard by tenant_id (millions of values = even distribution).
# Handle geo-routing in the application layer, not the shard key.
shard = consistent_hash(row["tenant_id"])
cluster = geo_directory[row["region"]]
```
```sql
-- BAD -- This JOIN hits every shard and aggregates at the coordinator.
-- 16 shards = 16 network round trips + merge. Latency: 50-500ms.
SELECT o.*, u.name FROM orders o
JOIN users u ON o.user_id = u.user_id
WHERE o.created_at > NOW() - INTERVAL '1 hour';

-- GOOD -- Both tables sharded by tenant_id = JOIN stays on one shard.
SELECT create_distributed_table('orders', 'tenant_id');
SELECT create_distributed_table('users', 'tenant_id',
    colocate_with => 'orders');

-- Query stays on one shard:
SELECT o.*, u.name FROM orders o
JOIN users u ON o.tenant_id = u.tenant_id AND o.user_id = u.user_id
WHERE o.tenant_id = 'abc123';
```
```python
# BAD -- Homegrown shard routing = bugs, no rebalancing, no failover.
def get_connection(user_id):
    shard_id = hash(user_id) % len(shards)
    return psycopg2.connect(shards[shard_id])
# What about transactions? Failover? Schema migrations?

# GOOD -- Let Citus, Vitess, or ProxySQL handle routing, rebalancing,
# failover, and schema migrations. You write normal SQL.
conn = psycopg2.connect("host=citus-coordinator dbname=myapp")
cur = conn.cursor()
cur.execute("SELECT * FROM orders WHERE tenant_id = %s", ("abc123",))
```
Citus's rebalance_table_shards() handles shard rebalancing automatically. [src3] Run schema migrations through the platform's coordinated tooling (Vitess ApplySchema, Citus DDL propagation). [src1]

```bash
# PostgreSQL + Citus: Check shard distribution
psql -c "SELECT nodename, COUNT(*) AS shards, pg_size_pretty(SUM(shard_size))
FROM citus_shards GROUP BY nodename;"

# Citus: Find scatter-gather queries (hitting multiple shards)
psql -c "SELECT query, calls, mean_exec_time
FROM citus_stat_statements
WHERE partition_count > 1 ORDER BY calls DESC LIMIT 10;"

# MongoDB: Check balancer status and chunk distribution
mongosh --eval 'sh.status()'
mongosh --eval 'db.adminCommand({ balancerStatus: 1 })'

# Vitess: List tablets and their health
vtctlclient ListAllTablets | grep -v healthy

# General: Check shard key cardinality
psql -c "SELECT COUNT(DISTINCT tenant_id) AS cardinality,
COUNT(*) AS total_rows FROM orders;"
```
| Technology | Version | Sharding Features | Notes |
|---|---|---|---|
| PostgreSQL + Citus | 13.0 (2025) | Schema-based sharding, shard rebalancer, CDC | Citus 12 introduced schema-based sharding (2023) |
| Vitess | 21.x (2025) | VReplication, online resharding, VDiff | Used by Slack, GitHub, PlanetScale. CNCF graduated |
| MongoDB | 8.0 (2024) | Hashed + ranged sharding, zone sharding | Online resharding added in 6.0 (2022) |
| CockroachDB | 24.x (2025) | Automatic range splitting, hash-sharded indexes | Transparent sharding -- no manual shard management |
| TiDB | 8.x (2025) | Auto-sharding, TiFlash columnar, placement rules | MySQL-compatible, auto-splits at 96MB regions |
| YugabyteDB | 2024.x | Hash + range sharding, tablespace geo-partitioning | PostgreSQL-compatible, automatic tablet splitting |
| Cassandra | 5.0 (2024) | Consistent hashing (Murmur3), virtual nodes | Always sharded -- no single-node mode |
| ScyllaDB | 6.x (2025) | Tablets (replacing vnodes), Raft-based topology | Cassandra-compatible, 10x lower latency |
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Single database exceeds 1-5TB and vertical scaling is maxed | Database is <500GB with room to scale vertically | Add more RAM/CPU, optimize queries, add read replicas |
| Write throughput exceeds single-node capacity (>50K writes/sec) | Read-heavy workload with low write volume | Read replicas with connection routing |
| Data residency requirements mandate geographic isolation | All users are in one region | Single-region deployment with edge caching |
| Multi-tenant SaaS needs tenant-level isolation and scaling | <100 tenants with similar data volumes | Schema-per-tenant or row-level security on single DB |
| Need to scale horizontally beyond single-machine limits | Application can tolerate latency from unoptimized indexes | Index optimization and query tuning first |
| Compliance requires data in specific jurisdictions (GDPR, CCPA) | No regulatory data residency requirements | Single deployment with encryption at rest |