| Component | Role | Technology Options | Scaling Strategy |
|---|---|---|---|
| API Gateway | Route requests, inject tenant context | Kong, AWS API Gateway, Envoy, Nginx | Horizontal (stateless) |
| Auth / Identity | Authenticate users, issue tenant-scoped JWTs | Auth0, AWS Cognito, Keycloak, Azure AD B2C | Managed service / federation |
| Tenant Resolver | Extract tenant_id from JWT/subdomain/header | Custom middleware (Express, FastAPI, Spring) | Stateless, per-instance |
| Application Layer | Business logic with tenant-scoped data access | Node.js, Python, Go, Java microservices | Horizontal auto-scaling |
| Shared Database (Pool) | Single DB, tenant_id column + RLS | PostgreSQL + RLS, Azure SQL elastic pools | Vertical + read replicas |
| Schema-per-Tenant | Separate schema per tenant, shared DB server | PostgreSQL schemas, MySQL databases | Vertical + schema sharding |
| Database-per-Tenant (Silo) | Dedicated DB instance per tenant | RDS per tenant, Aurora, Azure SQL per DB | Independent scaling per tenant |
| Caching Layer | Tenant-prefixed keys, prevent cross-tenant leaks | Redis, Memcached, ElastiCache | Cluster mode with key namespacing |
| Background Jobs | Tenant-aware queue processing, fair scheduling | SQS + tenant tags, Celery, BullMQ | Per-tenant rate limiting |
| Billing / Metering | Track per-tenant usage for tiered pricing | Stripe, AWS Marketplace, custom metering | Event-driven aggregation |
| Tenant Onboarding | Provision resources, seed data, configure DNS | Custom pipeline (Terraform, Pulumi, scripts) | Automated, idempotent |
| Monitoring | Per-tenant metrics, noisy neighbor detection | Datadog, Prometheus + tenant labels, CloudWatch | Tag-based tenant dashboards |
| Config / Feature Flags | Per-tenant feature toggles, tier-based limits | LaunchDarkly, Unleash, custom tenant_config table | Cached, per-request lookup |
START: How many tenants and what isolation level?
|
+-- Expected tenants > 10,000?
| +-- YES --> Shared database + RLS (pool model)
| | Cost-effective, simplest ops, use tenant_id column everywhere
| +-- NO |
| v
+-- Compliance requires dedicated data stores (HIPAA, FedRAMP, data residency)?
| +-- YES --> Database-per-tenant (silo model)
| | Maximum isolation, per-tenant backup/restore, highest cost
| +-- NO |
| v
+-- Tenants < 5,000 and want schema-level isolation without separate DBs?
| +-- YES --> Schema-per-tenant (bridge model)
| | Good isolation, shared infra, watch for catalog bloat
| +-- NO |
| v
+-- Mixed tenant sizes (SMB + enterprise)?
| +-- YES --> Hybrid: pool for SMB tenants, silo for enterprise tenants
| | Route based on tenant tier; automate promotion workflow
| +-- NO |
| v
+-- DEFAULT --> Start with shared database + RLS
Easiest to build, migrate to hybrid when needed
Evaluate your requirements across four dimensions: security/compliance, tenant count, operational complexity, and cost. The three fundamental models are pool (shared everything), bridge (schema-per-tenant), and silo (database-per-tenant). Most teams should start with pool + RLS and create a promotion path to silo for enterprise customers. [src1]
Verify: Document your choice in an Architecture Decision Record (ADR) with trade-offs listed.
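The hybrid promotion path from the decision tree can be sketched as tier-based connection routing. This is an illustrative sketch only: the `TENANT_TIERS` map and DSN strings are invented placeholders, and a real system would look the tier up from a tenant registry.

```python
# Sketch: route a tenant to a pooled or siloed database based on tier.
# TENANT_TIERS and the DSN strings are hypothetical placeholders.
POOL_DSN = "postgresql://pool-host/saas"

TENANT_TIERS = {
    "acme": "enterprise",   # silo: dedicated database per tenant
    "smallco": "smb",       # pool: shared database + RLS
}

def resolve_dsn(tenant_id: str) -> str:
    """Return the connection target for a tenant based on its tier."""
    tier = TENANT_TIERS.get(tenant_id, "smb")
    if tier == "enterprise":
        # Siloed tenants get a dedicated database instance.
        return f"postgresql://silo-{tenant_id}.db.internal/saas"
    # SMB tenants share the pool; RLS enforces isolation.
    return POOL_DSN
```

Promotion from pool to silo then becomes a data migration plus a one-line change in the registry, which is what makes the hybrid model operationally tractable.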
Every authenticated request must carry a trusted tenant_id. Embed it as a custom claim in JWTs during login. Never accept tenant_id from request headers or URL parameters without validation against the token. [src7]
# Python / FastAPI -- extract tenant from JWT
from fastapi import Request, HTTPException
from jose import jwt, JWTError

PUBLIC_KEY = "..."  # your IdP's RS256 public key (e.g., fetched from its JWKS endpoint)

async def get_tenant_id(request: Request) -> str:
    token = request.headers.get("Authorization", "").replace("Bearer ", "")
    try:
        payload = jwt.decode(token, PUBLIC_KEY, algorithms=["RS256"])
        tenant_id = payload.get("tenant_id")
        if not tenant_id:
            raise HTTPException(status_code=403, detail="Missing tenant claim")
        return tenant_id
    except JWTError:
        raise HTTPException(status_code=401, detail="Invalid token")
Verify: Decode a sample JWT and confirm tenant_id claim is present.
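This verify step can be run with the standard library alone by base64-decoding the JWT's payload segment. Note this is for inspection only: it does not verify the signature and must never be used for authentication decisions. The `seg` helper and sample token below are test scaffolding, not part of the source.

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode a JWT's payload segment WITHOUT verifying the signature.
    Inspection only -- never use this for authentication decisions."""
    payload_b64 = token.split(".")[1]
    # Restore the base64 padding that JWTs strip.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# Build a sample token to inspect (header.payload.signature).
seg = lambda d: base64.urlsafe_b64encode(json.dumps(d).encode()).rstrip(b"=").decode()
sample = f'{seg({"alg": "RS256"})}.{seg({"tenant_id": "t-42"})}.sig'
assert jwt_claims(sample).get("tenant_id") == "t-42"
```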
Create middleware that runs on every request, extracts tenant_id, and makes it available to all downstream code. This is the single most critical piece -- if any route bypasses this middleware, you have a data leak. [src1] [src7]
// Node.js / Express -- tenant context middleware
const { AsyncLocalStorage } = require('node:async_hooks');

const tenantStore = new AsyncLocalStorage();

function tenantMiddleware(req, res, next) {
  const tenantId = req.auth?.tenant_id;
  if (!tenantId) return res.status(403).json({ error: 'No tenant context' });
  tenantStore.run({ tenantId }, () => next());
}

function getCurrentTenant() {
  return tenantStore.getStore()?.tenantId;
}

module.exports = { tenantMiddleware, getCurrentTenant, tenantStore };
Verify: Add a test route that returns getCurrentTenant() -- confirm it matches the JWT's tenant_id.
For the shared-database model, enable PostgreSQL Row-Level Security so the database itself enforces tenant boundaries. This is your safety net -- even if application code has a bug, the database will not return another tenant's data. [src3]
-- PostgreSQL RLS setup for multi-tenant table
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
ALTER TABLE orders FORCE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON orders
USING (tenant_id = current_setting('app.current_tenant')::uuid);
-- Set tenant context per request (call from app middleware)
SET LOCAL app.current_tenant = 'tenant-uuid-here';
Verify: SELECT * FROM orders; as a role with RLS active but no app.current_tenant set -- should return 0 rows.
Every database connection must set the tenant context before executing queries. With connection pooling, set it at the start of each transaction, not per-connection. [src3]
# Python / SQLAlchemy -- set tenant context per request
from contextvars import ContextVar

from sqlalchemy import event, text
from sqlalchemy.orm import Session

current_tenant: ContextVar[str] = ContextVar('current_tenant')

@event.listens_for(Session, "after_begin")  # after_begin is a Session event
def set_tenant_context(session, transaction, connection):
    tenant_id = current_tenant.get(None)
    if tenant_id:
        # set_config(..., true) is transaction-scoped like SET LOCAL,
        # and unlike SET it accepts bind parameters.
        connection.execute(text(
            "SELECT set_config('app.current_tenant', :tid, true)"
        ), {"tid": tenant_id})
Verify: Run SHOW app.current_tenant; inside a test transaction -- should show the expected tenant UUID.
Add per-tenant rate limiting, resource quotas, and monitoring. Without these, a single tenant's traffic spike degrades all tenants. [src4] [src5]
# Per-tenant rate limiting with Redis (fixed one-minute window)
import time

import redis

r = redis.Redis()

def check_rate_limit(tenant_id: str, max_rpm: int = 1000) -> bool:
    # Key is scoped to the tenant and the current minute bucket.
    key = f"ratelimit:{tenant_id}:{int(time.time()) // 60}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, 120)  # expire after the window, with slack
    return count <= max_rpm
Verify: Fire max_rpm + 1 requests for one tenant -- confirm the last request is rejected while another tenant's requests succeed.
Tag all metrics and logs with tenant_id. Create dashboards that show per-tenant resource consumption, error rates, and latency percentiles. Alert on tenants exceeding 2x the median resource usage. [src4]
Verify: Query your monitoring system for a specific tenant_id -- confirm metrics are segmented correctly.
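The "2x the median" alert rule above can be computed directly once per-tenant usage numbers are in hand. A minimal sketch, assuming usage arrives as a tenant-to-number mapping; in practice you would pull these figures from your metrics backend, and `noisy_tenants` is an invented name.

```python
from statistics import median

def noisy_tenants(usage: dict, factor: float = 2.0) -> list:
    """Return tenant ids whose resource usage exceeds factor * median.

    `usage` maps tenant_id -> a resource metric (requests, rows, CPU).
    """
    if not usage:
        return []
    threshold = factor * median(usage.values())
    return sorted(t for t, u in usage.items() if u > threshold)

# Example: tenant "whale" uses far more than the median tenant.
# median of {10, 12, 11, 80} is 11.5, so the threshold is 23.0.
usage = {"a": 10, "b": 12, "c": 11, "whale": 80}
```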
# Input: HTTP request with Bearer JWT containing tenant_id claim
# Output: tenant_id available via ContextVar for all downstream code
from contextvars import ContextVar

from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

tenant_ctx: ContextVar[str] = ContextVar("tenant_ctx", default="")

class TenantMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        tid = getattr(request.state, "tenant_id", None)
        if not tid:
            # Return a response directly: an HTTPException raised inside
            # BaseHTTPMiddleware bypasses FastAPI's exception handlers.
            return JSONResponse({"detail": "Missing tenant context"}, status_code=403)
        token = tenant_ctx.set(tid)
        try:
            response = await call_next(request)
        finally:
            tenant_ctx.reset(token)  # never leak context to the next request
        return response
// Input: Express request with req.auth.tenant_id set by JWT middleware
// Output: query results scoped to the authenticated tenant only
const { Pool } = require('pg');

const pool = new Pool();

async function tenantQuery(tenantId, sql, params = []) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    // SET does not accept bind parameters over the extended protocol;
    // set_config(..., true) is the parameterizable, transaction-local form.
    await client.query("SELECT set_config('app.current_tenant', $1, true)", [tenantId]);
    const result = await client.query(sql, params);
    await client.query('COMMIT');
    return result.rows;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
-- Input: Multi-tenant orders table
-- Output: Row-level isolation enforced by the database engine
-- 1. Add tenant column and index
-- (On a populated table, add the column nullable, backfill, then SET NOT NULL.)
ALTER TABLE orders ADD COLUMN tenant_id uuid NOT NULL;
CREATE INDEX idx_orders_tenant ON orders (tenant_id);
-- 2. Enable and force RLS
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
ALTER TABLE orders FORCE ROW LEVEL SECURITY;
-- 3. Create policies for each operation
CREATE POLICY tenant_select ON orders FOR SELECT
USING (tenant_id = current_setting('app.current_tenant')::uuid);
CREATE POLICY tenant_insert ON orders FOR INSERT
WITH CHECK (tenant_id = current_setting('app.current_tenant')::uuid);
CREATE POLICY tenant_update ON orders FOR UPDATE
USING (tenant_id = current_setting('app.current_tenant')::uuid)
WITH CHECK (tenant_id = current_setting('app.current_tenant')::uuid);
CREATE POLICY tenant_delete ON orders FOR DELETE
USING (tenant_id = current_setting('app.current_tenant')::uuid);
# BAD -- tenant_id from request body, not from verified JWT
@app.post("/orders")
async def create_order(request: Request):
    body = await request.json()
    tenant_id = body["tenant_id"]  # attacker can set any tenant_id
    db.execute("INSERT INTO orders (tenant_id, ...) VALUES (%s, ...)", tenant_id)

# GOOD -- tenant_id from server-verified JWT claim
@app.post("/orders")
async def create_order(request: Request):
    tenant_id = request.state.tenant_id  # set by auth middleware from JWT
    db.execute("INSERT INTO orders (tenant_id, ...) VALUES (%s, ...)", tenant_id)
# BAD -- single missed WHERE clause = full data leak
def get_orders(tenant_id):
    return db.execute("SELECT * FROM orders WHERE tenant_id = %s", tenant_id)
# What if another query forgets the WHERE clause?

# GOOD -- RLS enforces isolation even if app code is buggy
def get_orders():
    # tenant_id filter applied automatically by PostgreSQL RLS policy;
    # SET LOCAL app.current_tenant was called by middleware
    return db.execute("SELECT * FROM orders")
# BAD -- cache key collision between tenants
cache.set("user:123:profile", user_data) # which tenant's user 123?
# GOOD -- cache key includes tenant_id
cache.set(f"tenant:{tenant_id}:user:123:profile", user_data)
# BAD -- tenant context leaks between requests in a connection pool
conn = pool.get_connection()
conn.execute("SET app.current_tenant = %s", [tenant_a])
conn.execute("SELECT * FROM orders") # returns tenant_a's data
pool.return_connection(conn) # context still set to tenant_a!
# GOOD -- SET LOCAL scoped to transaction, auto-resets on COMMIT/ROLLBACK
conn = pool.get_connection()
conn.execute("BEGIN")
conn.execute("SET LOCAL app.current_tenant = %s", [tenant_b])
conn.execute("SELECT * FROM orders") # scoped to tenant_b
conn.execute("COMMIT") # context automatically cleared
pool.return_connection(conn)
- Schema-per-tenant: set the search_path with SET LOCAL within transactions, or use a pooler that supports schema switching. [src6]
- Migrations: make them idempotent (e.g., CREATE ... IF NOT EXISTS), run in parallel with rollback capability. [src6]
- Pooler gotcha: in transaction pooling mode, a plain SET app.current_tenant persists on the pooled connection across transactions, leaking context between tenants. Fix: use SET LOCAL (transaction-scoped) or switch to session pooling. [src3]

-- Check RLS policies on a table
SELECT schemaname, tablename, policyname, permissive, roles, cmd, qual
FROM pg_policies WHERE tablename = 'orders';
-- Verify RLS is enabled and forced
SELECT relname, relrowsecurity, relforcerowsecurity
FROM pg_class WHERE relname = 'orders';
-- Check current tenant context in session
SHOW app.current_tenant;
-- Count rows per tenant (detect data skew / noisy tenants)
SELECT tenant_id, COUNT(*) as row_count
FROM orders GROUP BY tenant_id ORDER BY row_count DESC LIMIT 20;
-- Monitor active connections per tenant
SELECT usename, client_addr, COUNT(*) as conn_count
FROM pg_stat_activity WHERE state = 'active'
GROUP BY usename, client_addr ORDER BY conn_count DESC;
-- Check for missing tenant_id indexes
SELECT t.tablename, i.indexname
FROM pg_tables t
LEFT JOIN pg_indexes i ON t.tablename = i.tablename AND i.indexdef LIKE '%tenant_id%'
WHERE t.schemaname = 'public' AND i.indexname IS NULL
AND EXISTS (
SELECT 1 FROM information_schema.columns c
WHERE c.table_name = t.tablename AND c.column_name = 'tenant_id'
);
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Building a B2B SaaS product serving multiple organizations | Building a consumer app where each user is independent | Standard user-based auth with row-level permissions |
| Need to amortize infrastructure costs across many customers | Tenants require physically separate infrastructure (FedRAMP High) | Account-per-tenant with AWS Organizations / Azure Management Groups |
| Tenants share the same application version and feature set | Each customer needs heavy customization of business logic | Plugin architecture or customer-deployed instances |
| 100+ tenants with similar data volumes and usage patterns | Fewer than 5 large customers with vastly different SLAs | Dedicated deployment per customer |
| Offering tiered plans (free, pro, enterprise) with different limits | Real-time systems where microsecond latency matters | Direct table access with compile-time tenant verification |
Most ORMs have no built-in support for SET LOCAL tenant context -- you must add hooks/events to inject it, and these hooks must be tested rigorously for every query path.
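Those query-path tests can be made fail-loud with a guard that refuses to execute anything while no tenant context is set. An illustrative sketch: `guarded_execute`, `MissingTenantContext`, and the wiring are invented names, not a real ORM API; in practice the check would live inside your ORM's execute hook.

```python
from contextvars import ContextVar

current_tenant: ContextVar[str] = ContextVar("current_tenant", default="")

class MissingTenantContext(RuntimeError):
    """Raised when a query runs without tenant context set."""

def guarded_execute(execute_fn, sql: str, params=()):
    """Refuse to run any query unless tenant context has been set.

    Wrap your real execute function with this so a route that bypassed
    the tenant middleware fails loudly in tests instead of silently
    querying without isolation.
    """
    if not current_tenant.get():
        raise MissingTenantContext(f"query without tenant context: {sql!r}")
    return execute_fn(sql, params)
```

Run the whole test suite through this wrapper once; any path that raises `MissingTenantContext` is a route where the middleware, and therefore RLS context, never ran.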