Serverless Application Architecture

How do I design a serverless application architecture?

TL;DR

Bottom line: Design serverless apps as small, stateless, event-driven functions behind an API gateway, with managed services for state (databases, queues, object storage) and orchestration (Step Functions, Durable Functions) for complex workflows.
Key tool/command: serverless deploy (Serverless Framework) or provider-native CLIs (aws lambda, wrangler deploy, gcloud functions deploy)
Watch out for: Monolithic Lambda functions that bundle all routes into one handler -- split by domain boundary instead.
Works with: AWS Lambda (Node.js, Python, Java, Go, .NET, Ruby), Cloudflare Workers (JS/TS/Wasm), Google Cloud Functions (Node.js, Python, Go, Java, .NET, PHP, Ruby), Azure Functions (C#, JS, Python, Java, PowerShell, TypeScript).

Constraints

Cold starts add 100ms-10s of latency whenever a new execution environment is created (first invocation, scale-out, or after idle periods), depending on runtime and memory; use provisioned concurrency, SnapStart, or edge runtimes for latency-sensitive paths
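Execution time limits vary by provider: Lambda 15min; Cloud Functions 2nd gen 60min for HTTP functions (9min for event-driven); Azure 5min default / 10min max on the Consumption plan (unbounded on Premium); Workers 30s CPU time by default on the paid plan, or 15min for Cron Triggers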
Payload size limits: Lambda 6MB sync / 256KB async, Workers 100MB, Cloud Functions 10MB HTTP
Stateless by design: never store session state in function memory between invocations; use external stores (DynamoDB, KV, Redis)
VPC-attached functions may incur additional cold start overhead; use VPC only when private resource access is required
As of August 2025, AWS bills for the Lambda INIT phase -- heavy initialization code now directly increases cost
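Because every provider enforces a hard execution limit, a function that loops over a batch of work should stop cleanly before the platform kills it mid-item. A minimal sketch for Lambda's Node.js runtime, assuming work splits into independent items and that unfinished items are handed back to the caller (for example, to re-queue). `processWithDeadline` and the 5-second margin are illustrative choices; `context.getRemainingTimeInMillis()` is the standard Lambda context method.

```javascript
// Hypothetical helper: process items until the remaining invocation time
// drops below a safety margin, then return whatever is left unprocessed.
// context.getRemainingTimeInMillis() is provided by the Lambda Node.js runtime.
const SAFETY_MARGIN_MS = 5000; // illustrative; size it to your slowest item

async function processWithDeadline(items, context, processItem) {
  const done = [];
  for (let i = 0; i < items.length; i++) {
    if (context.getRemainingTimeInMillis() < SAFETY_MARGIN_MS) {
      // Stop before the hard timeout; the caller decides how to resume.
      return { done, remaining: items.slice(i) };
    }
    done.push(await processItem(items[i]));
  }
  return { done, remaining: [] };
}
```

Returning the remainder instead of throwing keeps the timeout decision explicit; a handler might re-queue `remaining` or hand it to a follow-up invocation.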

Quick Reference

Component	Role	Technology Options	Scaling Strategy
API Gateway	Route HTTP requests to functions	AWS API Gateway, Cloudflare Workers routing, Azure API Management, Google Cloud Endpoints	Auto-scales per request; throttle via rate limits
Compute (FaaS)	Execute business logic	AWS Lambda, Cloudflare Workers, Google Cloud Functions, Azure Functions, Vercel Functions	Auto-scales to concurrency limit (Lambda: 1000 default, requestable to 10K+)
Event Bus	Decouple producers from consumers	Amazon EventBridge, Google Pub/Sub, Azure Event Grid, Cloudflare Queues	Partition-based; scales with event throughput
Message Queue	Buffer async workloads	Amazon SQS, Google Cloud Tasks, Azure Service Bus, Cloudflare Queues	Scales with queue depth; configure batch size per consumer
Orchestration	Coordinate multi-step workflows	AWS Step Functions, Azure Durable Functions, Google Workflows, Temporal (self-hosted)	Per-execution pricing; use for saga patterns and retries
Object Storage	Store files, static assets	Amazon S3, Google Cloud Storage, Azure Blob, Cloudflare R2	Unlimited; event triggers on upload (S3 notifications, GCS Pub/Sub)
Database	Persistent structured data	DynamoDB, Firestore, Azure Cosmos DB, PlanetScale, Supabase, Cloudflare D1	Auto-scales on-demand (DynamoDB), connection pooling critical for SQL
Cache / KV	Low-latency key-value lookups	ElastiCache, Cloudflare KV, Upstash Redis, Momento, Azure Cache for Redis	Edge-distributed (KV, Momento) or regional (ElastiCache)
CDN / Edge	Serve static content, edge compute	CloudFront, Cloudflare CDN, Azure CDN, Cloud CDN	Global PoPs; cache invalidation via TTL or purge API
Auth	Identity and access control	Amazon Cognito, Auth0, Firebase Auth, Azure AD B2C, Clerk	Token-based (JWT); validate at gateway or function level
Observability	Logs, metrics, traces	CloudWatch, Datadog, Cloudflare Logpush, Google Cloud Logging, OpenTelemetry	Structured logging; distributed tracing with X-Ray or Jaeger
CI/CD	Deployment pipeline	GitHub Actions, AWS SAM, Serverless Framework, SST, Pulumi, Terraform	Infrastructure-as-code; blue/green or canary deployments

Decision Tree

START
├── Need <50ms global latency (edge compute)?
│   ├── YES → Cloudflare Workers or Lambda@Edge / CloudFront Functions
│   └── NO ↓
├── Workload is event-driven (S3 upload, DB change, queue message)?
│   ├── YES → AWS Lambda + EventBridge, or GCP Cloud Functions + Pub/Sub
│   └── NO ↓
├── Need HTTP API with variable traffic (0 to 10K+ RPS)?
│   ├── YES → API Gateway + Lambda, or Cloudflare Workers (simpler routing)
│   └── NO ↓
├── Multi-step workflow with retries and compensation?
│   ├── YES → AWS Step Functions or Azure Durable Functions
│   └── NO ↓
├── Long-running job (>15 minutes)?
│   ├── YES → Use containers (Cloud Run, ECS Fargate, AKS) instead
│   └── NO ↓
├── Execution requires GPU?
│   ├── YES → Use dedicated instances or managed ML services
│   └── NO ↓
├── Need full OS/runtime control?
│   ├── YES → Use containers
│   └── NO ↓
└── DEFAULT → Standard serverless (Lambda, Cloud Functions, or Azure Functions)
    ├── <1K concurrent → Single-region, default concurrency
    ├── 1K-100K concurrent → Multi-region, provisioned concurrency
    └── >100K concurrent → Edge compute + regional fallback
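The decision tree above can be encoded as a plain function, which makes the branch order explicit and easy to review in design docs or tests. A minimal sketch; `chooseCompute` and the field names on the workload object are hypothetical, and the returned strings paraphrase the tree's leaves.

```javascript
// Hypothetical encoding of the decision tree. Checks run top to bottom,
// in the same order as the tree; the first matching branch wins.
function chooseCompute(w) {
  if (w.needsEdgeLatencyUnder50ms) return "Cloudflare Workers or Lambda@Edge / CloudFront Functions";
  if (w.eventDriven) return "AWS Lambda + EventBridge, or GCP Cloud Functions + Pub/Sub";
  if (w.variableTrafficHttpApi) return "API Gateway + Lambda, or Cloudflare Workers";
  if (w.multiStepWorkflow) return "AWS Step Functions or Azure Durable Functions";
  if (w.maxRuntimeMinutes > 15) return "Containers (Cloud Run, ECS Fargate, AKS)";
  if (w.needsGpu) return "Dedicated instances or managed ML services";
  if (w.needsOsControl) return "Containers";
  // DEFAULT branch: pick a scaling tier from expected peak concurrency.
  const c = w.peakConcurrency ?? 0;
  if (c < 1000) return "Standard serverless: single-region, default concurrency";
  if (c <= 100000) return "Standard serverless: multi-region, provisioned concurrency";
  return "Edge compute + regional fallback";
}
```

Like the tree, this checks the event-driven branch before the 15-minute limit, so a long-running event-driven job would be routed to Lambda. In practice you may prefer to test hard platform limits (duration, GPU) first.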

Step-by-Step Guide

1. Define function boundaries by domain

Split functions along business domain boundaries, not by HTTP method. Each function should own one bounded context (e.g., "orders", "payments", "notifications"). Avoid the monolithic Lambda anti-pattern where one function handles all routes. [src1]

project/
├── functions/
│   ├── orders/
│   │   ├── create.js
│   │   ├── get.js
│   │   └── list.js
│   ├── payments/
│   │   ├── process.js
│   │   └── webhook.js
│   └── notifications/
│       ├── send-email.js
│       └── send-push.js
├── shared/
│   ├── db.js
│   └── auth.js
└── serverless.yml

Verify: Each function file imports only the dependencies it needs -> deployment package size < 5MB per function.

2. Configure the API gateway and routing

Map HTTP routes to individual functions. Use path-based routing with the API gateway or edge router. [src2]

# serverless.yml (Serverless Framework)
service: my-serverless-app
provider:
  name: aws
  runtime: nodejs20.x
  memorySize: 512
  timeout: 29

functions:
  createOrder:
    handler: functions/orders/create.handler
    events:
      - httpApi:
          path: /orders
          method: POST
  getOrder:
    handler: functions/orders/get.handler
    events:
      - httpApi:
          path: /orders/{id}
          method: GET

Verify: curl -X POST https://your-api.execute-api.region.amazonaws.com/orders -H "Content-Type: application/json" -d '{"item":"test"}' -> returns 201 Created with a JSON id
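The `getOrder` route points at `functions/orders/get.handler`, which the guide does not show. A minimal sketch, assuming the HTTP API (payload format 2.0) event shape, where the `{id}` path segment arrives in `event.pathParameters.id`. The data-access function is injected so the handler stays unit-testable; `makeGetOrderHandler`, `findOrder`, and `findOrderInDynamo` are hypothetical names (in production the lookup would wrap a DynamoDB `GetCommand`).

```javascript
// functions/orders/get.js (sketch). `findOrder` is a hypothetical async
// lookup that resolves to the stored item or null.
function makeGetOrderHandler(findOrder) {
  return async (event) => {
    // HTTP API puts the {id} segment of /orders/{id} in pathParameters.
    const id = event.pathParameters?.id;
    if (!id) {
      return json(400, { error: "Missing order id" });
    }
    const order = await findOrder(id);
    return order ? json(200, order) : json(404, { error: "Not found" });
  };
}

// Small helper so every response carries a JSON content type.
function json(statusCode, payload) {
  return {
    statusCode,
    headers: { "content-type": "application/json" },
    body: JSON.stringify(payload),
  };
}

// Deployment wiring (findOrderInDynamo is hypothetical):
// exports.handler = makeGetOrderHandler(findOrderInDynamo);
```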

3. Implement stateless function handlers

Write each function as a pure request-response handler. Move initialization code (DB connections, SDK clients) outside the handler for reuse across warm invocations. [src3]

// functions/orders/create.js
const { randomUUID } = require("node:crypto");
const { DynamoDBClient } = require("@aws-sdk/client-dynamodb");
const { PutCommand, DynamoDBDocumentClient } = require("@aws-sdk/lib-dynamodb");

// Created once per execution environment and reused by warm invocations.
const client = new DynamoDBClient({});
const docClient = DynamoDBDocumentClient.from(client);

exports.handler = async (event) => {
  const body = JSON.parse(event.body ?? "{}");
  const orderId = randomUUID();

  await docClient.send(new PutCommand({
    TableName: process.env.ORDERS_TABLE,
    // Spread the client payload first so it cannot overwrite the generated id.
    Item: { ...body, id: orderId, createdAt: new Date().toISOString() }
  }));

  return { statusCode: 201, body: JSON.stringify({ id: orderId }) };
};

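Verify: aws lambda invoke --function-name createOrder --cli-binary-format raw-in-base64-out --payload '{"body":"{\"item\":\"test\"}"}' out.json -> out.json contains {"statusCode": 201, ...} (with the Serverless Framework the deployed name is service-stage-function, e.g. my-serverless-app-dev-createOrder)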
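Top-level initialization is reused across warm invocations, but with INIT-phase billing every top-level import is paid on each cold start, even by code paths that never use it. A common compromise is to create expensive clients lazily on first use and cache them at module scope. A minimal sketch; `lazy` and the factory functions are hypothetical names. It caches the promise rather than the resolved value so overlapping callers share one initialization, and it drops a failed promise so the next call retries.

```javascript
// Hypothetical helper: run `factory` at most once per execution environment,
// on first use rather than at module load. A failed init is not cached,
// so the next call retries instead of reusing a rejected promise.
function lazy(factory) {
  let pending = null;
  return () => {
    if (!pending) {
      pending = Promise.resolve()
        .then(factory)
        .catch((err) => {
          pending = null; // allow a retry on the next call
          throw err;
        });
    }
    return pending;
  };
}

// Usage sketch: only handlers that call getRenderer() pay for loading it.
// const getRenderer = lazy(() => require("some-heavy-lib")); // hypothetical module
// exports.handler = async (event) => { const renderer = await getRenderer(); /* ... */ };
```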

4. Add async event processing

Decouple synchronous request paths from heavy processing. Use queues or event buses to trigger background functions. [src4]

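// Publish event after DynamoDB write (publishOrderCreated is a hypothetical
// helper called from the create handler once the PutCommand succeeds)
const { EventBridgeClient, PutEventsCommand } = require("@aws-sdk/client-eventbridge");
const eb = new EventBridgeClient({});

async function publishOrderCreated(orderId, body) {
  const res = await eb.send(new PutEventsCommand({
    Entries: [{
      Source: "orders.service",
      DetailType: "OrderCreated",
      Detail: JSON.stringify({ ...body, orderId }),
      EventBusName: process.env.EVENT_BUS
    }]
  }));
  // PutEvents can reject individual entries without throwing.
  if (res.FailedEntryCount > 0) {
    throw new Error(`EventBridge rejected ${res.FailedEntryCount} entries`);
  }
}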

Verify: aws events describe-rule --name OrderCreatedRule --event-bus-name <your-event-bus> -> rule exists and State is ENABLED
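On the consuming side of a queue, Lambda delivers SQS messages in batches, and if the handler throws, the whole batch is retried. With `ReportBatchItemFailures` enabled on the SQS event source mapping, the handler can instead return only the failed message IDs in `batchItemFailures`. A minimal sketch, assuming JSON message bodies; `makeSqsConsumer` and the injected `handleOrder` are hypothetical names.

```javascript
// Sketch of an SQS-triggered consumer using partial batch responses.
// Requires FunctionResponseTypes: ["ReportBatchItemFailures"] on the event
// source mapping; without it, a normal return marks the whole batch as done.
function makeSqsConsumer(handleOrder) {
  return async (event) => {
    const batchItemFailures = [];
    for (const record of event.Records) {
      try {
        await handleOrder(JSON.parse(record.body));
      } catch (err) {
        console.error(JSON.stringify({ messageId: record.messageId, error: err.message }));
        // Only this message returns to the queue for retry (and moves to the
        // DLQ after maxReceiveCount, if a redrive policy is configured).
        batchItemFailures.push({ itemIdentifier: record.messageId });
      }
    }
    return { batchItemFailures };
  };
}
```

Processing records sequentially keeps the sketch simple; an I/O-bound consumer could process them concurrently and collect failures the same way.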

5. Set up orchestration for multi-step workflows

Use Step Functions or Durable Functions for workflows that span multiple services and need retry logic, compensation, or parallel branches. [src3]

{
  "Comment": "Order processing workflow",
  "StartAt": "ValidateOrder",
  "States": {
    "ValidateOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:validateOrder",
      "Next": "ProcessPayment",
      "Retry": [{ "ErrorEquals": ["States.TaskFailed"], "MaxAttempts": 2 }]
    },
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:processPayment",
      "Next": "SendConfirmation",
      "Catch": [{ "ErrorEquals": ["PaymentFailed"], "Next": "CancelOrder" }]
    },
    "SendConfirmation": { "Type": "Task", "Resource": "...", "End": true },
    "CancelOrder": { "Type": "Task", "Resource": "...", "End": true }
  }
}

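Verify: aws stepfunctions start-execution --state-machine-arn arn:aws:states:REGION:ACCOUNT:stateMachine:OrderProcessing --input '{"orderId":"test"}' -> returns an executionArn; then aws stepfunctions describe-execution --execution-arn <executionArn> -> status SUCCEEDED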
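Step Functions expresses the compensation path declaratively through `Catch`. To make the underlying saga logic concrete, here is a minimal in-process sketch: each step has an action and an optional compensation, and on failure the compensations for already-completed steps run in reverse order. `runSaga` and the step names are hypothetical. Unlike Step Functions, it keeps no durable state, so it does not survive a crash mid-run, and a failing compensation simply propagates.

```javascript
// Hypothetical in-memory saga runner. Each step: { name, action, compensate? }.
// Actions receive the accumulated context and return a partial update to it.
async function runSaga(steps, input) {
  let ctx = { ...input };
  const completed = [];
  for (const step of steps) {
    try {
      ctx = { ...ctx, ...(await step.action(ctx)) };
      completed.push(step);
    } catch (err) {
      // Undo completed steps in reverse order (the compensation path).
      for (const done of completed.reverse()) {
        if (done.compensate) await done.compensate(ctx);
      }
      return { status: "COMPENSATED", failedStep: step.name, error: err.message };
    }
  }
  return { status: "SUCCEEDED", output: ctx };
}
```

A runner like this is useful for unit-testing step ordering and compensation logic locally; the state machine remains the production mechanism because it persists progress and retries.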

6. Configure observability and structured logging

Add structured JSON logging, distributed tracing, and custom metrics. Use environment variables for log level control. [src1]

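// shared/logger.js
// Minimum level comes from the LOG_LEVEL environment variable (default INFO).
const LEVELS = { DEBUG: 10, INFO: 20, WARN: 30, ERROR: 40 };
const minLevel = LEVELS[(process.env.LOG_LEVEL || "INFO").toUpperCase()] ?? LEVELS.INFO;

const log = (level, message, data = {}) => {
  if (LEVELS[level] < minLevel) return; // suppressed by LOG_LEVEL
  console.log(JSON.stringify({
    level, message,
    timestamp: new Date().toISOString(),
    requestId: data.requestId || "unknown",
    ...data
  }));
};

module.exports = {
  debug: (msg, data) => log("DEBUG", msg, data),
  info: (msg, data) => log("INFO", msg, data),
  warn: (msg, data) => log("WARN", msg, data),
  error: (msg, data) => log("ERROR", msg, data),
};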

Verify: Check CloudWatch Logs -> log entries appear as structured JSON with requestId field
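The logger above expects callers to pass `requestId` in `data` on every call. A handler wrapper can instead read it from the Lambda context once per invocation and stamp it on every line. A minimal sketch; `withRequestLogging` is a hypothetical name, `context.awsRequestId` and `context.functionName` are standard Lambda Node.js context properties, and the `x-correlation-id` header is an assumed convention for propagating an ID across services.

```javascript
// Hypothetical wrapper: builds a per-invocation logger that stamps every
// line with the Lambda request ID, function name, and a correlation ID.
function withRequestLogging(handler) {
  return async (event, context) => {
    const base = {
      requestId: context.awsRequestId,
      functionName: context.functionName,
      // Reuse an upstream correlation ID if one was sent (assumed header name).
      correlationId: event.headers?.["x-correlation-id"] ?? context.awsRequestId,
    };
    const log = (level, message, data = {}) =>
      console.log(JSON.stringify({ level, message, timestamp: new Date().toISOString(), ...base, ...data }));
    const logger = {
      info: (m, d) => log("INFO", m, d),
      warn: (m, d) => log("WARN", m, d),
      error: (m, d) => log("ERROR", m, d),
    };
    try {
      return await handler(event, context, logger);
    } catch (err) {
      logger.error("Unhandled error", { error: err.message, stack: err.stack });
      throw err; // rethrow so Lambda still records the invocation as failed
    }
  };
}

// Usage: exports.handler = withRequestLogging(async (event, context, logger) => { /* ... */ });
```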

Code Examples

JavaScript: AWS Lambda HTTP Handler

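// Input:  API Gateway HTTP API event (payload format 2.0)
// Output: JSON response with status code

exports.handler = async (event) => {
  try {
    // Payload 2.0 carries the method in requestContext.http.method
    // (httpMethod exists only in the REST API / 1.0 payload).
    const method = event.requestContext?.http?.method;
    const id = event.pathParameters?.id;
    // processRequest is your application's business logic (not shown).
    const result = await processRequest(method, id, event.body);
    return { statusCode: 200, body: JSON.stringify(result) };
  } catch (err) {
    console.error(JSON.stringify({ error: err.message, stack: err.stack }));
    const statusCode = err.statusCode || 500;
    // Do not leak internal error details to clients on 5xx responses.
    const message = statusCode >= 500 ? "Internal server error" : err.message;
    return { statusCode, body: JSON.stringify({ error: message }) };
  }
};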
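In an HTTP API event, `event.body` is a string (or absent), and some payloads arrive base64-encoded with `isBase64Encoded: true`. Parsing it with a bare `JSON.parse` turns malformed input into a generic 500. A minimal sketch of a parsing helper; `parseJsonBody` is a hypothetical name, and attaching `statusCode = 400` to the thrown error is an assumption chosen to fit the `err.statusCode || 500` handling in the handler above.

```javascript
// Hypothetical helper: decode and parse an API Gateway event body.
// Throws an Error carrying statusCode 400 so the handler's catch block
// reports malformed input as a client error rather than a 500.
function parseJsonBody(event) {
  if (event.body == null || event.body === "") return {};
  const raw = event.isBase64Encoded
    ? Buffer.from(event.body, "base64").toString("utf8")
    : event.body;
  try {
    return JSON.parse(raw);
  } catch {
    const err = new Error("Request body must be valid JSON");
    err.statusCode = 400;
    throw err;
  }
}
```

Treating an empty body as `{}` is a design choice; routes that require a body should validate required fields after parsing.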

JavaScript: Cloudflare Worker with Router

// Input:  HTTP Request at edge (330+ global locations)
// Output: JSON response with near-zero cold start

export default {
  async fetch(request, env) {
    const url = new URL(request.url);
    const path = url.pathname;

    if (path === "/api/orders" && request.method === "POST") {
      const body = await request.json();
      const id = crypto.randomUUID();
      // Spread first so a client-supplied "id" cannot override the generated one.
      await env.ORDERS_KV.put(id, JSON.stringify({ ...body, id }));
      return Response.json({ id }, { status: 201 });
    }

    if (path.startsWith("/api/orders/") && request.method === "GET") {
      const id = path.split("/").pop();
      if (!id) return Response.json({ error: "Missing order id" }, { status: 400 });
      const data = await env.ORDERS_KV.get(id, "json");
      if (!data) return Response.json({ error: "Not found" }, { status: 404 });
      return Response.json(data);
    }

    return Response.json({ error: "Not found" }, { status: 404 });
  }
};
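The if-chain works for two routes but grows awkward as routes are added. A minimal table-driven alternative, assuming routes are declared as a method plus a path pattern with `:name` segments. `createRouter`, the pattern syntax, and the `createOrder`/`getOrder` handlers are hypothetical (this is not a Cloudflare API); it relies only on standard JavaScript, so it runs in Workers and in Node.

```javascript
// Hypothetical table-driven router. Patterns like "/api/orders/:id" are
// compiled to anchored regexes once, when the router is created.
function compile(pattern) {
  const names = [];
  const source = pattern
    .split("/")
    .map((seg) => {
      if (seg.startsWith(":")) {
        names.push(seg.slice(1));
        return "([^/]+)"; // exactly one non-empty path segment
      }
      return seg.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // escape literal text
    })
    .join("/");
  return { regex: new RegExp(`^${source}$`), names };
}

function createRouter(routes) {
  const table = routes.map((r) => ({ ...r, ...compile(r.path) }));
  return {
    // Returns { handler, params } for the first matching route, or null.
    match(method, pathname) {
      for (const r of table) {
        if (r.method !== method) continue;
        const m = r.regex.exec(pathname);
        if (!m) continue;
        const params = {};
        r.names.forEach((name, i) => {
          params[name] = decodeURIComponent(m[i + 1]);
        });
        return { handler: r.handler, params };
      }
      return null;
    },
  };
}

// Usage inside a Worker (sketch; createOrder/getOrder are hypothetical):
// const router = createRouter([
//   { method: "POST", path: "/api/orders", handler: createOrder },
//   { method: "GET", path: "/api/orders/:id", handler: getOrder },
// ]);
// export default {
//   async fetch(request, env) {
//     const { pathname } = new URL(request.url);
//     const hit = router.match(request.method, pathname);
//     if (!hit) return Response.json({ error: "Not found" }, { status: 404 });
//     return hit.handler(request, env, hit.params);
//   },
// };
```

Because `:id` must match a non-empty segment, a request to `/api/orders/` falls through to the 404 instead of looking up an empty key.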

Python: Google Cloud Function

# Input:  HTTP request via Cloud Functions 2nd gen
# Output: JSON response

import functions_framework
import json
from google.cloud import firestore

db = firestore.Client()  # Initialized outside handler for reuse

@functions_framework.http
def handle_order(request):
    headers = {"Content-Type": "application/json"}
    if request.method == "POST":
        data = request.get_json(silent=True) or {}
        doc_ref = db.collection("orders").document()
        doc_ref.set({**data, "created_at": firestore.SERVER_TIMESTAMP})
        return json.dumps({"id": doc_ref.id}), 201, headers
    return json.dumps({"error": "Method not allowed"}), 405, {**headers, "Allow": "POST"}

Anti-Patterns

Wrong: Monolithic Lambda with all routes in one function

// BAD -- single function handles everything, bloated package, broad IAM permissions
exports.handler = async (event) => {
  if (event.path === "/orders" && event.httpMethod === "POST") { /* ... */ }
  if (event.path === "/orders" && event.httpMethod === "GET") { /* ... */ }
  if (event.path === "/payments" && event.httpMethod === "POST") { /* ... */ }
  if (event.path === "/users" && event.httpMethod === "GET") { /* ... */ }
  // 50 more routes...
};

Correct: One function per domain action

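// GOOD -- each function has minimal dependencies and least-privilege IAM
// functions/orders/create.js -- only needs dynamodb:PutItem on orders table
const { randomUUID } = require("node:crypto");
const { DynamoDBClient } = require("@aws-sdk/client-dynamodb");
const { DynamoDBDocumentClient, PutCommand } = require("@aws-sdk/lib-dynamodb");
const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

exports.handler = async (event) => {
  const body = JSON.parse(event.body);
  const item = { ...body, id: randomUUID() }; // server-generated key
  await docClient.send(new PutCommand({ TableName: "orders", Item: item }));
  return { statusCode: 201, body: JSON.stringify({ id: item.id }) };
};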

Wrong: Storing state in function memory

// BAD -- state lost on cold start or new instance
let requestCount = 0;
let cachedUser = null;

exports.handler = async (event) => {
  requestCount++;  // Unreliable -- resets on cold start
  if (!cachedUser) cachedUser = await fetchUser(event.userId);
  return { statusCode: 200, body: JSON.stringify({ count: requestCount }) };
};

Correct: External state management

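// GOOD -- state in DynamoDB, cache in ElastiCache/KV
const { DynamoDBClient } = require("@aws-sdk/client-dynamodb");
const { DynamoDBDocumentClient, UpdateCommand } = require("@aws-sdk/lib-dynamodb");
const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

exports.handler = async (event) => {
  // ADD is an atomic counter update, so concurrent instances never lose increments.
  const result = await docClient.send(new UpdateCommand({
    TableName: "counters", Key: { id: "requests" },
    UpdateExpression: "ADD #c :inc",
    ExpressionAttributeNames: { "#c": "count" },
    ExpressionAttributeValues: { ":inc": 1 },
    ReturnValues: "UPDATED_NEW"
  }));
  return { statusCode: 200, body: JSON.stringify({ count: result.Attributes.count }) };
};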

Wrong: Synchronous chain of function calls

// BAD -- function A directly invokes function B, which invokes C
// Creates tight coupling, cascading failures, and double billing
exports.handler = async (event) => {
  const orderResult = await lambda.invoke({ FunctionName: "processOrder", Payload: event.body }).promise();
  const paymentResult = await lambda.invoke({ FunctionName: "processPayment", Payload: orderResult.Payload }).promise();
  const emailResult = await lambda.invoke({ FunctionName: "sendEmail", Payload: paymentResult.Payload }).promise();
  return emailResult;
};

Correct: Event-driven decoupling or orchestration

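// GOOD -- publish event, let downstream consumers react independently
const { EventBridgeClient, PutEventsCommand } = require("@aws-sdk/client-eventbridge");
const eventBridge = new EventBridgeClient({});

exports.handler = async (event) => {
  const order = await createOrder(event.body); // your persistence logic (not shown)
  await eventBridge.send(new PutEventsCommand({
    // No EventBusName, so the event goes to the account's default bus.
    Entries: [{ Source: "orders", DetailType: "OrderCreated", Detail: JSON.stringify(order) }]
  }));
  return { statusCode: 202, body: JSON.stringify({ id: order.id, status: "processing" }) };
};
// Payment and email functions subscribe to "OrderCreated" events independently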
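On the subscriber side, a rule matching `detail-type: OrderCreated` invokes each downstream function with the EventBridge envelope, where `detail` arrives as an already-parsed object rather than the JSON string that was published. EventBridge delivers at least once, so subscribers should be idempotent. A minimal sketch of the payment subscriber; `makeOrderCreatedHandler`, `chargeOrder`, `claim`, and `release` are hypothetical, injected dependencies (in production, `claim` would be an atomic conditional write such as a DynamoDB PutItem with `attribute_not_exists`).

```javascript
// Sketch of an EventBridge-triggered subscriber (the payment function).
// Injected, hypothetical dependencies:
//   chargeOrder(order) -> performs the charge
//   claim(key)         -> resolves true only for the first caller (must be atomic)
//   release(key)       -> undoes a claim so a retried delivery can proceed
function makeOrderCreatedHandler({ chargeOrder, claim, release }) {
  return async (event) => {
    // The rule should already filter on detail-type; this is a cheap guard.
    if (event["detail-type"] !== "OrderCreated") {
      return { status: "ignored" };
    }
    const order = event.detail; // delivered as an object, not a JSON string
    const key = `charge:${order.id}`;
    // At-least-once delivery: deduplicate on the order ID.
    if (!(await claim(key))) {
      return { status: "duplicate" };
    }
    try {
      await chargeOrder(order);
    } catch (err) {
      await release(key); // let the platform's retry attempt the charge again
      throw err;
    }
    return { status: "charged", orderId: order.id };
  };
}
```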

Common Pitfalls

Cold start latency in Java/C# runtimes: JVM and .NET CLR initialization adds 1-10s on cold start. Fix: Use Lambda SnapStart for Java, Native AOT for .NET, or switch to Node.js/Python for latency-sensitive paths. [src1]
Uncontrolled concurrency causing downstream overload: Lambda can scale to thousands of concurrent instances, overwhelming databases with connection limits. Fix: Cap the function with reserved concurrency (reservedConcurrency in serverless.yml), set maximum concurrency on SQS event sources, and pool connections (RDS Proxy, PgBouncer). [src3]
Over-provisioning memory: Lambda allocates CPU proportionally to memory. 10GB functions cost 20x more than 512MB per ms. Fix: Use AWS Lambda Power Tuning to find the cost-optimal memory setting. [src1]
Vendor lock-in through deep SDK coupling: Building directly against provider-specific APIs makes migration expensive. Fix: Use a repository pattern or ports-and-adapters architecture to abstract storage. [src7]
Missing timeout and retry configuration: Default Lambda timeout is 3 seconds; async invocations are retried twice by default. Fix: Set an explicit timeout no higher than the API Gateway integration timeout (29s by default for REST APIs, 30s for HTTP APIs) and configure a DLQ or on-failure destination for failed async events. [src4]
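Ignoring INIT phase costs (post-August 2025): AWS now bills for Lambda initialization time, so heavy imports directly increase cost. Fix: Lazy-load heavy dependencies, bundle and tree-shake code (e.g., with esbuild), and minimize top-level imports; moving code into Lambda Layers changes packaging, not how much code is loaded during INIT. [src1]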
Not structuring logs for observability: console.log("error happened") is useless at scale. Fix: Use structured JSON logging with request ID, function name, and correlation IDs for distributed tracing. [src3]
Treating serverless as cheaper VMs: Serverless requires fundamentally different design -- event-driven, stateless, fine-grained. Lifting and shifting a monolith creates worse outcomes. Fix: Redesign around events and bounded contexts before migrating. [src4]
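The memory pitfall is easier to reason about with the billing formula: Lambda compute charges are memory (GB) × billed duration (seconds) × a per-GB-second price, with duration rounded up to the nearest millisecond. A minimal sketch; `lambdaComputeCost` is a hypothetical helper that ignores the per-request fee, and the price is a parameter because it varies by region and architecture (take it from the AWS Lambda pricing page).

```javascript
// Compute-only Lambda cost: GB-seconds x price per GB-second.
// Ignores the per-request charge; billed duration rounds up to 1 ms.
function lambdaComputeCost({ memoryMb, durationMs, invocations, pricePerGbSecond }) {
  const gb = memoryMb / 1024;
  const seconds = Math.ceil(durationMs) / 1000;
  return gb * seconds * invocations * pricePerGbSecond;
}

// Hypothetical measurements of the same job at two memory sizes. With a
// normalized price of 1, both cost 0.4 units: doubling memory is free here
// because the extra CPU halved the duration.
const price = 1;
const small = lambdaComputeCost({ memoryMb: 512, durationMs: 800, invocations: 1, pricePerGbSecond: price });
const large = lambdaComputeCost({ memoryMb: 1024, durationMs: 400, invocations: 1, pricePerGbSecond: price });
```

Memory only costs more when duration does not fall proportionally, which is why measuring (for example with AWS Lambda Power Tuning) beats guessing.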

Diagnostic Commands

# Check Lambda function configuration
aws lambda get-function-configuration --function-name myFunction

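# View recent invocation logs (last 5 minutes; GNU date syntax -- on macOS use date -v-5M +%s000, or simply: aws logs tail /aws/lambda/myFunction --since 5m)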
aws logs filter-log-events --log-group-name /aws/lambda/myFunction --start-time $(date -d '5 minutes ago' +%s000)

# Check the account-level concurrency limit (not current usage)
aws lambda get-account-settings | jq '.AccountLimit.ConcurrentExecutions'

# Test Cloudflare Worker locally
npx wrangler dev src/index.js

# Deploy Cloudflare Worker
npx wrangler deploy

# Check Google Cloud Function logs
gcloud functions logs read myFunction --limit 50

# Validate Serverless Framework configuration
npx serverless print

Version History & Compatibility

Platform	Current Generation	Previous	Key Changes
AWS Lambda	Runtime API v2 (2024+)	v1	SnapStart GA, INIT billing (Aug 2025), 10GB memory, 6 vCPUs
Cloudflare Workers	ES Modules format (V8 isolates)	Service Worker format (V8 isolates)	Durable Objects, Queues, D1, Python support (beta)
Google Cloud Functions	2nd gen (Cloud Run-based, 2023+)	1st gen	60min HTTP timeout, Eventarc triggers, concurrency per instance
Azure Functions	v4 (2021+)	v3 (end of support 2022-12-13)	Flex Consumption plan, .NET 8 isolated model
Vercel Functions	Edge Runtime (2023+)	Serverless Functions	Edge middleware, streaming responses, ISR

When to Use / When Not to Use

Use When	Don't Use When	Use Instead
Traffic is spiky or unpredictable (0 to 10K+ RPS)	Traffic is steady and predictable (always-on)	Containers on reserved instances (cheaper)
Individual requests complete in <15 minutes	Jobs run for hours (ML training, video transcoding)	ECS Fargate, Cloud Run jobs, or dedicated compute
You want zero infrastructure management	You need fine-grained OS/runtime control	Containers (Docker on ECS/GKE/AKS)
Event-driven processing (S3 uploads, DB changes, webhooks)	Requires persistent connections (WebSockets, gRPC streaming)	Containers or Cloudflare Durable Objects
Startup/small team with limited DevOps capacity	Strict latency SLA (<10ms p99) with cold start intolerance	Pre-warmed containers or dedicated instances
Cost optimization for low-traffic APIs (pay per invocation)	High-throughput, CPU-bound processing at scale	Reserved EC2/GCE instances (cost-effective at scale)

Important Caveats

AWS Lambda INIT phase billing (August 2025) changed the cost model significantly for functions with heavy initialization -- benchmark before and after
Cloudflare Workers run on V8 isolates (not containers), which means no native binary execution, no file system access, and a different security model than Lambda
Google Cloud Functions 2nd gen runs on Cloud Run under the hood, inheriting Cloud Run's concurrency model (multiple requests per instance), unlike Lambda, where each execution environment handles one request at a time
Azure Functions v3 reached end of extended support on December 13, 2022 -- migrate any remaining v3 apps to v4 immediately
Serverless Framework, SST, and Pulumi abstract provider differences but add their own complexity and potential lock-in
Cold start benchmarks vary significantly by region, runtime, memory allocation, and VPC configuration -- always benchmark in your specific deployment environment
Multi-cloud serverless is possible in theory but costly in practice; pick one primary provider and use abstraction layers only where migration risk is real