GraphQL API Architecture at Scale

Type: Software Reference | Confidence: 0.92 | Sources: 7 | Verified: 2026-02-23 | Freshness: quarterly

TL;DR

Constraints

Quick Reference

| Component | Role | Technology Options | Scaling Strategy |
|---|---|---|---|
| Gateway / Router | Schema composition, query planning, routing to subgraphs | Apollo Router (Rust), GraphQL Mesh, Cosmo Router | Horizontal -- stateless, deploy behind LB; cache query plans |
| Subgraph Services | Domain-owned partial schema + resolvers | Apollo Server, Netflix DGS, gqlgen (Go), Strawberry (Python) | Horizontal -- independent scaling per domain team |
| Schema Registry | Version control, composition validation, breaking change detection | Apollo GraphOS, Hive (open-source), Cosmo | Central -- single registry, CI/CD integration |
| DataLoader / Batching | Batch + cache data fetches within a single request | graphql/dataloader (JS), Spring BatchLoader (DGS), dataloaden (Go) | Per-request instance -- no cross-request caching |
| Query Complexity Analyzer | Reject queries exceeding cost threshold before execution | graphql-query-complexity, Apollo cost analysis plugin | Configured at gateway -- cost limits per client tier |
| Persisted Query Store | Map query hashes to approved operations | Redis, CDN edge, Apollo APQ, Relay Compiler | Cache at edge -- hash lookup is O(1) |
| Caching Layer | Response + entity caching to reduce resolver execution | CDN (Cloudflare, Fastly), Redis, Apollo cache hints | Cache-Control headers + entity-level cache invalidation |
| Observability | Distributed tracing across gateway + subgraphs | OpenTelemetry, Apollo Studio, Datadog, Jaeger | Trace context propagation via HTTP headers |
| Rate Limiter | Per-client query budget based on complexity cost | Apollo Router plugins, Cloudflare WAF, custom middleware | Token bucket per API key; cost-based budgets |
| Auth / AuthZ | Authentication at gateway, authorization at resolver level | JWT validation at gateway, directive-based @auth in subgraphs | Gateway validates tokens; subgraphs enforce field-level access |
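
The Rate Limiter row describes a token bucket per API key whose budget is spent in complexity-cost units rather than request counts. A minimal sketch of that idea follows; the class name, the `allowQuery` helper, and the default tier numbers are hypothetical and not taken from any of the listed products.

```javascript
// Sketch of a cost-based token bucket. All names and numbers are illustrative.
class CostBucket {
  constructor({ capacity, refillPerSecond, now = Date.now }) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;          // injectable clock, useful for tests
    this.tokens = capacity;  // start with a full budget
    this.last = now();
  }

  // Try to spend `cost` tokens; returns true if the query may run.
  tryConsume(cost) {
    const t = this.now();
    const elapsedSeconds = (t - this.last) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.last = t;
    if (cost > this.tokens) return false;
    this.tokens -= cost;
    return true;
  }
}

// One bucket per API key; the tier decides capacity and refill rate.
const buckets = new Map();
function allowQuery(apiKey, cost, tier = { capacity: 1000, refillPerSecond: 100 }) {
  if (!buckets.has(apiKey)) buckets.set(apiKey, new CostBucket(tier));
  return buckets.get(apiKey).tryConsume(cost);
}
```

The `cost` argument would come from the complexity analyzer, so an expensive query drains the budget faster than a cheap one.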

Decision Tree

START
├── Single team, <5 entity types, <1K QPS?
│   ├── YES → Monolith GraphQL server (Apollo Server / graphql-yoga / gqlgen)
│   └── NO ↓
├── 2-5 teams, shared schema ownership?
│   ├── YES → Schema stitching with GraphQL Mesh or modular monolith with schema modules
│   └── NO ↓
├── 6+ teams, each owns a domain (users, products, orders)?
│   ├── YES → Apollo Federation v2 with subgraph-per-team
│   └── NO ↓
├── Java/Kotlin ecosystem, Spring Boot stack?
│   ├── YES → Netflix DGS Framework with Federation support
│   └── NO ↓
├── >50K QPS, need edge caching and query plan optimization?
│   ├── YES → Apollo Router (Rust) + persisted queries + CDN caching + entity cache
│   └── NO ↓
├── Need to combine GraphQL + REST + gRPC backends?
│   ├── YES → GraphQL Mesh as a unifying gateway layer
│   └── NO ↓
└── DEFAULT → Start with monolith GraphQL server, extract to federation when team count exceeds 3
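
For teams that want to record the decision in code (for example in an architecture review script), the tree can be written as a function that checks the branches in the same order. This is only a sketch: the function name, the property names on the input object, and the returned labels are all hypothetical.

```javascript
// Hypothetical encoding of the decision tree; branch order matches the text.
function recommendArchitecture(p) {
  if (p.teams === 1 && p.entityTypes < 5 && p.qps < 1000) {
    return "monolith";
  }
  if (p.teams >= 2 && p.teams <= 5 && p.sharedSchemaOwnership) {
    return "stitching-or-modular-monolith";
  }
  if (p.teams >= 6 && p.domainOwnership) {
    return "federation-v2";
  }
  if (p.jvmSpringStack) {
    return "netflix-dgs";
  }
  if (p.qps > 50000 && p.needsEdgeCaching) {
    return "apollo-router-persisted-queries-cdn";
  }
  if (p.mixedBackends) {
    return "graphql-mesh";
  }
  return "monolith"; // DEFAULT branch
}
```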

Step-by-Step Guide

1. Define your supergraph schema with domain boundaries

Map your domain into bounded contexts. Each domain team owns a subgraph with its core types. Use the @key directive to declare entity identity, allowing other subgraphs to extend types across boundaries. [src1]

# products subgraph -- owns Product type
type Product @key(fields: "id") {
  id: ID!
  name: String!
  price: Float!
  category: Category!
}

type Category {
  id: ID!
  name: String!
}

type Query {
  product(id: ID!): Product
  products(first: Int = 10, after: String): ProductConnection!
}

Verify: rover subgraph check <graph>@<variant> --schema products.graphql --name products → composition succeeds with no errors.

2. Implement entity references across subgraphs

When one subgraph needs data owned by another, use stub types with @key to declare a reference. The gateway resolves the full entity by calling the owning subgraph's __resolveReference function. [src1]

# reviews subgraph -- references Product from products subgraph
type Product @key(fields: "id") {
  id: ID!
  reviews: [Review!]!
  averageRating: Float
}

type Review {
  id: ID!
  author: User!
  body: String!
  rating: Int!
  createdAt: DateTime!
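}

// reviews subgraph -- resolve Product references
// `loaders` is assumed to be a set of request-scoped DataLoaders placed on the context.
const resolvers = {
  Product: {
    // Called by the gateway with the key fields ({ __typename, id }).
    __resolveReference(product) {
      return { id: product.id };
    },
    reviews(product, _args, { loaders }) {
      return loaders.reviewsLoader.load(product.id);
    },
    averageRating(product, _args, { loaders }) {
      return loaders.ratingsLoader.load(product.id);
    },
  },
};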

Verify: Query through gateway: { product(id: "1") { name reviews { body rating } } } → returns product name from products subgraph and reviews from reviews subgraph in a single response.
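
To make the two-fetch flow concrete, the sketch below simulates what the gateway does for the query above: fetch the product from the owning subgraph, build a representation containing only `__typename` and the key fields, and pass it to the reviews subgraph's entity resolution. The in-memory "subgraphs" and the function names are hypothetical stand-ins, not router code.

```javascript
// Hypothetical in-memory stand-ins for two subgraphs.
const productsSubgraph = {
  product: (id) => ({ __typename: "Product", id, name: `Product ${id}` }),
};

const reviewsByProduct = { "1": [{ body: "Great", rating: 5 }] };
const reviewsSubgraph = {
  // Plays the role of the _entities field: one representation in, one entity out.
  entities: (representations) =>
    representations.map((ref) => ({
      id: ref.id,                               // what __resolveReference returns
      reviews: reviewsByProduct[ref.id] ?? [],  // what Product.reviews resolves
    })),
};

// Simplified plan for { product(id) { name reviews { body rating } } }.
function resolveProductWithReviews(id) {
  const product = productsSubgraph.product(id);                         // fetch 1
  const representations = [{ __typename: "Product", id: product.id }];  // key fields only
  const [extension] = reviewsSubgraph.entities(representations);        // fetch 2
  return { name: product.name, reviews: extension.reviews };            // merge
}
```

The reviews subgraph never sees `name`; it only receives the key, which is why `@key` fields must be enough to identify the entity.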

3. Set up the federation gateway with Apollo Router

Deploy Apollo Router as the single client-facing endpoint. It loads the supergraph schema composed from the subgraph schemas (composition itself is done by Rover or the schema registry), builds query plans, and routes operations to the appropriate subgraphs. [src1]

# router.yaml -- Apollo Router runtime configuration
supergraph:
  listen: 0.0.0.0:4000
  introspection: false  # disabled in production

# supergraph.yaml -- Rover composition config (schema file paths are assumed)
federation_version: =2.5.0
subgraphs:
  products:
    routing_url: http://products-service:4001/graphql
    schema:
      file: ./products.graphql
  reviews:
    routing_url: http://reviews-service:4002/graphql
    schema:
      file: ./reviews.graphql
  users:
    routing_url: http://users-service:4003/graphql
    schema:
      file: ./users.graphql

# Compose and validate the supergraph
rover supergraph compose --config supergraph.yaml > supergraph.graphql

# Start the router
./router --supergraph supergraph.graphql --config router.yaml

Verify: curl -X POST http://localhost:4000/ -H "Content-Type: application/json" -d '{"query":"{ __typename }"}' → returns {"data":{"__typename":"Query"}}.

4. Implement DataLoader for N+1 prevention

Create request-scoped DataLoader instances for every data source a resolver calls. DataLoader batches all .load(key) calls within a single tick into one batch function call. [src2]

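// dataloaders.js -- Request-scoped DataLoader factory
const DataLoader = require("dataloader");
const db = require("./db");  // assumed to resolve to an array of rows

// Call once per incoming request (e.g. in the context function) so that
// cached entries never leak between requests.
function createLoaders() {
  return {
    productLoader: new DataLoader(async (ids) => {
      const products = await db.query(
        "SELECT * FROM products WHERE id = ANY($1)", [ids]
      );
      // DataLoader requires results in the same order and length as `ids`.
      const map = new Map(products.map(p => [String(p.id), p]));
      return ids.map(id => map.get(String(id)) || null);
    }),
  };
}

module.exports = { createLoaders };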

Verify: Enable query logging on database → a query for 10 products with reviews produces exactly 2 SQL queries, not 11.
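
The batching behaviour is easier to reason about with a stripped-down loader in view. The class below is an illustrative sketch of the tick-based batching idea only; `TinyLoader` is a hypothetical name, and the real dataloader package handles more cases (scheduling inside promise chains, cache controls, maximum batch sizes).

```javascript
// TinyLoader: illustrative only, not the dataloader package.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.queue = [];         // pending { key, resolve, reject }
    this.cache = new Map();  // per-instance (per-request) memoization
  }

  load(key) {
    if (this.cache.has(key)) return this.cache.get(key);
    const promise = new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first load in a tick schedules one flush for everything queued.
      if (this.queue.length === 1) process.nextTick(() => this.flush());
    });
    this.cache.set(key, promise);
    return promise;
  }

  async flush() {
    const batch = this.queue;
    this.queue = [];
    try {
      // One call to the batch function for all keys collected this tick.
      const values = await this.batchFn(batch.map((item) => item.key));
      batch.forEach((item, i) => item.resolve(values[i]));
    } catch (err) {
      batch.forEach((item) => item.reject(err));
    }
  }
}
```

Because the cache lives on the instance, creating a new loader per request is what keeps one user's data from being served to another.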

5. Add query complexity analysis and depth limiting

Configure the gateway to reject queries that exceed a maximum depth or complexity cost before execution. Assign cost weights to fields based on their resolver expense. [src4]

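// Apollo Server with depth and complexity validation rules
const { ApolloServer } = require("@apollo/server");
const depthLimit = require("graphql-depth-limit");
const { createComplexityLimitRule } = require("graphql-validation-complexity");

const server = new ApolloServer({
  typeDefs,
  resolvers,
  validationRules: [
    depthLimit(10),  // reject queries nested deeper than 10 levels
    createComplexityLimitRule(1000, {
      scalarCost: 1,
      objectCost: 2,
      listFactor: 10,
    }),
  ],
});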

Verify: Send a deeply nested query (depth 15) → receive error. Send a wide query with many list fields → receive cost exceeded error.
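
A small worked calculation shows why list fields dominate the cost. The sketch below uses one plausible cost model (scalars cost `scalarCost`, objects cost `objectCost` plus their children, lists multiply their element cost by `listFactor`); it is an assumption for illustration and not necessarily the exact algorithm of any complexity library. The node shape (`kind`, `fields`, `of`) is hypothetical.

```javascript
const weights = { scalarCost: 1, objectCost: 2, listFactor: 10 };

// Recursively cost a simplified selection tree.
function cost(node, w = weights) {
  switch (node.kind) {
    case "scalar":
      return w.scalarCost;
    case "object":
      return w.objectCost + node.fields.reduce((sum, f) => sum + cost(f, w), 0);
    case "list":
      return w.listFactor * cost(node.of, w);
    default:
      throw new Error(`unknown node kind: ${node.kind}`);
  }
}

const exceedsLimit = (node, limit) => cost(node) > limit;

// Models: { products { name reviews { body rating } } }
const review = { kind: "object", fields: [{ kind: "scalar" }, { kind: "scalar" }] };
const product = {
  kind: "object",
  fields: [{ kind: "scalar" }, { kind: "list", of: review }],
};
const query = { kind: "object", fields: [{ kind: "list", of: product }] };

// review = 2 + 1 + 1 = 4; reviews list = 10 * 4 = 40;
// product = 2 + 1 + 40 = 43; products list = 430; root = 2 + 430 = 432
console.log(cost(query)); // 432
```

Each additional level of list nesting multiplies the total by roughly `listFactor`, so one more nested list under `reviews` would push this query past a limit of 1000.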

6. Implement persisted queries and client allowlisting

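Use Automatic Persisted Queries (APQ) for public clients or a compiled operation allowlist for trusted clients. APQ reduces request size but registers any query a client sends, so it is a bandwidth optimization rather than a security control; an allowlist rejects every operation that was not registered at build time. [src7]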

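import { ApolloServer } from "@apollo/server";
import Keyv from "keyv";  // Redis URLs also need the @keyv/redis adapter installed
import { KeyvAdapter } from "@apollo/utils.keyvadapter";

const server = new ApolloServer({
  typeDefs,
  resolvers,
  persistedQueries: {
    cache: new KeyvAdapter(new Keyv("redis://redis:6379")),
    ttl: 86400,  // 24 hours, in seconds
  },
});
// Client sends hash first: { "extensions": { "persistedQuery": { "sha256Hash": "abc..." } } }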

Verify: Send a request containing only the query's SHA-256 hash → the first attempt returns PersistedQueryNotFound; a second request with the hash plus the full query text registers it; a third hash-only request executes the stored query and returns data.
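
The three-step handshake in the verification can be sketched as a single server-side function. This is a simplified illustration with an in-memory `Map` standing in for Redis; the function name, the return shape, and the `HashMismatch` label are hypothetical, and a real server performs this inside its request pipeline.

```javascript
const { createHash } = require("node:crypto");

const store = new Map(); // stand-in for Redis: hash -> query text

const sha256 = (text) => createHash("sha256").update(text).digest("hex");

// Returns the query text to execute, or an error label for the client.
function resolvePersistedQuery({ query, extensions }) {
  const hash = extensions?.persistedQuery?.sha256Hash;
  if (!hash) return { query };  // plain request, APQ not in use
  if (!query) {
    // Hash-only request: look the operation up.
    const known = store.get(hash);
    return known ? { query: known } : { error: "PersistedQueryNotFound" };
  }
  // Hash + query: verify before registering so clients cannot poison the store.
  if (sha256(query) !== hash) return { error: "HashMismatch" };
  store.set(hash, query);
  return { query };
}
```

An allowlist variant would skip the registration branch entirely and fill `store` at build time, so unknown hashes are always rejected.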

Code Examples

TypeScript: Apollo Federation Subgraph with Authentication

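// products-subgraph/index.ts -- Federated products subgraph
// Input:  GraphQL queries routed from Apollo Router
// Output: Product data; @auth marks fields that need field-level authorization

import { ApolloServer } from "@apollo/server";
import { buildSubgraphSchema } from "@apollo/subgraph";
import { gql } from "graphql-tag";
import { resolvers } from "./resolvers";  // assumed module, not shown here

const typeDefs = gql`
  extend schema @link(url: "https://specs.apollo.dev/federation/v2.5",
    import: ["@key", "@shareable", "@requires", "@external"])

  # Declared so the schema builds; enforcement (a schema transform or a
  # resolver-level check) has to be implemented separately.
  directive @auth(requires: Role!) on FIELD_DEFINITION
  enum Role { ADMIN USER }

  type Product @key(fields: "id") {
    id: ID!
    name: String!
    price: Float!
    internalCost: Float @auth(requires: ADMIN)
  }

  type ProductEdge { cursor: String! node: Product! }
  type PageInfo { hasNextPage: Boolean! endCursor: String }
  type ProductConnection { edges: [ProductEdge!]! pageInfo: PageInfo! }

  type Query {
    product(id: ID!): Product
    products(first: Int = 20, after: String): ProductConnection!
  }
`;

const server = new ApolloServer({
  schema: buildSubgraphSchema({ typeDefs, resolvers }),
});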
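
The `@auth(requires: ADMIN)` annotation only takes effect if something enforces it. One simple way to sketch field-level enforcement is a resolver wrapper, shown below in plain JavaScript. The `requireRole` helper and the `context.user.roles` shape are assumptions for illustration; they presume the gateway has already validated the token and forwarded the claims.

```javascript
// Hypothetical helper: the wrapped resolver only runs for callers holding the role.
function requireRole(role, resolve) {
  return (parent, args, context, info) => {
    // Assumes verified claims were attached to the context upstream.
    const roles = context.user?.roles ?? [];
    if (!roles.includes(role)) {
      throw new Error(`Forbidden: ${role} role required`);
    }
    return resolve(parent, args, context, info);
  };
}

const resolvers = {
  Product: {
    internalCost: requireRole("ADMIN", (product) => product.internalCost),
  },
};
```

A directive-driven implementation would apply the same wrapper automatically to every field carrying `@auth`, instead of wrapping resolvers by hand.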

Go: gqlgen Subgraph with DataLoader

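// resolvers/product.go -- gqlgen resolver with graph-gophers/dataloader
// Input:  GraphQL product queries
// Output: Batched database responses

package resolvers

import (
    "context"

    "github.com/graph-gophers/dataloader/v7"
)

// Loaders holds the request-scoped loaders (assumed shape; wire it into the
// resolver root wherever your project builds per-request state).
type Loaders struct {
    ProductLoader *dataloader.Loader[string, *Product]
}

func (r *queryResolver) Product(ctx context.Context, id string) (*Product, error) {
    // v7 is generic: keys and results are typed, so no key wrapper or
    // type assertion is needed.
    thunk := r.Loaders.ProductLoader.Load(ctx, id)
    return thunk()
}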

Anti-Patterns

Wrong: Exposing database structure as GraphQL schema

# BAD -- Schema mirrors database tables, not business domain
type products_table {
  product_id: Int!
  product_name: String
  category_fk: Int
  created_at: String
  is_deleted: Boolean
}
# Leaks implementation details, exposes internal IDs.

Correct: Design schema around business domain

# GOOD -- Schema represents business concepts [src3]
type Product @key(fields: "id") {
  id: ID!
  name: String!
  price: Money!
  category: Category!
  availability: Availability!
}
# Clean business types, strong typing, no leaked internals.

Wrong: Resolvers without DataLoader (N+1 problem)

// BAD -- Each product triggers a separate DB query
const resolvers = {
  Product: {
    reviews(product) {
      return db.query("SELECT * FROM reviews WHERE product_id = $1", [product.id]);
    },
  },
};
// 100 products = 101 database queries.

Correct: Batched resolvers with DataLoader

// GOOD -- DataLoader batches all review fetches into 1 query [src2]
const resolvers = {
  Product: {
    reviews(product, _, { loaders }) {
      return loaders.reviewsByProductLoader.load(product.id);
      // 100 products = 2 database queries total.
    },
  },
};

Wrong: No query depth or complexity limits

// BAD -- Accepts any query, no matter how expensive
const server = new ApolloServer({ typeDefs, resolvers });
// Attacker sends deeply nested query -> server runs out of memory.

Correct: Enforce depth + complexity limits at gateway

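// GOOD -- Reject expensive queries before execution [src4]
const depthLimit = require("graphql-depth-limit");
const { createComplexityLimitRule } = require("graphql-validation-complexity");

const server = new ApolloServer({
  typeDefs,
  resolvers,
  validationRules: [depthLimit(10), createComplexityLimitRule(1000)],
  introspection: false,
});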

Wrong: Offset-based pagination

# BAD -- Offset pagination degrades at scale
type Query {
  products(limit: Int, offset: Int): [Product]
}
# offset: 100000 -> database scans and skips 100K rows.

Correct: Cursor-based (Relay-style) pagination

# GOOD -- Cursor pagination is stable and performant [src3]
type Query {
  products(first: Int!, after: String): ProductConnection!
}
type ProductConnection {
  edges: [ProductEdge!]!
  pageInfo: PageInfo!
}
# Indexed seek from the cursor; cost does not grow with page position.
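
A resolver for this connection shape can be sketched as follows. The example pages over an in-memory array sorted by `id`; a real resolver would push the same condition into the database (roughly `WHERE id > $after ORDER BY id LIMIT $first + 1`). The cursor encoding and the function names are assumptions made for the illustration.

```javascript
// Opaque cursor: base64 of the last-seen sort key (here, a numeric id).
const encodeCursor = (id) => Buffer.from(`id:${id}`).toString("base64");
const decodeCursor = (cursor) =>
  Number(Buffer.from(cursor, "base64").toString("utf8").slice(3));

// `rows` stands in for a table already sorted by id.
function paginate(rows, { first, after }) {
  const afterId = after ? decodeCursor(after) : -Infinity;
  // Fetch one extra row to learn whether another page exists.
  const window = rows.filter((r) => r.id > afterId).slice(0, first + 1);
  const page = window.slice(0, first);
  return {
    edges: page.map((node) => ({ cursor: encodeCursor(node.id), node })),
    pageInfo: {
      hasNextPage: window.length > first,
      endCursor: page.length ? encodeCursor(page[page.length - 1].id) : null,
    },
  };
}
```

Clients pass `pageInfo.endCursor` back as `after` to fetch the next page, so rows inserted or deleted earlier in the list do not shift the window the way an offset would.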

Common Pitfalls

Diagnostic Commands

# Validate supergraph composition
rover supergraph compose --config supergraph.yaml

# Check a subgraph against the deployed supergraph
rover subgraph check <graph>@<variant> --schema products.graphql --name products

# Introspect a running subgraph (development only)
rover subgraph introspect http://localhost:4001/graphql

# Test query execution through the gateway
curl -X POST http://localhost:4000/ \
  -H "Content-Type: application/json" \
  -d '{"query":"{ product(id: \"1\") { name price reviews { rating } } }"}'

# Monitor Apollo Router metrics (Prometheus)
curl http://localhost:9090/metrics | grep apollo_router

# Trace query execution plan in Apollo Router
APOLLO_ROUTER_LOG=apollo_router::query_planner=debug ./router --supergraph supergraph.graphql

Version History & Compatibility

| Version | Status | Breaking Changes | Migration Notes |
|---|---|---|---|
| GraphQL Spec Sep 2025 | Current | Schema Coordinates, OneOf inputs | First spec update since Oct 2021 |
| Apollo Federation v2.5+ | Current | None since v2.0 | Requires @link directive |
| Apollo Federation v2.0 | Stable | @key syntax changed from v1 | Add @link import, replace @requires syntax |
| Apollo Federation v1 | Deprecated | --- | Upgrade to v2 for @shareable, @override |
| Netflix DGS 9.x | Current (Spring Boot 3.x) | Requires Java 17+ | Aligns with Spring GraphQL |
| Netflix DGS 7.x | Maintenance | --- | Upgrade to 9.x for Spring Boot 3 |
| GraphQL Mesh v1.x | Current | --- | Replaces 0.x with stable API |

When to Use / When Not to Use

| Use When | Don't Use When | Use Instead |
|---|---|---|
| Multiple teams need to contribute to a unified API independently | Single team owns the entire API surface | Monolith GraphQL server (Apollo Server, graphql-yoga) |
| Clients need flexible, nested data fetching in a single request | Simple CRUD with flat resources and no nesting | REST API with OpenAPI spec |
| Mobile + web clients have very different data needs for the same screen | All clients need identical data shapes | REST with versioned endpoints |
| Need to compose multiple backend services into one API | Single backend database with no microservices | Direct GraphQL-to-database (Hasura, PostGraphile) |
| Schema evolution without breaking clients is critical | Strict API versioning is acceptable | REST with content negotiation |
| Need real-time subscriptions alongside queries | Only request-response needed, no real-time | REST API or gRPC for service-to-service |

Important Caveats

Related Units