Fix N+1 Queries in GraphQL with DataLoader
How do I fix N+1 queries in GraphQL with DataLoader?
TL;DR
- Bottom line: Wrap every resolver that fetches related data in a DataLoader instance — it automatically batches individual
.load(id)calls within one execution tick into a single bulk query, turning N+1 queries into 2. - Key tool/command:
new DataLoader(keys => batchFetchByIds(keys))(JS) orDataLoader(load_fn=batch_load)(Python Strawberry) - Watch out for: Creating a single global DataLoader instance instead of one per request — this leaks cached data between users.
- Works with: Any GraphQL server (Apollo Server 4, graphql-js, Strawberry, Graphene, gqlgen, graphql-java). DataLoader v2.x requires Node.js >= 10; Strawberry DataLoader requires Python >= 3.8 with asyncio.
Constraints
- DataLoader instances MUST be created per-request (in the GraphQL context factory) — a shared global instance caches data across users, causing data leaks and stale reads
- The batch function MUST return results in the exact same order as the input keys array, with the same length — mismatches cause silent data corruption
- DataLoader requires an async execution environment — Node.js event loop (JS) or asyncio event loop (Python)
- Never disable batching (
batch: false) in production — it defeats the entire purpose of DataLoader - Error objects returned in the batch array are cached by default — clear the cache on transient failures or disable error caching
Quick Reference
| # | Cause | Likelihood | Signature | Fix |
|---|---|---|---|---|
| 1 | No DataLoader — each resolver calls DB individually | ~50% of cases | N separate SELECT ... WHERE id = ? in query logs | Wrap in DataLoader with WHERE id IN (?) batch query |
| 2 | DataLoader created globally, not per-request | ~15% of cases | Stale/wrong data returned; data leaks between users | Move DataLoader instantiation into context factory |
| 3 | Batch function returns results in wrong order | ~10% of cases | Data assigned to wrong parent objects | Map results to input keys: keys.map(k => resultMap.get(k)) |
| 4 | Batch function returns wrong array length | ~8% of cases | DataLoader must return array of same length error | Return null for missing keys; match array length to keys |
| 5 | Using .loadMany() where .load() suffices | ~5% of cases | Bypasses per-key deduplication within a tick | Use .load() in resolvers; .loadMany() only for explicit bulk |
| 6 | Nested DataLoaders not batching across depth levels | ~5% of cases | Batch size = 1 for deeply nested fields | Ensure async resolvers; check batchScheduleFn timing |
| 7 | Complex key objects not deduplicating | ~4% of cases | Duplicate DB queries despite same logical key | Provide cacheKeyFn that serializes keys to strings |
| 8 | DataLoader cache hiding database updates | ~3% of cases | Mutations not reflected in subsequent queries | Call loader.clear(key) or loader.clearAll() after mutations |
Decision Tree
START
|-- Is your GraphQL server in JavaScript/TypeScript?
| |-- YES --> Use `dataloader` npm package. See "JS/TS: Apollo Server" code example.
| +-- NO |
|-- Is your server in Python with Strawberry?
| |-- YES --> Use `strawberry.dataloader.DataLoader`. See "Python: Strawberry" code example.
| +-- NO |
|-- Is your server in Python with Graphene?
| |-- YES --> Use `aiodataloader` package. See "Python: Graphene" code example.
| +-- NO |
|-- Is your server in Go (gqlgen)?
| |-- YES --> Use `graph-gophers/dataloader` or gqlgen built-in dataloaden.
| +-- NO |
|-- Is your server using Apollo Federation?
| |-- YES --> Apply DataLoader in each subgraph's reference resolver.
| +-- NO |
+-- DEFAULT --> Implement batch-and-cache pattern manually:
collect keys during resolve phase, batch-fetch before returning.
Step-by-Step Guide
1. Identify the N+1 problem in your query logs
Enable query logging in your database or GraphQL server to see repeated queries. Look for patterns where the same SELECT statement runs N times with different IDs. [src1]
-- You'll see this pattern in logs (N+1):
SELECT * FROM posts; -- 1 query
SELECT * FROM users WHERE id = 1; -- N queries (one per post)
SELECT * FROM users WHERE id = 2;
SELECT * FROM users WHERE id = 3;
-- ... repeated for every post.authorId
Verify: Count queries per GraphQL request in your DB logs. If count > (unique tables queried * 2), you likely have N+1.
2. Install DataLoader
Install the appropriate DataLoader package for your language. [src2]
# JavaScript/TypeScript
npm install dataloader
# Python (Strawberry — built-in, no extra install)
pip install strawberry-graphql
# Python (Graphene)
pip install aiodataloader
Verify: npm list dataloader → [email protected] (or later)
3. Create a batch loading function
The batch function receives an array of keys and must return a Promise/awaitable resolving to an array of results in the same order as the keys. [src2]
// JavaScript — batch loading function
async function batchUsers(userIds) {
// One query fetches ALL requested users
const users = await db.query(
'SELECT * FROM users WHERE id = ANY($1)',
[userIds]
);
// CRITICAL: return results in same order as input keys
const userMap = new Map(users.map(u => [u.id, u]));
return userIds.map(id => userMap.get(id) || null);
}
Verify: batchUsers([3, 1, 2]) returns [user3, user1, user2] — same order as input.
4. Create DataLoader instances per-request in context
Attach fresh DataLoader instances to the GraphQL context so every resolver in a single request shares the same loader (batching + dedup) but different requests get isolated caches. [src1] [src3]
// Apollo Server 4 — context factory
const server = new ApolloServer({ typeDefs, resolvers });
const { url } = await startStandaloneServer(server, {
context: async () => ({
loaders: {
user: new DataLoader(batchUsers),
post: new DataLoader(batchPosts),
},
}),
});
Verify: Add console.log('context created') in the factory — it should log once per HTTP request.
5. Use DataLoader in resolvers
Replace direct database calls in resolvers with loader.load(key). [src1]
const resolvers = {
Query: {
posts: () => db.query('SELECT * FROM posts'),
},
Post: {
// BEFORE (N+1): db.query('SELECT * FROM users WHERE id = $1', [post.authorId])
// AFTER (batched):
author: (post, _args, { loaders }) => loaders.user.load(post.authorId),
},
};
Verify: Run the query and check DB logs — you should see exactly 1 SELECT * FROM posts + 1 SELECT * FROM users WHERE id = ANY(...) instead of N+1 queries.
6. Clear cache after mutations
After any create/update/delete operation, clear the relevant DataLoader cache to prevent stale reads within the same request. [src2]
const resolvers = {
Mutation: {
updateUser: async (_, { id, input }, { loaders }) => {
const updated = await db.query(
'UPDATE users SET name = $1 WHERE id = $2 RETURNING *',
[input.name, id]
);
// Clear stale cache entry and prime with fresh data
loaders.user.clear(id);
loaders.user.prime(id, updated[0]);
return updated[0];
},
},
};
Verify: After mutation, subsequent resolvers in the same request return updated data.
Code Examples
JavaScript/TypeScript: Apollo Server 4 with DataLoader
// Input: GraphQL query { posts { id title author { id name } } }
// Output: 2 DB queries total (1 for posts, 1 for all authors) instead of N+1
import { ApolloServer } from '@apollo/server';
import { startStandaloneServer } from '@apollo/server/standalone';
import DataLoader from 'dataloader';
const batchUsers = async (ids: readonly number[]) => {
const users = await db.query('SELECT * FROM users WHERE id = ANY($1)', [ids]);
const map = new Map(users.rows.map((u: any) => [u.id, u]));
return ids.map(id => map.get(id) || null);
};
const resolvers = {
Post: {
author: (post: any, _: any, ctx: any) => ctx.loaders.user.load(post.author_id),
},
};
const server = new ApolloServer({ typeDefs, resolvers });
const { url } = await startStandaloneServer(server, {
context: async () => ({
loaders: { user: new DataLoader(batchUsers) },
}),
});
Python: Strawberry GraphQL with DataLoader
# Input: GraphQL query { posts { id title author { id name } } }
# Output: 2 DB queries total instead of N+1
import strawberry
from strawberry.dataloader import DataLoader
from typing import List, Optional
async def load_users_batch(keys: List[int]) -> List[Optional['User']]:
rows = await db.fetch('SELECT * FROM users WHERE id = ANY($1)', keys)
user_map = {row['id']: User(id=row['id'], name=row['name']) for row in rows}
return [user_map.get(key) for key in keys]
@strawberry.type
class Post:
id: int
title: str
author_id: int
@strawberry.field
async def author(self, info: strawberry.Info) -> Optional[User]:
return await info.context["user_loader"].load(self.author_id)
async def get_context():
return {"user_loader": DataLoader(load_fn=load_users_batch)}
Python: Graphene with aiodataloader
# Input: GraphQL query { posts { id title author { id name } } }
# Output: 2 DB queries total instead of N+1
from aiodataloader import DataLoader
import graphene
class UserLoader(DataLoader):
async def batch_load_fn(self, user_ids):
users = {u.id: u for u in await User.objects.filter(id__in=user_ids)}
return [users.get(uid) for uid in user_ids]
class PostType(graphene.ObjectType):
id = graphene.Int()
title = graphene.String()
author = graphene.Field(lambda: UserType)
async def resolve_author(self, info):
return await info.context['user_loader'].load(self.author_id)
Anti-Patterns
Wrong: Global singleton DataLoader (data leak between users)
// BAD — shared across all requests, leaks user A's data to user B
const globalUserLoader = new DataLoader(batchUsers);
const resolvers = {
Post: {
author: (post) => globalUserLoader.load(post.authorId),
},
};
Correct: Per-request DataLoader in context
// GOOD — fresh instance per request, isolated cache
const server = new ApolloServer({ typeDefs, resolvers });
startStandaloneServer(server, {
context: async () => ({
loaders: { user: new DataLoader(batchUsers) },
}),
});
Wrong: Batch function returns results in arbitrary order
// BAD — results don't match key order, causes data corruption
async function batchUsers(ids) {
const users = await db.query('SELECT * FROM users WHERE id = ANY($1)', [ids]);
return users.rows; // Order depends on DB, NOT on input key order!
}
Correct: Map results back to input key order
// GOOD — explicitly map to input key order
async function batchUsers(ids) {
const users = await db.query('SELECT * FROM users WHERE id = ANY($1)', [ids]);
const map = new Map(users.rows.map(u => [u.id, u]));
return ids.map(id => map.get(id) || null);
}
Wrong: Calling the database directly inside every resolver
// BAD — each post triggers a separate DB query (N+1)
const resolvers = {
Post: {
author: async (post) => {
return await db.query('SELECT * FROM users WHERE id = $1', [post.authorId]);
},
},
};
Correct: Delegate to DataLoader
// GOOD — DataLoader batches all author lookups into one query
const resolvers = {
Post: {
author: (post, _, { loaders }) => loaders.user.load(post.authorId),
},
};
Wrong: Using loadMany in individual resolvers instead of load
// BAD — loadMany for a single key is wasteful and bypasses per-key dedup
const resolvers = {
Post: {
author: (post, _, { loaders }) =>
loaders.user.loadMany([post.authorId]).then(r => r[0]),
},
};
Correct: Use load() for single keys, loadMany() for explicit bulk
// GOOD — load() for resolvers, loadMany() only for explicit batch
const resolvers = {
Query: {
users: (_, { ids }, { loaders }) => loaders.user.loadMany(ids),
},
Post: {
author: (post, _, { loaders }) => loaders.user.load(post.authorId),
},
};
Common Pitfalls
- Batch function returns wrong length: DataLoader throws
DataLoader must be constructed with a function which accepts Array<key> and returns Promise<Array<value>>error. Fix: ensure you returnnullfor missing keys —keys.map(k => map.get(k) || null). [src2] - Cache persists across requests: Users see other users' data because DataLoader instance lives beyond a single request. Fix: always create DataLoader in the context factory function, never at module scope. [src1]
- Errors are cached by default: A transient DB error gets cached, and all subsequent
.load()calls for that key return the cached error. Fix: callloader.clear(key)in your error handler, or usenew DataLoader(fn, { cache: false }). [src2] - Object keys don't deduplicate: Two
{id: 1}objects are different references, so DataLoader treats them as different keys. Fix: providecacheKeyFn: (key) => key.idor serialize to string. [src2] - Async/await breaks batching window: If you
awaitbefore calling.load(), the batching tick may have already fired. Fix: collect all.load()calls synchronously within a resolver, or usePromise.all(). [src1] - Federation subgraphs missing DataLoader: Each subgraph's
__resolveReferenceruns once per entity key, recreating N+1 at the subgraph boundary. Fix: use DataLoader in__resolveReferenceresolvers too. [src3] - No cache clearing after mutations: Reads after a mutation return stale cached data. Fix: call
loader.clear(key)after writes, optionallyloader.prime(key, newValue). [src2] - maxBatchSize not set for large lists: Batch functions may receive thousands of keys, causing
IN (...)clauses that exceed DB limits. Fix: setmaxBatchSize: 100. [src2]
Diagnostic Commands
# Check if DataLoader is installed (Node.js)
npm list dataloader
# Expected: [email protected]
# Check for N+1 in PostgreSQL logs (enable statement logging first)
# In postgresql.conf: log_statement = 'all'
grep 'SELECT.*FROM users WHERE id =' /var/log/postgresql/postgresql.log | wc -l
# If count >> number of unique IDs, you have N+1
# Enable Apollo Server query logging to detect N+1
DEBUG=knex:query node server.js
# Check DataLoader batch sizes at runtime (add to batch function)
# async function batchUsers(ids) {
# console.log(`DataLoader batch size: ${ids.length}`);
# }
# Python: check aiodataloader version
pip show aiodataloader
# Expected: aiodataloader >= 0.2.0
Version History & Compatibility
| Version | Status | Breaking Changes | Migration Notes |
|---|---|---|---|
| dataloader 2.2.x (JS) | Current (2024) | None | Stable since 2.0.0 |
| dataloader 2.0.0 (JS) | LTS | Dropped Node < 10; TypeScript rewrite | Update Node.js; types now built-in |
| dataloader 1.x (JS) | EOL | — | API compatible; bump version |
| strawberry.dataloader (Python) | Current (2025) | None | Built into strawberry-graphql >= 0.100.0 |
| aiodataloader 0.2.x (Python) | Current (2024) | None | For Graphene; requires asyncio |
| graphql-batch (Ruby) | Current (2024) | None | Shopify's alternative; different API |
When to Use / When Not to Use
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Nested GraphQL resolvers fetch related entities by ID | Single resolver fetches one record | Direct database query |
| Same entity is requested multiple times in one query | Data changes between resolver calls within one request | Disable cache or clear after mutation |
| Federated subgraph resolves entity references | You need cross-request persistent caching | Redis/Memcached layer |
| Any list-to-detail resolver pattern (posts → authors) | Query complexity is bounded and small (< 5 items) | Direct fetch is fine for small N |
| You want to decouple data fetching from schema structure | ORM already handles eager loading (e.g., Rails includes) | Use ORM's built-in batching |
Important Caveats
- DataLoader only caches within a single request — it is NOT a replacement for Redis/Memcached. For cross-request caching, layer a persistent cache underneath your batch function.
- The batch function's returned array MUST match the length and order of the input keys array. This is the most common source of bugs. Always build a map from results and reconstruct the array from keys.
- In serverless environments (AWS Lambda, Cloudflare Workers), DataLoader instances are naturally per-invocation, but be careful with warm starts that might reuse module-level variables.
- Disabling the cache (
cache: false) means duplicate keys within the same tick will appear multiple times in the batch function's keys array. - DataLoader uses
process.nextTick(Node.js) or microtask scheduling by default. CustombatchScheduleFncan change timing but may break batching if the window is too short.