How to Diagnose and Fix Docker OOMKilled Errors
How do I diagnose and fix Docker OOMKilled errors?
TL;DR
- Bottom line: OOMKilled means the Linux kernel's OOM killer terminated your container because it exceeded its cgroup-enforced memory limit or because the host ran out of memory. Exit code 137 = 128 + 9 (SIGKILL). Check docker inspect --format='{{.State.OOMKilled}}'; if it returns true, the container was killed for memory.
- Key tool/command: docker stats shows live memory usage. docker inspect <cid> --format='{{.State.OOMKilled}}' confirms OOM. docker run --memory=512m sets limits. dmesg | grep -i oom shows kernel OOM events.
- Watch out for: --memory without --memory-swap lets the container use swap equal to the memory limit (memory plus swap totals 2× memory). Java's -Xmx must be lower than --memory to leave room for non-heap memory. On cgroups v2, --oom-kill-disable is silently discarded.
- Works with: Docker Engine 20.10+ (cgroups v1/v2), Docker Desktop 4.x, Podman, Docker Compose, Kubernetes.
Constraints
- Never use --oom-kill-disable on cgroups v2 hosts: the flag is silently discarded in Docker Engine 27+, so the container will still be OOM-killed [src7]
- JVM heap (-Xmx) must be set to 70-80% of the container memory limit. Setting it equal to --memory guarantees OOMKilled because metaspace, threads, and GC buffers need the remaining 20-30% [src1, src4]
- On Docker Desktop (Mac/Windows), total container memory cannot exceed the VM allocation (default ~50% of host RAM) regardless of per-container --memory flags [src1]
- Exit code 137 does not always mean OOM. Always confirm with docker inspect --format='{{.State.OOMKilled}}' before assuming a memory issue [src3, src4]
- The --kernel-memory flag is discarded on cgroups v2. Do not rely on it for kernel memory limits on modern Linux hosts [src7]
- On Kubernetes 1.28+ with cgroups v2, the cgroup-aware OOM killer terminates ALL processes in the cgroup, not just the one that exceeded the limit [src6]
- Docker Engine 29 (2026) formally deprecates cgroup v1. Support continues until May 2029, but plan host migration to cgroup v2 now [src10]
- Docker Engine 29 dropped the default nofile ulimit from 1,048,576 to 1,024 (containerd v2.1.5). Programs that adjust behavior based on this limit will allocate much less memory by default [src10]
Quick Reference
| # | Cause | Likelihood | Signature | Fix |
|---|---|---|---|---|
| 1 | Container memory limit too low | ~30% | OOMKilled: true; app uses expected memory | Increase --memory limit [src1, src4] |
| 2 | Application memory leak | ~25% | Memory grows linearly; OOM after hours/days | Profile and fix the leak [src4, src8] |
| 3 | JVM heap exceeds container limit | ~15% | Java app; -Xmx ≥ --memory | Set -Xmx to 70-80% of container memory [src1, src4] |
| 4 | Large file/data processing | ~8% | OOM during specific operations | Use streaming/chunked processing [src8] |
| 5 | Fork bomb / child process explosion | ~5% | Memory spikes instantly; many processes | Set --pids-limit; fix fork logic [src1] |
| 6 | Host itself is out of memory | ~5% | Multiple containers OOMKilled; dmesg shows OOM | Add host RAM; reduce container count [src4, src6] |
| 7 | No memory limit + host exhaustion | ~4% | No --memory flag; host runs out | Always set memory limits in production [src1] |
| 8 | Memory-mapped files / tmpfs | ~3% | Container uses tmpfs or mmap | Exclude tmpfs or increase limit [src1] |
| 9 | Swap disabled or misconfigured | ~3% | OOMs at exactly the --memory value | Configure --memory-swap [src1] |
| 10 | Build-time OOM | ~2% | OOM during npm install, compilation | Increase Docker Desktop memory [src1] |
Decision Tree
START — Container exits with code 137
├── Is OOMKilled true? (docker inspect --format='{{.State.OOMKilled}}')
│ ├── YES → Out of memory ↓
│ └── NO → Not OOM — killed by docker kill, docker stop timeout, or orchestrator
├── Was a memory limit set? (docker inspect --format='{{.HostConfig.Memory}}')
│ ├── YES → Container exceeded this limit
│ │ ├── Is the limit too low? → Increase --memory [src1]
│ │ ├── Gradual growth (leak)? → Profile the application [src4, src8]
│ │ └── Java/JVM? → Check -Xmx (leave 20-30% for non-heap) [src1]
│ └── NO → Host ran out of memory
│ ├── Check host: free -m, dmesg | grep oom [src6]
│ └── Set limits on all containers [src1]
├── cgroups version? (stat -fc %T /sys/fs/cgroup)
│ ├── cgroup2fs → v2: check memory.max, memory.events [src6]
│ └── tmpfs → v1: check memory.limit_in_bytes, memory.oom_control [src6]
├── Build-time OOM? → Increase Docker Desktop memory [src1]
└── Monitor with docker stats to find peak usage [src2]
├── Peak near limit? → Increase limit by 20-30%
└── Sudden spike? → Profile that code path
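The first two branches of the tree can be sketched as a small function. This is only an illustration: `triage` is a hypothetical helper, and its three arguments are assumed to come from the docker inspect fields .State.ExitCode, .State.OOMKilled, and .HostConfig.Memory.

```python
def triage(exit_code, oom_killed, memory_limit_bytes):
    """Return a next-step hint for a stopped container.

    Arguments mirror docker inspect fields: .State.ExitCode,
    .State.OOMKilled, and .HostConfig.Memory (0 means no limit).
    """
    if exit_code != 137:
        return "not a SIGKILL exit: outside the scope of this guide"
    if not oom_killed:
        return "not OOM: check docker kill, docker stop timeout, or orchestrator"
    if memory_limit_bytes > 0:
        return "limit exceeded: check limit size, leaks, and JVM heap settings"
    return "no limit set: check host memory (free -m, dmesg) and set limits"


if __name__ == "__main__":
    # A container with a 512 MiB limit that was OOM-killed
    print(triage(137, True, 512 * 1024 * 1024))
```

The later branches (cgroups version, build-time OOM, peak monitoring) need live data and are left out of this sketch.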
Step-by-Step Guide
1. Confirm the OOM kill
Not every exit code 137 is OOM — it can also be a manual docker kill. [src3, src4]
docker inspect <cid> --format='{{.State.OOMKilled}}'
docker inspect <cid> --format='{{json .State}}' | python3 -m json.tool
dmesg | grep -i "oom\|out of memory" | tail -20
# cgroups v2: check OOM event count
cat /sys/fs/cgroup/system.slice/docker-<id>.scope/memory.events
Verify: OOMKilled: true confirms the container was killed for exceeding memory.
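To make the 137 = 128 + 9 arithmetic concrete, here is a minimal sketch that maps an exit code back to a signal name. `describe_exit_code` is a hypothetical helper, and it assumes the common convention that a status above 128 means "terminated by signal (status - 128)".

```python
import signal


def describe_exit_code(code):
    """Translate a container exit code into a readable description."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Number does not correspond to a known signal on this platform
            return f"killed by unknown signal {signum}"
        return f"killed by {name} (signal {signum})"
    return f"exited on its own with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(137))  # prints: killed by SIGKILL (signal 9)
    print(describe_exit_code(143))  # prints: killed by SIGTERM (signal 15)
```

The signal name alone does not say who sent it, which is why the OOMKilled flag still needs to be checked.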
2. Check current memory limits
Understand what limits are configured. [src1, src3]
docker inspect <cid> --format='Memory: {{.HostConfig.Memory}}, Swap: {{.HostConfig.MemorySwap}}'
docker inspect <cid> --format='{{.HostConfig.Memory}}' | awk '{print $1/1024/1024 "MB"}'
# Check cgroups version
stat -fc %T /sys/fs/cgroup # cgroup2fs = v2, tmpfs = v1
Verify: Know the exact limit (0 = unlimited).
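The raw byte count from .HostConfig.Memory is hard to read at a glance. A small sketch of the conversion, assuming binary units and treating 0 (or a negative value) as "unlimited"; `format_memory_limit` is a hypothetical name:

```python
def format_memory_limit(limit_bytes):
    """Render a HostConfig.Memory value as a human-readable string."""
    if limit_bytes <= 0:
        # Docker reports 0 when no --memory limit was set
        return "unlimited"
    mib = limit_bytes / (1024 * 1024)
    if mib >= 1024:
        return f"{mib / 1024:.2f} GiB"
    return f"{mib:.0f} MiB"


if __name__ == "__main__":
    print(format_memory_limit(536870912))   # prints: 512 MiB
    print(format_memory_limit(1073741824))  # prints: 1.00 GiB
    print(format_memory_limit(0))           # prints: unlimited
```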
3. Monitor live memory usage
Use docker stats to see real-time memory consumption. [src2]
docker stats
docker stats --no-stream
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}\t{{.PIDs}}"
Verify: Watch memory over time — gradual growth = leak; spikes = load-dependent.
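The growth-versus-spike distinction can be approximated in code once you have a series of readings. The sketch below is a rough heuristic, not a diagnostic method from the sources: `classify_trend` is a hypothetical helper, and the thresholds (a spike is 1.5× the median; growth means 80% of steps rising and a 10% net increase) are arbitrary assumptions to tune for your workload.

```python
def classify_trend(samples_mib, spike_ratio=1.5, rising_share=0.8, min_gain=1.10):
    """Label a series of memory readings (MiB) as 'spike', 'growth', or 'stable'."""
    if len(samples_mib) < 3:
        return "insufficient data"
    median = sorted(samples_mib)[len(samples_mib) // 2]
    if max(samples_mib) > spike_ratio * median:
        # One reading far above the typical level suggests a load-dependent spike
        return "spike"
    steps = len(samples_mib) - 1
    rising = sum(b > a for a, b in zip(samples_mib, samples_mib[1:]))
    if rising / steps >= rising_share and samples_mib[-1] > samples_mib[0] * min_gain:
        # Nearly every sample is higher than the previous one: possible leak
        return "growth"
    return "stable"


if __name__ == "__main__":
    print(classify_trend([100, 110, 120, 130, 140, 150]))  # prints: growth
    print(classify_trend([100, 100, 400, 100, 100]))       # prints: spike
    print(classify_trend([100, 101, 100, 99, 100]))        # prints: stable
```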
4. Set appropriate memory limits
Configure limits based on observed usage plus headroom. [src1]
docker run --memory=512m --memory-swap=1g myapp
docker run --memory=512m --memory-swap=512m myapp # No swap
docker run --memory=1g --memory-reservation=512m myapp # Soft limit
# Docker Compose
services:
  app:
    deploy:
      resources:
        limits:
          memory: 512M
        reservations:
          memory: 256M
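The interaction between --memory and --memory-swap is easy to misread, because --memory-swap is the total of memory plus swap. The sketch below encodes the rules as described in Docker's resource-constraint documentation for Linux hosts with swap configured, plus the "observed peak plus headroom" sizing used in this guide. Both function names are hypothetical, and the 25% default headroom is an assumption taken from the 20-30% range above.

```python
import math


def swap_available_mib(memory_mib, memory_swap_mib=None):
    """Swap (MiB) a container may use; None means unlimited.

    memory_swap_mib follows --memory-swap semantics: it is the TOTAL of
    memory plus swap, not the swap amount on its own.
    """
    if memory_swap_mib is None or memory_swap_mib == 0:
        # Unset (or 0, which Docker treats as unset): swap equals memory
        return memory_mib
    if memory_swap_mib == -1:
        # -1 allows unlimited swap, up to what the host provides
        return None
    return max(memory_swap_mib - memory_mib, 0)


def limit_with_headroom(peak_mib, headroom=0.25):
    """Suggest a --memory value: observed peak plus a safety margin."""
    return math.ceil(peak_mib * (1 + headroom))


if __name__ == "__main__":
    print(swap_available_mib(512, 1024))  # --memory=512m --memory-swap=1g   -> 512
    print(swap_available_mib(512, 512))   # --memory=512m --memory-swap=512m -> 0
    print(swap_available_mib(512))        # --memory=512m only               -> 512
    print(limit_with_headroom(400))       # 400 MiB peak                     -> 500
```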
5. Configure JVM / Node.js / Python memory
Runtime-specific settings must align with container limits. [src1, src4]
# Java 10+ (container-aware)
docker run --memory=1g myapp java -XX:MaxRAMPercentage=75.0 -jar app.jar
# Node.js
docker run --memory=512m myapp node --max-old-space-size=384 app.js
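The heap values in these commands follow from the container limit by simple arithmetic. A minimal sketch, assuming the 75% heap fraction used in this guide; `runtime_memory_flags` is a hypothetical helper and the fraction should be adjusted to your application's non-heap footprint:

```python
def runtime_memory_flags(limit_mib, heap_fraction=0.75):
    """Derive heap-related runtime flags from a container memory limit (MiB)."""
    heap_mib = int(limit_mib * heap_fraction)
    return {
        # Fixed heap ceiling for the JVM
        "jvm_xmx": f"-Xmx{heap_mib}m",
        # Percentage-based alternative for container-aware JVMs
        "jvm_percentage": f"-XX:MaxRAMPercentage={heap_fraction * 100:.1f}",
        # V8 old-space ceiling for Node.js, in MiB
        "node": f"--max-old-space-size={heap_mib}",
    }


if __name__ == "__main__":
    for name, flag in runtime_memory_flags(512).items():
        print(f"{name}: {flag}")
```

For a 512 MiB limit this yields --max-old-space-size=384, matching the Node.js command above.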
6. Profile memory usage inside the container
Get detailed memory breakdown. [src4, src8]
docker exec <cid> ps aux --sort=-%mem | head -20
docker exec <cid> cat /proc/meminfo | head -10
docker exec <cid> cat /proc/1/status | grep -E "VmRSS|VmSize|VmPeak"
# cgroups v2: check memory pressure
docker exec <cid> cat /sys/fs/cgroup/memory.pressure 2>/dev/null
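If you collect /proc/1/status output repeatedly, parsing it into numbers makes comparisons easier. The sketch below assumes the usual layout of that file (one "Key: value kB" pair per line); `parse_vm_fields` and the sample text are illustrative, not captured from a real container.

```python
def parse_vm_fields(status_text, fields=("VmPeak", "VmSize", "VmRSS")):
    """Extract selected Vm* fields from /proc/<pid>/status text, in MiB."""
    result = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in fields:
            parts = rest.split()
            if parts:
                # The kernel reports these fields in kB
                result[key] = int(parts[0]) / 1024
    return result


if __name__ == "__main__":
    # Made-up sample in the shape of /proc/<pid>/status
    sample = (
        "Name:\tjava\n"
        "VmPeak:\t 2097152 kB\n"
        "VmSize:\t 1048576 kB\n"
        "VmRSS:\t  524288 kB\n"
    )
    print(parse_vm_fields(sample))
    # prints: {'VmPeak': 2048.0, 'VmSize': 1024.0, 'VmRSS': 512.0}
```

VmRSS is the figure closest to what counts against the container limit; VmSize includes address space that was reserved but never touched.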
7. Set up memory alerts
Proactive monitoring before OOM kills. [src5]
#!/bin/bash
# Warn when any running container exceeds THRESHOLD percent of its memory limit.
THRESHOLD=80
while true; do
  docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' | while read -r name pct; do
    pct_num=${pct%\%}  # strip the trailing percent sign
    if (( $(echo "$pct_num > $THRESHOLD" | bc -l) )); then
      echo "WARNING $(date): $name at ${pct_num}% memory"
    fi
  done
  sleep 30
done
Code Examples
Container memory diagnostics script
Full script: container-memory-diagnostics-script.sh (43 lines)
#!/bin/bash
# Input: Container ID or name
# Output: Complete memory diagnostic report
CID="$1"
if [ -z "$CID" ]; then echo "Usage: $0 <container_id>"; exit 1; fi
echo "=== Memory Diagnostic Report ==="
echo "Container: $CID"
echo "Time: $(date -u)"
STATE=$(docker inspect "$CID" --format='{{.State.Status}}' 2>/dev/null)
if [ -z "$STATE" ]; then echo "Container not found"; exit 1; fi
OOM=$(docker inspect "$CID" --format='{{.State.OOMKilled}}')
EXIT=$(docker inspect "$CID" --format='{{.State.ExitCode}}')
echo "Status: $STATE | Exit: $EXIT | OOMKilled: $OOM"
MEM=$(docker inspect "$CID" --format='{{.HostConfig.Memory}}')
echo "Memory limit: $(echo "$MEM" | awk '{if($1>0) printf "%.0fMB\n",$1/1024/1024; else print "unlimited"}')"
if [ "$STATE" = "running" ]; then
  echo "=== Current Usage ==="
  docker stats --no-stream --format "Memory: {{.MemUsage}} ({{.MemPerc}})" "$CID"
  echo "=== Top Processes ==="
  docker exec "$CID" ps aux --sort=-%mem 2>/dev/null | head -6
fi
echo "=== Last 10 Log Lines ==="
docker logs --tail 10 "$CID" 2>&1
echo "=== Host OOM Events ==="
# tail always exits 0, so capture the output and fall back when it is empty
HOST_OOM=$(dmesg 2>/dev/null | grep -i "oom\|killed process" | tail -5)
echo "${HOST_OOM:-(no OOM events found, or dmesg unavailable)}"
Docker Compose with proper memory management
Full script: docker-compose-with-proper-memory-management.yml (62 lines)
version: "3.8"
services:
  api:
    build: ./api
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "1.0"
        reservations:
          memory: 256M
    environment:
      JAVA_OPTS: "-XX:MaxRAMPercentage=75.0 -XX:+UseG1GC"
      NODE_OPTIONS: "--max-old-space-size=384"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      retries: 3
  worker:
    build: ./worker
    deploy:
      resources:
        limits:
          memory: 1G
    restart: on-failure:5
  postgres:
    image: postgres:16
    deploy:
      resources:
        limits:
          memory: 1G
    # The official image does not read a POSTGRES_SHARED_BUFFERS variable;
    # pass the setting to the server as a command-line option instead.
    command: ["postgres", "-c", "shared_buffers=256MB"]
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
  redis:
    image: redis:7-alpine
    deploy:
      resources:
        limits:
          memory: 256M
    command: ["redis-server", "--maxmemory", "200mb", "--maxmemory-policy", "allkeys-lru"]
Memory leak detection in running container
Full script: memory-leak-detection-in-running-container.py (50 lines)
#!/usr/bin/env python3
# Input: Container name to monitor
# Output: Alert when memory growth exceeds threshold
import json
import subprocess
import sys
import time
from collections import deque
from datetime import datetime

# docker stats prints binary units, e.g. "123.4MiB / 512MiB"
UNITS = {"GiB": 1024**3, "MiB": 1024**2, "KiB": 1024, "B": 1}


def get_container_memory(cid):
    """Return current memory usage in bytes, or None if the container is unavailable."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{json .}}", cid],
        capture_output=True, text=True
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    data = json.loads(result.stdout.strip())
    usage_str = data.get("MemUsage", "").split("/")[0].strip()
    # Longer suffixes come first in UNITS, so plain "B" is only matched last
    for unit, factor in UNITS.items():
        if usage_str.endswith(unit):
            return float(usage_str[:-len(unit)]) * factor
    return 0


def monitor(cid, interval=30, growth_threshold_mb=50, samples=10):
    history = deque(maxlen=samples)  # keep only the comparison window
    print(f"Monitoring {cid} every {interval}s...")
    while True:
        mem = get_container_memory(cid)
        if mem is None:
            print("WARNING: Container not found")
            break
        history.append((datetime.now(), mem))
        mem_mb = mem / 1024 / 1024
        print(f"[{datetime.now():%H:%M:%S}] {mem_mb:.1f} MB")
        if samples > 1 and len(history) == samples:
            growth_mb = (mem - history[0][1]) / 1024 / 1024
            if growth_mb > growth_threshold_mb:
                # The window spans samples - 1 sampling intervals
                rate = growth_mb / ((samples - 1) * interval / 60)
                print(f"ALERT: +{growth_mb:.1f}MB ({rate:.1f} MB/min)")
        time.sleep(interval)


if __name__ == "__main__":
    monitor(sys.argv[1] if len(sys.argv) > 1 else "myapp")
Anti-Patterns
Wrong: No memory limit in production
# BAD — container can consume all host memory [src1]
docker run -d myapp
Correct: Always set memory limits
# GOOD — constrained to 512MB [src1]
docker run -d --memory=512m --memory-swap=1g myapp
Wrong: JVM heap equals container memory
# BAD — no room for non-heap memory [src1, src4]
docker run --memory=1g myapp java -Xmx1g -jar app.jar
Correct: JVM heap at 70-80% of container memory
# GOOD: leaves about 25% for metaspace, threads, buffers [src1, src4]
docker run --memory=1g myapp java -XX:MaxRAMPercentage=75.0 -jar app.jar
Wrong: Restarting OOM containers without fixing root cause
# BAD — infinite restart loop [src4]
services:
  app:
    restart: always
Correct: Limit restarts and investigate
# GOOD — stops after 5 failures [src4]
services:
  app:
    restart: on-failure:5
    deploy:
      resources:
        limits:
          memory: 512M
Wrong: Using --oom-kill-disable on cgroups v2
# BAD — flag silently discarded on cgroups v2 (Docker 27+) [src7]
docker run --memory=512m --oom-kill-disable myapp
Correct: Set appropriate limits and monitor instead
# GOOD — right-size limit and monitor proactively [src1, src5]
docker run --memory=512m --memory-reservation=384m myapp
Decision Logic
If docker inspect --format='{{.State.OOMKilled}}' returns true
—> Container was OOM-killed. Proceed with this unit's Step-by-Step Guide; start by checking the
configured --memory limit and live docker stats peak usage. [src1, src3]
If exit code is 137 but OOMKilled is false
—> Not an OOM issue. The container was killed by docker kill, a docker stop
timeout, or an orchestrator-initiated SIGKILL — skip this unit and check signal handlers + termination
policies. [src3, src4]
If application is Java/JVM and --memory is set
—> Set -XX:MaxRAMPercentage=75.0 (Java 10+) OR -Xmx to ~75% of
--memory. Setting -Xmx == --memory guarantees OOMKilled because metaspace, threads,
and GC buffers need the remaining 20-30%. [src1, src4]
If host is running Docker Engine 29+
—> Verify the host is on cgroup v2 (stat -fc %T /sys/fs/cgroup returns
cgroup2fs). cgroup v1 is deprecated in Engine 29; if still on v1, plan migration before May 2029
to avoid losing access to memory.high, memory.pressure, and the cgroup-aware OOM
killer. [src10]
If container OOMs at exactly the --memory value with no growth
—> Swap is disabled or --memory-swap == --memory. The container has no swap headroom.
Either right-size the limit upward by 20-30% or set --memory-swap higher than
--memory to allow paging (note: swap is slow — prefer right-sizing). [src1]
If docker stats shows gradual memory growth over hours/days
—> Application memory leak. Profile with language-specific tools (pprof for Go,
heapdump for Node, tracemalloc for Python, JFR/jcmd for JVM). Do NOT keep raising
--memory — the leak will eventually exhaust any limit. [src4, src8]
If multiple containers are OOM-killed simultaneously
—> Host itself ran out of memory, not individual containers. Check dmesg | grep -i oom
and free -m. Fix by setting --memory limits on every container (so the OOM killer
targets the offender, not random victims) and adding host RAM. [src4, src6]
If running on Docker Desktop (Mac/Windows) and docker run --memory is ignored
—> The Docker VM has a fixed allocation (default ~50% of host RAM). Total container memory cannot
exceed the VM allocation regardless of per-container --memory flags. Increase the allocation in
Docker Desktop Settings > Resources > Memory. [src1]
Common Pitfalls
- Exit 137 ≠ always OOM: Can be docker kill, docker stop timeout, or orchestrator. Check the OOMKilled flag. [src3, src4]
- --memory without --memory-swap: On Linux Engine the container may use swap equal to the memory limit (memory plus swap totals 2× memory); swap may be disabled on Docker Desktop. Set explicitly. [src1]
- Java container-awareness: JVM before Java 10 ignores cgroup limits (container support was also backported to 8u191). Use 10+ with UseContainerSupport or set -Xmx. [src1, src4]
- docker stats includes page cache: High usage may be reclaimable cache, not actual pressure. Kernel evicts cache before OOM. [src2]
- tmpfs counts against memory limit: --tmpfs and /dev/shm count against the limit. Large temp files can trigger OOM. [src1]
- Build-time OOM: docker build runs in a container too. npm install and compilation spike memory. Increase Docker Desktop allocation. [src1]
- --oom-kill-disable and --kernel-memory discarded on cgroups v2: Docker Engine 27+ silently ignores these flags on cgroups v2 hosts. [src7]
- cgroups v2 memory pressure: cgroups v2 can throttle before OOM via memory.high. This causes slowness without kills. Check memory.pressure for stall indicators. [src5, src6]
Diagnostic Commands
# === Confirm OOM ===
docker inspect <cid> --format='OOMKilled: {{.State.OOMKilled}}, Exit: {{.State.ExitCode}}'
dmesg | grep -i "oom\|killed process" | tail -10
# === Memory Usage ===
docker stats
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"
# === Memory Limits ===
docker inspect <cid> --format='Memory: {{.HostConfig.Memory}}, Swap: {{.HostConfig.MemorySwap}}'
# === Inside Container ===
docker exec <cid> cat /proc/meminfo | head -5
docker exec <cid> cat /proc/1/status | grep VmRSS
docker exec <cid> ps aux --sort=-%mem | head -10
# === cgroups v2 ===
cat /sys/fs/cgroup/system.slice/docker-<id>.scope/memory.current
cat /sys/fs/cgroup/system.slice/docker-<id>.scope/memory.max
cat /sys/fs/cgroup/system.slice/docker-<id>.scope/memory.events
cat /sys/fs/cgroup/system.slice/docker-<id>.scope/memory.pressure
# === cgroups v1 (Legacy) ===
cat /sys/fs/cgroup/memory/docker/<id>/memory.usage_in_bytes
cat /sys/fs/cgroup/memory/docker/<id>/memory.limit_in_bytes
# === Host Memory ===
free -m
vmstat 1 5
# === cgroups version check ===
stat -fc %T /sys/fs/cgroup # cgroup2fs = v2, tmpfs = v1
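The cgroups v2 files read above are plain text and easy to parse if you want to track them over time. The sketch below assumes the standard kernel layouts: memory.events as "name count" lines and memory.pressure in PSI format ("some avg10=... avg60=... avg300=... total=..."). Both function names and the sample strings are illustrative.

```python
def parse_memory_events(text):
    """Parse cgroup v2 memory.events into a dict of event counters."""
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            events[parts[0]] = int(parts[1])
    return events


def parse_pressure(text):
    """Parse PSI output (memory.pressure) into nested dicts of floats."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, *pairs = line.split()
        result[kind] = {k: float(v) for k, v in (p.split("=") for p in pairs)}
    return result


if __name__ == "__main__":
    # Made-up samples in the shape the kernel produces
    events = parse_memory_events("low 0\nhigh 12\nmax 3\noom 1\noom_kill 1\n")
    pressure = parse_pressure(
        "some avg10=1.50 avg60=0.40 avg300=0.10 total=12345\n"
        "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
    )
    print("OOM kills:", events["oom_kill"])               # prints: OOM kills: 1
    print("10s stall share:", pressure["some"]["avg10"])  # prints: 10s stall share: 1.5
```

A rising "high" counter with no "oom_kill" events points at memory.high throttling rather than kills.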
Version History & Compatibility
| Version | Behavior | Key Changes |
|---|---|---|
| Docker 29 (2026) | Current | cgroup v1 deprecated (support until May 2029); default nofile ulimit dropped from 1,048,576 to 1,024 (containerd v2.1.5); SwapBytes + MemorySwappiness added to Swarm service resources [src10] |
| Docker 28 (2025) | Stable | OOMScoreAdj added to docker service create and docker stack (28.2.0); writable-cgroups=true SecurityOpt for cgroup writes; deprecated KernelMemoryTCP accounting [src9] |
| Docker 27+ (2024) | Stable | OOMScoreAdj for services; --oom-kill-disable and --kernel-memory discarded on cgroups v2; containerd default [src7] |
| Docker 25-26 | Stable | Improved memory metrics; containerd integration [src1] |
| Docker 24 | Stable | cgroups v2 fully supported; BuildKit default [src1, src6] |
| Docker 23 | Stable | Compose V2 default [src1] |
| Docker 20.10 | LTS-like | cgroups v2 support [src1, src6] |
| Docker 19.03 | Legacy | --memory-swap standardized [src1] |
| cgroups v2 | Modern Linux | memory.max; memory.events for OOM count; memory.pressure for stall tracking; memory.high soft throttling [src6] |
| cgroups v1 | Legacy Linux | memory.limit_in_bytes + memory.oom_control; --oom-kill-disable works [src6] |
When to Use / When Not to Use
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Exit 137 and OOMKilled=true | Exit 137 but OOMKilled=false | Check for docker kill or stop timeout |
| Memory grows over time in stats | Memory is constant but app is slow | CPU profiling |
| Java/Node process needs right-sizing | Build step fails with OOM | Increase Docker Desktop memory |
| Multiple containers compete for memory | Single container with headroom | Other issues (CPU, IO, network) |
| Container keeps restarting with exit 137 | Container runs but performance degrades | Check cgroups v2 memory.pressure for throttling |
Important Caveats
- On Docker Desktop (Mac/Windows), the Docker VM has a fixed memory allocation (default ~50% of host RAM since Docker Desktop 4.x). Even with --memory limits, the total across all containers cannot exceed the VM's allocation. Docker Desktop 4.38 introduced a known regression causing higher baseline memory consumption.
- The OOMKilled flag only indicates PID 1 was OOM-killed. Child process OOM kills may show OOMKilled: false. Check dmesg.
- docker stats memory includes page cache. High usage may be reclaimable; the kernel evicts cache before OOM.
- Kubernetes uses the same cgroup mechanism but with resources.limits.memory. On K8s 1.28+ with cgroups v2, the cgroup-aware OOM killer terminates ALL processes in the cgroup. A singleProcessOOMKill kubelet flag is being developed to restore the old behavior.
- In swap-enabled environments, containers may exceed --memory without OOM (using swap). This causes slowness, not kills. Monitor swap separately.
- Docker Engine 27.0.3 (2024) had a documented memory leak bug that could trigger host-level OOM killing of multiple containers simultaneously. Upgrade to 27.1+ if affected.
- Docker Engine 29 (2026) deprecates cgroup v1. Support continues through at least May 2029, but on cgroup v1 hosts you cannot rely on newer memory features (memory.high soft throttling, memory.pressure stall tracking, the cgroup-aware OOM killer). Plan host migration to cgroup v2 before upgrading workloads. [src10]
- Docker Engine 29 changes the default nofile ulimit from 1,048,576 to 1,024 (containerd v2.1.5). This is unrelated to OOMKilled in most cases, but processes that pre-allocate per-FD memory (some HTTP servers, database connection pools) can show very different memory profiles after upgrading. Override with --ulimit nofile=1048576 or set default-ulimits in /etc/docker/daemon.json if you depend on the old default. [src10]
- Docker Engine 28 added OOMScoreAdj for Swarm services (28.2.0, May 2025). Prefer adjusting OOM score over disabling the OOM killer to bias which container the kernel kills under host-wide pressure. [src9]