How to Diagnose and Fix Docker OOMKilled Errors

How do I diagnose and fix Docker OOMKilled errors?

TL;DR

Constraints

Quick Reference

| # | Cause | Likelihood | Signature | Fix |
|---|-------|------------|-----------|-----|
| 1 | Container memory limit too low | ~30% | OOMKilled: true; app uses expected memory | Increase --memory limit [src1, src4] |
| 2 | Application memory leak | ~25% | Memory grows linearly; OOM after hours/days | Profile and fix the leak [src4, src8] |
| 3 | JVM heap exceeds container limit | ~15% | Java app; -Xmx set at or above --memory | Set -Xmx to 70-80% of container memory [src1, src4] |
| 4 | Large file/data processing | ~8% | OOM during specific operations | Use streaming/chunked processing [src8] |
| 5 | Fork bomb / child process explosion | ~5% | Memory spikes instantly; many processes | Set --pids-limit; fix fork logic [src1] |
| 6 | Host itself is out of memory | ~5% | Multiple containers OOMKilled; dmesg shows OOM | Add host RAM; reduce container count [src4, src6] |
| 7 | No memory limit + host exhaustion | ~4% | No --memory flag; host runs out | Always set memory limits in production [src1] |
| 8 | Memory-mapped files / tmpfs | ~3% | Container uses tmpfs or mmap | Exclude tmpfs or increase limit [src1] |
| 9 | Swap disabled or misconfigured | ~3% | OOMs at exactly the --memory value | Configure --memory-swap [src1] |
| 10 | Build-time OOM | ~2% | OOM during npm install, compilation | Increase Docker Desktop memory [src1] |

Decision Tree

START — Container exits with code 137
├── Is OOMKilled true? (docker inspect --format='{{.State.OOMKilled}}')
│   ├── YES → Out of memory ↓
│   └── NO → Not OOM — killed by docker kill, docker stop timeout, or orchestrator
├── Was a memory limit set? (docker inspect --format='{{.HostConfig.Memory}}')
│   ├── YES → Container exceeded this limit
│   │   ├── Is the limit too low? → Increase --memory [src1]
│   │   ├── Gradual growth (leak)? → Profile the application [src4, src8]
│   │   └── Java/JVM? → Check -Xmx (leave 20-30% for non-heap) [src1]
│   └── NO → Host ran out of memory
│       ├── Check host: free -m, dmesg | grep oom [src6]
│       └── Set limits on all containers [src1]
├── cgroups version? (stat -fc %T /sys/fs/cgroup)
│   ├── cgroup2fs → v2: check memory.max, memory.events [src6]
│   └── tmpfs → v1: check memory.limit_in_bytes, memory.oom_control [src6]
├── Build-time OOM? → Increase Docker Desktop memory [src1]
└── Monitor with docker stats to find peak usage [src2]
    ├── Peak near limit? → Increase limit by 20-30%
    └── Sudden spike? → Profile that code path
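
The first branches of the tree can be sketched in Python as a pure function over the fields that docker inspect reports. This is a minimal illustration, not part of any Docker tooling: classify_exit and its return labels are hypothetical names, and the function only covers the exit code, OOMKilled, and memory limit checks.

```python
def classify_exit(state: dict, host_config: dict) -> str:
    """Map docker inspect fields onto the first branches of the decision tree.

    state       -- the "State" object from docker inspect
    host_config -- the "HostConfig" object from docker inspect
    The returned labels are illustrative, not Docker terminology.
    """
    if state.get("ExitCode") != 137:
        return "not-137"
    if not state.get("OOMKilled", False):
        # docker kill, docker stop timeout, or an orchestrator SIGKILL
        return "killed-not-oom"
    if host_config.get("Memory", 0) > 0:
        # A limit was set, so the container exceeded it
        return "exceeded-container-limit"
    # OOM-killed with no limit: the host itself ran out of memory
    return "host-out-of-memory"
```

To feed it real data, parse the JSON array that docker inspect <cid> prints and pass the first element's "State" and "HostConfig" objects.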

Step-by-Step Guide

1. Confirm the OOM kill

Not every exit code 137 is OOM — it can also be a manual docker kill. [src3, src4]

docker inspect <cid> --format='{{.State.OOMKilled}}'
docker inspect <cid> --format='{{json .State}}' | python3 -m json.tool
dmesg | grep -i "oom\|out of memory" | tail -20
# cgroups v2: check OOM event count
cat /sys/fs/cgroup/system.slice/docker-<id>.scope/memory.events

Verify: OOMKilled: true confirms the container was killed for exceeding memory.
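
On cgroups v2, memory.events is a short list of "name count" lines, and the oom_kill counter records how many processes in the cgroup were killed by the OOM killer. The sketch below parses that text into a dictionary; parse_memory_events is a hypothetical helper and the SAMPLE values are made up for illustration.

```python
def parse_memory_events(text: str) -> dict:
    """Parse the contents of a cgroups v2 memory.events file into {name: count}."""
    events = {}
    for line in text.splitlines():
        parts = line.split()
        # Each valid line is "<event-name> <non-negative integer>"
        if len(parts) == 2 and parts[1].isdigit():
            events[parts[0]] = int(parts[1])
    return events


# Illustrative content only; real counters depend on the container's history.
SAMPLE = "low 0\nhigh 0\nmax 42\noom 3\noom_kill 3\n"

if __name__ == "__main__":
    events = parse_memory_events(SAMPLE)
    print("OOM kills in this cgroup:", events.get("oom_kill", 0))
```

A non-zero oom_kill count is the cgroup-level counterpart of OOMKilled: true in docker inspect.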

2. Check current memory limits

Understand what limits are configured. [src1, src3]

docker inspect <cid> --format='Memory: {{.HostConfig.Memory}}, Swap: {{.HostConfig.MemorySwap}}'
docker inspect <cid> --format='{{.HostConfig.Memory}}' | awk '{print $1/1024/1024 "MB"}'
# Check cgroups version
stat -fc %T /sys/fs/cgroup   # cgroup2fs = v2, tmpfs = v1

Verify: Know the exact limit (0 = unlimited).
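
The two numbers from docker inspect are easier to read once converted, keeping in mind that --memory-swap is the total of memory plus swap, not the swap amount alone. The following sketch assumes that convention and the special values 0 (not set) and -1 (unlimited swap); describe_limits is a hypothetical helper name.

```python
MIB = 1024 * 1024


def describe_limits(memory: int, memory_swap: int) -> str:
    """Summarize HostConfig.Memory and HostConfig.MemorySwap (both in bytes)."""
    if memory == 0:
        return "no memory limit"
    mem_mib = memory // MIB
    if memory_swap == -1:
        return f"{mem_mib} MiB RAM, unlimited swap"
    if memory_swap == 0:
        return f"{mem_mib} MiB RAM, swap not set explicitly"
    # MemorySwap is RAM + swap, so the swap allowance is the difference
    swap_mib = (memory_swap - memory) // MIB
    if swap_mib <= 0:
        return f"{mem_mib} MiB RAM, no swap"
    return f"{mem_mib} MiB RAM + {swap_mib} MiB swap"


if __name__ == "__main__":
    print(describe_limits(512 * MIB, 1024 * MIB))
```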

3. Monitor live memory usage

Use docker stats to see real-time memory consumption. [src2]

docker stats
docker stats --no-stream
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}\t{{.PIDs}}"

Verify: Watch memory over time — gradual growth = leak; spikes = load-dependent.

4. Set appropriate memory limits

Configure limits based on observed usage plus headroom. [src1]

docker run --memory=512m --memory-swap=1g myapp
docker run --memory=512m --memory-swap=512m myapp  # No swap
docker run --memory=1g --memory-reservation=512m myapp  # Soft limit
# Docker Compose
services:
  app:
    deploy:
      resources:
        limits:
          memory: 512M
        reservations:
          memory: 256M
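
"Observed usage plus headroom" is simple arithmetic, sketched here under two assumptions that are not from the guide itself: a default headroom of 25% (inside the 20-30% range used elsewhere in this guide) and rounding up to a 64 MiB step for tidy values. recommend_limit_mib is a hypothetical helper.

```python
import math


def recommend_limit_mib(peak_mib: float, headroom: float = 0.25, step_mib: int = 64) -> int:
    """Return peak usage plus headroom, rounded up to a multiple of step_mib."""
    if peak_mib <= 0:
        raise ValueError("peak_mib must be positive")
    target = peak_mib * (1 + headroom)
    return math.ceil(target / step_mib) * step_mib


if __name__ == "__main__":
    # A container that peaked at 400 MiB in docker stats
    print(f"--memory={recommend_limit_mib(400)}m")
```

Take the peak from a representative load period; a limit derived from idle usage will be too low.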

5. Configure JVM / Node.js / Python memory

Runtime-specific settings must align with container limits. [src1, src4]

# Java 10+ (container-aware)
docker run --memory=1g myapp java -XX:MaxRAMPercentage=75.0 -jar app.jar

# Node.js
docker run --memory=512m myapp node --max-old-space-size=384 app.js
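
The flag values above follow from the container limit by the same 75% rule. A small sketch of that arithmetic, with hypothetical helper names and the fraction as an adjustable assumption:

```python
def jvm_percentage_flag(fraction: float = 0.75) -> str:
    """Heap as a share of the container limit (container-aware JVMs)."""
    return f"-XX:MaxRAMPercentage={fraction * 100:.1f}"


def jvm_xmx_flag(limit_mib: int, fraction: float = 0.75) -> str:
    """Fixed heap size in MiB derived from the container limit."""
    return f"-Xmx{int(limit_mib * fraction)}m"


def node_old_space_flag(limit_mib: int, fraction: float = 0.75) -> str:
    """V8 old-space size in MiB derived from the container limit."""
    return f"--max-old-space-size={int(limit_mib * fraction)}"


if __name__ == "__main__":
    print(jvm_percentage_flag())        # for --memory=1g
    print(jvm_xmx_flag(1024))           # fixed-size alternative
    print(node_old_space_flag(512))     # for --memory=512m
```

With a 512 MiB limit this reproduces the 384 used in the Node.js command above.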

6. Profile memory usage inside the container

Get detailed memory breakdown. [src4, src8]

docker exec <cid> ps aux --sort=-%mem | head -20
# Note: /proc/meminfo inside a container normally reports host totals, not the cgroup limit
docker exec <cid> cat /proc/meminfo | head -10
docker exec <cid> cat /proc/1/status | grep -E "VmRSS|VmSize|VmPeak"
# cgroups v2: check memory pressure
docker exec <cid> cat /sys/fs/cgroup/memory.pressure 2>/dev/null
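
The Vm* lines from /proc/1/status are reported in kB and are easy to extract programmatically. A minimal sketch, assuming the usual "Key:<whitespace>value kB" layout; parse_vm_fields is a hypothetical helper and the SAMPLE text is invented for illustration.

```python
def parse_vm_fields(status_text: str, fields=("VmPeak", "VmSize", "VmRSS")) -> dict:
    """Extract selected fields (values in kB) from /proc/<pid>/status text."""
    result = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in fields and rest.split():
            result[key] = int(rest.split()[0])
    return result


# Illustrative content only, not captured from a real process.
SAMPLE = (
    "Name:\tjava\n"
    "VmPeak:\t 2101248 kB\n"
    "VmSize:\t 2097152 kB\n"
    "VmRSS:\t  524288 kB\n"
    "Threads:\t42\n"
)

if __name__ == "__main__":
    vm = parse_vm_fields(SAMPLE)
    print(f"RSS: {vm['VmRSS'] / 1024:.0f} MiB")
```

VmRSS is the figure to compare against the container limit; VmSize counts reserved address space and is usually much larger.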

7. Set up memory alerts

Proactive monitoring before OOM kills. [src5]

#!/bin/bash
THRESHOLD=80
while true; do
    docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' | while read -r name pct; do
        pct_num=$(echo "$pct" | tr -d '%')
        if (( $(echo "$pct_num > $THRESHOLD" | bc -l) )); then
            # $pct already ends in "%", so print the stripped number with a single "%"
            echo "WARNING $(date): $name at ${pct_num}% memory"
        fi
    done
    sleep 30
done

Code Examples

Container memory diagnostics script

Full script: container-memory-diagnostics-script.sh (43 lines)

#!/bin/bash
# Input:  Container ID or name
# Output: Complete memory diagnostic report

CID="$1"
if [ -z "$CID" ]; then echo "Usage: $0 <container_id>"; exit 1; fi

echo "=== Memory Diagnostic Report ==="
echo "Container: $CID"
echo "Time: $(date -u)"

STATE=$(docker inspect "$CID" --format='{{.State.Status}}' 2>/dev/null)
if [ -z "$STATE" ]; then echo "Container not found"; exit 1; fi

OOM=$(docker inspect "$CID" --format='{{.State.OOMKilled}}')
EXIT=$(docker inspect "$CID" --format='{{.State.ExitCode}}')
echo "Status: $STATE | Exit: $EXIT | OOMKilled: $OOM"

MEM=$(docker inspect "$CID" --format='{{.HostConfig.Memory}}')
echo "Memory limit: $(echo "$MEM" | awk '{if($1>0) printf "%.0fMB\n",$1/1024/1024; else print "unlimited"}')"

if [ "$STATE" = "running" ]; then
    echo "=== Current Usage ==="
    docker stats --no-stream --format "Memory: {{.MemUsage}} ({{.MemPerc}})" "$CID"
    echo "=== Top Processes ==="
    docker exec "$CID" ps aux --sort=-%mem 2>/dev/null | head -6
fi

echo "=== Last 10 Log Lines ==="
docker logs --tail 10 "$CID" 2>&1

echo "=== Host OOM Events ==="
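# dmesg may require root; print a fallback when it returns nothing
HOST_OOM=$(dmesg 2>/dev/null | grep -i "oom\|killed process" | tail -5)
echo "${HOST_OOM:-(unavailable or no OOM events found)}"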

Docker Compose with proper memory management

Full script: docker-compose-with-proper-memory-management.yml (62 lines)

version: "3.8"
services:
  api:
    build: ./api
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "1.0"
        reservations:
          memory: 256M
    environment:
      JAVA_OPTS: "-XX:MaxRAMPercentage=75.0 -XX:+UseG1GC"
      NODE_OPTIONS: "--max-old-space-size=384"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      retries: 3

  worker:
    build: ./worker
    deploy:
      resources:
        limits:
          memory: 1G
    restart: on-failure:5

  postgres:
    image: postgres:16
    deploy:
      resources:
        limits:
          memory: 1G
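    # The official postgres image has no POSTGRES_SHARED_BUFFERS variable;
    # pass server settings as postgres -c flags instead.
    command: ["postgres", "-c", "shared_buffers=256MB"]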
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s

  redis:
    image: redis:7-alpine
    deploy:
      resources:
        limits:
          memory: 256M
    command: ["redis-server", "--maxmemory", "200mb", "--maxmemory-policy", "allkeys-lru"]

Memory leak detection in running container

Full script: memory-leak-detection-in-running-container.py (50 lines)

#!/usr/bin/env python3
# Input:  Container name to monitor
# Output: Alert when memory growth exceeds threshold

import subprocess, json, time, sys
from datetime import datetime

# Multipliers for the unit suffixes docker stats prints; longest suffixes first
# so that "MiB" is never matched as plain "B".
UNITS = {"GiB": 1024 ** 3, "MiB": 1024 ** 2, "KiB": 1024, "B": 1}

def get_container_memory(cid):
    """Return current memory usage in bytes, or None if docker stats fails."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{json .}}", cid],
        capture_output=True, text=True
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    data = json.loads(result.stdout.strip())
    usage_str = data.get("MemUsage", "").split("/")[0].strip()
    for unit, factor in UNITS.items():
        if usage_str.endswith(unit):
            return float(usage_str[:-len(unit)]) * factor
    return 0

def monitor(cid, interval=30, growth_threshold_mb=50, samples=10):
    history = []
    print(f"Monitoring {cid} every {interval}s...")
    while True:
        mem = get_container_memory(cid)
        if mem is None:
            print(f"WARNING: Container {cid} not found"); break
        history.append((datetime.now(), mem))
        del history[:-samples]  # keep only the comparison window
        mem_mb = mem / 1024 / 1024
        print(f"[{datetime.now():%H:%M:%S}] {mem_mb:.1f} MB")
        if len(history) >= samples:
            growth_mb = (mem - history[0][1]) / 1024 / 1024
            if growth_mb > growth_threshold_mb:
                # N readings span N - 1 intervals
                window_min = max(samples - 1, 1) * interval / 60
                rate = growth_mb / window_min
                print(f"ALERT: +{growth_mb:.1f}MB ({rate:.1f} MB/min)")
        time.sleep(interval)

if __name__ == "__main__":
    monitor(sys.argv[1] if len(sys.argv) > 1 else "myapp")
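
Once a growth rate is known, a rough time-to-limit estimate helps decide how urgent the leak is. The sketch below fits a least-squares line through (seconds, MB) samples and extrapolates to the limit. It assumes roughly linear growth, which real workloads may not show, and both function names are hypothetical.

```python
def growth_rate_mb_per_min(samples):
    """Least-squares slope of (seconds, megabytes) samples, in MB per minute."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        return 0.0
    return num / den * 60


def minutes_until_limit(samples, limit_mb):
    """Extrapolate from the latest sample to the limit; None if memory is not growing."""
    rate = growth_rate_mb_per_min(samples)
    if rate <= 0:
        return None
    return (limit_mb - samples[-1][1]) / rate


if __name__ == "__main__":
    # Illustrative readings taken every 30 seconds
    readings = [(0, 100.0), (30, 105.0), (60, 110.0), (90, 115.0)]
    print(f"{growth_rate_mb_per_min(readings):.1f} MB/min")
    print(f"{minutes_until_limit(readings, 215):.0f} minutes to a 215 MB limit")
```

A regression over the whole window is less sensitive to a single noisy reading than comparing only the first and last sample.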

Anti-Patterns

Wrong: No memory limit in production

# BAD — container can consume all host memory [src1]
docker run -d myapp

Correct: Always set memory limits

# GOOD — constrained to 512MB [src1]
docker run -d --memory=512m --memory-swap=1g myapp

Wrong: JVM heap equals container memory

# BAD — no room for non-heap memory [src1, src4]
docker run --memory=1g myapp java -Xmx1g -jar app.jar

Correct: JVM heap at 70-80% of container memory

# GOOD — leaves 25-30% for metaspace, threads, buffers [src1, src4]
docker run --memory=1g myapp java -XX:MaxRAMPercentage=75.0 -jar app.jar

Wrong: Restarting OOM containers without fixing root cause

# BAD — infinite restart loop [src4]
services:
  app:
    restart: always

Correct: Limit restarts and investigate

# GOOD — stops after 5 failures [src4]
services:
  app:
    restart: on-failure:5
    deploy:
      resources:
        limits:
          memory: 512M

Wrong: Using --oom-kill-disable on cgroups v2

# BAD — flag silently discarded on cgroups v2 (Docker 27+) [src7]
docker run --memory=512m --oom-kill-disable myapp

Correct: Set appropriate limits and monitor instead

# GOOD — right-size limit and monitor proactively [src1, src5]
docker run --memory=512m --memory-reservation=384m myapp

Decision Logic

If docker inspect --format='{{.State.OOMKilled}}' returns true

—> Container was OOM-killed. Proceed with the Step-by-Step Guide above; start by checking the configured --memory limit and the peak usage shown by docker stats. [src1, src3]

If exit code is 137 but OOMKilled is false

—> Not an OOM issue. The container was killed by docker kill, a docker stop timeout, or an orchestrator-initiated SIGKILL. Skip this guide and check signal handlers and termination policies instead. [src3, src4]

If application is Java/JVM and --memory is set

—> Set -XX:MaxRAMPercentage=75.0 (Java 10+) or -Xmx to about 75% of --memory. Setting -Xmx equal to --memory makes an OOM kill likely once the heap fills, because metaspace, thread stacks, and GC buffers need the remaining 20-30%. [src1, src4]

If host is running Docker Engine 29+

—> Verify the host is on cgroup v2 (stat -fc %T /sys/fs/cgroup returns cgroup2fs). cgroup v1 is deprecated in Engine 29, with support ending in May 2029; if the host is still on v1, plan the migration before then. Only cgroup v2 provides memory.high, memory.pressure, and the cgroup-aware OOM killer. [src10]

If container OOMs at exactly the --memory value with no growth

—> Swap is disabled or --memory-swap == --memory. The container has no swap headroom. Either right-size the limit upward by 20-30% or set --memory-swap higher than --memory to allow paging (note: swap is slow — prefer right-sizing). [src1]

If docker stats shows gradual memory growth over hours/days

—> Application memory leak. Profile with language-specific tools (pprof for Go, heapdump for Node, tracemalloc for Python, JFR/jcmd for JVM). Do NOT keep raising --memory — the leak will eventually exhaust any limit. [src4, src8]

If multiple containers are OOM-killed simultaneously

—> Host itself ran out of memory, not individual containers. Check dmesg | grep -i oom and free -m. Fix by setting --memory limits on every container (so the OOM killer targets the offender, not random victims) and adding host RAM. [src4, src6]

If running on Docker Desktop (Mac/Windows) and docker run --memory is ignored

—> The Docker VM has a fixed allocation (default ~50% of host RAM). Total container memory cannot exceed the VM allocation regardless of per-container --memory flags. Increase the allocation in Docker Desktop Settings > Resources > Memory. [src1]
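
For the gradual-growth case above, where tracemalloc is named as the Python profiling tool, a minimal sketch of a snapshot comparison looks like this. measure_growth is a hypothetical wrapper, and the list of byte strings only simulates a leak; in a real service the snapshots would be taken minutes apart inside the running process.

```python
import tracemalloc


def measure_growth(workload, limit=3):
    """Run workload() between two tracemalloc snapshots and return the
    source lines whose allocated size changed the most."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    retained = workload()  # keep the result alive so it appears in the diff
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    del retained
    # compare_to returns StatisticDiff entries sorted by largest change first
    return after.compare_to(before, "lineno")[:limit]


def simulated_leak():
    # Roughly 1 MB of objects that are never released during the measurement
    return [bytes(1024) for _ in range(1000)]


if __name__ == "__main__":
    for stat in measure_growth(simulated_leak):
        print(stat)
```

Each printed entry points at a file and line number, which is usually enough to locate the structure that keeps growing.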

Common Pitfalls

Diagnostic Commands

# === Confirm OOM ===
docker inspect <cid> --format='OOMKilled: {{.State.OOMKilled}}, Exit: {{.State.ExitCode}}'
dmesg | grep -i "oom\|killed process" | tail -10

# === Memory Usage ===
docker stats
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"

# === Memory Limits ===
docker inspect <cid> --format='Memory: {{.HostConfig.Memory}}, Swap: {{.HostConfig.MemorySwap}}'

# === Inside Container ===
docker exec <cid> cat /proc/meminfo | head -5
docker exec <cid> cat /proc/1/status | grep VmRSS
docker exec <cid> ps aux --sort=-%mem | head -10

# === cgroups v2 ===
cat /sys/fs/cgroup/system.slice/docker-<id>.scope/memory.current
cat /sys/fs/cgroup/system.slice/docker-<id>.scope/memory.max
cat /sys/fs/cgroup/system.slice/docker-<id>.scope/memory.events
cat /sys/fs/cgroup/system.slice/docker-<id>.scope/memory.pressure

# === cgroups v1 (Legacy) ===
cat /sys/fs/cgroup/memory/docker/<id>/memory.usage_in_bytes
cat /sys/fs/cgroup/memory/docker/<id>/memory.limit_in_bytes

# === Host Memory ===
free -m
vmstat 1 5

# === cgroups version check ===
stat -fc %T /sys/fs/cgroup   # cgroup2fs = v2, tmpfs = v1

Version History & Compatibility

| Version | Status | Key Changes |
|---------|--------|-------------|
| Docker 29 (2026) | Current | cgroup v1 deprecated (support until May 2029); default nofile ulimit dropped from 1,048,576 to 1,024 (containerd v2.1.5); SwapBytes + MemorySwappiness added to Swarm service resources [src10] |
| Docker 28 (2025) | Stable | OOMScoreAdj added to docker service create and docker stack (28.2.0); writable-cgroups=true SecurityOpt for cgroup writes; deprecated KernelMemoryTCP accounting [src9] |
| Docker 27+ (2024) | Stable | OOMScoreAdj for services; --oom-kill-disable and --kernel-memory discarded on cgroups v2; containerd default [src7] |
| Docker 25-26 | Stable | Improved memory metrics; containerd integration [src1] |
| Docker 24 | Stable | cgroups v2 fully supported; BuildKit default [src1, src6] |
| Docker 23 | Stable | Compose V2 default [src1] |
| Docker 20.10 | LTS-like | cgroups v2 support [src1, src6] |
| Docker 19.03 | Legacy | --memory-swap standardized [src1] |
| cgroups v2 | Modern Linux | memory.max; memory.events for OOM count; memory.pressure for stall tracking; memory.high soft throttling [src6] |
| cgroups v1 | Legacy Linux | memory.limit_in_bytes + memory.oom_control; --oom-kill-disable works [src6] |

When to Use / When Not to Use

| Use When | Don't Use When | Use Instead |
|----------|----------------|-------------|
| Exit 137 and OOMKilled=true | Exit 137 but OOMKilled=false | Check for docker kill or stop timeout |
| Memory grows over time in stats | Memory is constant but app is slow | CPU profiling |
| Java/Node process needs right-sizing | Build step fails with OOM | Increase Docker Desktop memory |
| Multiple containers compete for memory | Single container with headroom | Other issues (CPU, IO, network) |
| Container keeps restarting with exit 137 | Container runs but performance degrades | Check cgroups v2 memory.pressure for throttling |

Important Caveats