How Do I Diagnose and Fix Java OutOfMemoryError?
TL;DR
- Bottom line: Java OutOfMemoryError has 8+ distinct types, each with different causes and fixes. The diagnosis loop is: Identify (which OOM type, from the error message) → Capture (heap dump with -XX:+HeapDumpOnOutOfMemoryError) → Analyze (with Eclipse MAT or VisualVM) → Fix (increase memory, fix the leak, or tune GC). ~70% of OOM errors are heap-related ("Java heap space" or "GC overhead limit exceeded"), caused by memory leaks or undersized heaps.
- Key tool/command: jmap -dump:format=b,file=heap.hprof <pid> to capture a heap dump, then analyze with Eclipse MAT's Leak Suspects report.
- Watch out for: Increasing -Xmx without analyzing the heap dump just delays the crash. If there is a memory leak, you must find and fix the leaking code path.
- Works with: Java 8+ (LTS versions 8, 11, 17, 21, 25). Metaspace replaces PermGen from Java 8 onward.
Constraints
- Never catch OutOfMemoryError and continue normal operation: the JVM state may be corrupted after OOM. Only catch for graceful shutdown or logging. [src1]
- In containers (Docker/Kubernetes), always use -XX:+UseContainerSupport (default since JDK 10) and -XX:MaxRAMPercentage=75.0 instead of fixed -Xmx values to respect container memory limits. [src4]
- Always enable -XX:+HeapDumpOnOutOfMemoryError in production. Without a heap dump, diagnosis is guesswork. [src1, src6]
- Do not set -Xmx larger than 80% of available physical/container RAM. The JVM needs native memory for threads, Metaspace, direct buffers, JIT code cache, and GC overhead. [src2]
- If the error is "Kill process or sacrifice child" (Linux OOM Killer), the fix is at the OS/container level, not the JVM level. [src2]
Quick Reference
| # | OOM Type | Error Message | Likelihood | Primary Cause | Fix |
|---|---|---|---|---|---|
| 1 | Java heap space | OutOfMemoryError: Java heap space | ~50% | Memory leak or undersized -Xmx | Analyze heap dump; increase -Xmx or fix leak [src1] |
| 2 | GC overhead limit exceeded | OutOfMemoryError: GC overhead limit exceeded | ~20% | GC spending >98% of time, recovering <2% heap | Same as heap space: fix leak or increase heap [src1] |
| 3 | Metaspace | OutOfMemoryError: Metaspace | ~10% | Too many classes loaded (dynamic proxies, reflection) | Increase -XX:MaxMetaspaceSize; fix classloader leak [src1, src2] |
| 4 | Unable to create native thread | OutOfMemoryError: Unable to create new native thread | ~8% | Thread leak or OS thread limit reached | Fix thread leak; reduce -Xss; increase ulimit -u [src2] |
| 5 | Direct buffer memory | OutOfMemoryError: Direct buffer memory | ~5% | NIO direct buffers not released | Increase -XX:MaxDirectMemorySize; fix buffer leak [src2] |
| 6 | Requested array size | OutOfMemoryError: Requested array size exceeds VM limit | ~3% | Array allocation > Integer.MAX_VALUE or heap | Fix array sizing logic; process in chunks [src1] |
| 7 | Compressed class space | OutOfMemoryError: Compressed class space | ~2% | Class metadata exceeds compressed pointer space | Increase -XX:CompressedClassSpaceSize [src1] |
| 8 | Kill process or sacrifice child | Out of memory: Kill process or sacrifice child (Linux kernel log, not a Java exception) | ~2% | Linux OOM Killer terminated JVM | Increase container/host RAM; tune oom_score_adj [src2] |
Decision Tree
START — java.lang.OutOfMemoryError thrown
├── Error message contains "Java heap space" or "GC overhead limit exceeded"?
│ ├── YES → Heap problem
│ │ ├── Heap dump available?
│ │ │ ├── YES → Eclipse MAT → Leak Suspects report
│ │ │ │ ├── Single object dominates heap? → Memory leak → Fix code
│ │ │ │ └── Many objects, heap nearly full → Increase -Xmx
│ │ │ └── NO → Enable -XX:+HeapDumpOnOutOfMemoryError, reproduce
│ │ └── Only under load spikes? → Increase -Xmx + add monitoring
│ └── NO ↓
├── Error message contains "Metaspace"?
│ ├── YES → Class loading problem
│ │ ├── On redeploy? → Classloader leak → Restart; fix leak
│ │ └── Grows slowly? → Increase -XX:MaxMetaspaceSize
│ └── NO ↓
├── Error message contains "Unable to create new native thread"?
│ ├── YES → Thread exhaustion
│ │ ├── Thread count growing? → Thread leak → Fix code
│ │ └── Hit OS limit? → Increase ulimit -u; reduce -Xss
│ └── NO ↓
├── "Direct buffer memory"? → Fix buffer release; increase -XX:MaxDirectMemorySize
├── "Kill process or sacrifice child"? → Increase container/host memory
└── Other (array size, compressed class, native method)
└── See Quick Reference table for specific fix
Step-by-Step Guide
1. Enable heap dump on OOM (do this first)
Configure the JVM to automatically capture a heap dump when any OutOfMemoryError occurs. This is the single most important diagnostic step. [src1, src6]
# Add to JVM startup options
java -XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=/var/log/java/heap-dump.hprof \
-XX:+ExitOnOutOfMemoryError \
-jar myapp.jar
Verify:
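java -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintFlagsFinal -version 2>&1 | grep HeapDumpOnOutOfMemoryError → expected:
bool HeapDumpOnOutOfMemoryError = true
(Pass the same flags the application starts with; without them this command prints the default, false. For an already running process, check with jcmd <pid> VM.flags.)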
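The same check can be made from inside a running application. The sketch below assumes a HotSpot-based JVM, where the com.sun.management.HotSpotDiagnosticMXBean platform bean is available; the class name HeapDumpFlagCheck is a hypothetical name chosen for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reads HotSpot VM options from inside the running JVM.
final class HeapDumpFlagCheck {

    // Returns the current value of a VM option as a string, e.g. "true" or "false".
    // Throws IllegalArgumentException if the option does not exist on this JVM.
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = "
                + vmOption("HeapDumpOnOutOfMemoryError"));
        System.out.println("HeapDumpPath = " + vmOption("HeapDumpPath"));
    }
}
```

Logging these two values at startup is one way to catch a deployment that silently dropped the flags.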
2. Identify the OOM type from the error message
The text after "OutOfMemoryError:" tells you exactly which memory region is exhausted. [src1]
# Search application logs for the specific OOM type
grep -A 5 "OutOfMemoryError" /var/log/myapp/application.log
# Common patterns:
# "Java heap space" → Heap (-Xmx)
# "GC overhead limit exceeded" → Heap (-Xmx)
# "Metaspace" → Class metadata (-XX:MaxMetaspaceSize)
# "Unable to create new native thread" → Thread limit
# "Direct buffer memory" → NIO buffers (-XX:MaxDirectMemorySize)
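The message-to-region mapping above is mechanical enough to automate, for example in a log-alerting hook. This is a minimal sketch: OomClassifier and its return strings are hypothetical, and the substring patterns are assumptions based on the messages listed in this guide (exact wording can vary between JDK versions).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps an OutOfMemoryError log line to the exhausted region.
final class OomClassifier {
    private static final Map<String, String> REGIONS = new LinkedHashMap<>();
    static {
        REGIONS.put("java heap space", "Heap (-Xmx)");
        REGIONS.put("gc overhead limit exceeded", "Heap (-Xmx)");
        REGIONS.put("metaspace", "Class metadata (-XX:MaxMetaspaceSize)");
        // Matches both "unable to create new native thread" and newer wordings.
        REGIONS.put("native thread", "Thread limit");
        REGIONS.put("direct buffer memory", "NIO buffers (-XX:MaxDirectMemorySize)");
    }

    // Case-insensitive substring match; first matching pattern wins.
    static String classify(String logLine) {
        String lower = logLine.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> e : REGIONS.entrySet()) {
            if (lower.contains(e.getKey())) {
                return e.getValue();
            }
        }
        return "Unknown";
    }

    public static void main(String[] args) {
        System.out.println(classify("java.lang.OutOfMemoryError: Java heap space"));
        System.out.println(classify("java.lang.OutOfMemoryError: Metaspace"));
    }
}
```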
3. Capture a heap dump (if not auto-captured)
If HeapDumpOnOutOfMemoryError was not enabled, capture a dump from the running process. [src6]
# Find Java process PID
jps -lv
# Capture heap dump (~2 sec per GB; the application is paused while the dump is written)
jmap -dump:format=b,file=/tmp/heap.hprof <pid>
# Alternative: jcmd (preferred on modern JDKs)
jcmd <pid> GC.heap_dump /tmp/heap.hprof
# Quick histogram (no full dump, minimal impact)
jmap -histo <pid> | head -30
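When shell access to the host is not available, a dump can also be triggered from inside the process. The following sketch assumes a HotSpot-based JVM and uses HotSpotDiagnosticMXBean.dumpHeap; the HeapDumper class and its method names are hypothetical. Note that the target file must not already exist, and recent JDKs require the .hprof suffix.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes a heap dump of the current JVM.
final class HeapDumper {

    // liveOnly=true dumps only reachable objects (similar to jmap -dump:live).
    static void dump(Path target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), liveOnly);
    }

    // Dumps into a fresh temporary directory and returns the dump size in bytes.
    static long dumpToTemp() {
        try {
            Path file = Files.createTempDirectory("heapdump").resolve("heap.hprof");
            dump(file, true);
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Heap dump written, bytes: " + dumpToTemp());
    }
}
```

As with jmap, the application is paused while the dump is written, so an endpoint that calls this should be protected and used sparingly.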
4. Analyze the heap dump with Eclipse MAT
Eclipse Memory Analyzer Tool (MAT) is the most effective free tool for finding memory leaks. [src6, src4]
# Download Eclipse MAT from https://eclipse.dev/mat/
# Open the .hprof file in MAT
# Key reports to check:
# 1. Leak Suspects Report (automatic) — highlights top memory consumers
# 2. Dominator Tree — shows objects retaining the most memory
# 3. Histogram — sorted by retained heap size
# 4. Path to GC Roots (exclude weak refs) — shows WHY an object is retained
5. Check GC behavior
Review garbage collection logs to understand memory pressure patterns. [src4]
# Enable GC logging (JDK 9+)
java -Xlog:gc*:file=/var/log/java/gc.log:time,uptime,level,tags \
-jar myapp.jar
# Enable GC logging (JDK 8)
java -verbose:gc -Xloggc:/var/log/java/gc.log \
-XX:+PrintGCDetails -XX:+PrintGCDateStamps \
-jar myapp.jar
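GC logs give the detailed picture, but a coarse in-process signal is also available through the standard GarbageCollectorMXBean API. The sketch below (GcOverhead is a hypothetical class name) reports cumulative GC time as a share of JVM uptime. This is an approximation only: it is an average since startup, not the windowed check the JVM uses for "GC overhead limit exceeded", and concurrent collectors may report cycle time rather than pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: rough share of uptime spent in garbage collection.
final class GcOverhead {

    // Sums collection time across all collectors and divides by JVM uptime.
    static double gcTimePercent() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : 100.0 * gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        System.out.printf("GC time since start: %.2f%% of uptime%n", gcTimePercent());
    }
}
```

A value that climbs steadily across samples suggests growing memory pressure and is a cue to look at the GC log and a heap dump.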
6. Apply the fix based on OOM type
After identifying the root cause, apply the appropriate fix. [src1, src2, src4]
# Heap space / GC overhead
java -Xms2g -Xmx4g -jar myapp.jar
# Metaspace
java -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m -jar myapp.jar
# Native threads (raise the limit in the launching shell before starting the JVM)
ulimit -u 65536
java -Xss512k -jar myapp.jar
# Direct buffers
java -XX:MaxDirectMemorySize=512m -jar myapp.jar
# Containers — percentage-based sizing
java -XX:+UseContainerSupport \
-XX:MaxRAMPercentage=75.0 \
-XX:InitialRAMPercentage=50.0 \
-jar myapp.jar
Code Examples
Java: detect and log memory pressure before OOM
// Input: Running JVM with MemoryMXBean
// Output: Early warning logs when heap usage exceeds threshold
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MemoryMonitor {
    private static final double WARN = 0.80;
    private static final double CRITICAL = 0.90;

    public static void startMonitoring() {
        MemoryMXBean memBean = ManagementFactory.getMemoryMXBean();
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "memory-monitor");
            t.setDaemon(true); // daemon thread: never blocks JVM shutdown
            return t;
        }).scheduleAtFixedRate(() -> {
            MemoryUsage heap = memBean.getHeapMemoryUsage();
            long max = heap.getMax();
            if (max <= 0) {
                return; // getMax() is -1 when the heap limit is undefined
            }
            double usedPct = (double) heap.getUsed() / max;
            if (usedPct > CRITICAL) {
                System.err.printf("CRITICAL: Heap %.1f%% (%dMB/%dMB)%n",
                        usedPct * 100, heap.getUsed() >> 20, max >> 20);
            } else if (usedPct > WARN) {
                System.err.printf("WARNING: Heap %.1f%% (%dMB/%dMB)%n",
                        usedPct * 100, heap.getUsed() >> 20, max >> 20);
            }
        }, 0, 10, TimeUnit.SECONDS);
    }
}
Bash: automated OOM diagnostic script
#!/bin/bash
# Input: Java PID (or auto-detect)
# Output: Memory diagnostics: heap, threads, top objects
PID="${1:-$(jps -lv | grep -v Jps | head -1 | awk '{print $1}')}"
[ -z "$PID" ] && { echo "No Java process found"; exit 1; }
echo "=== Java OOM Diagnostic Report — PID: $PID ==="
echo "--- JVM Flags ---"
jcmd "$PID" VM.flags 2>/dev/null || jinfo -flags "$PID"
echo -e "\n--- Heap Usage ---"
jcmd "$PID" GC.heap_info 2>/dev/null || jmap -heap "$PID"
echo -e "\n--- Top 20 Objects ---"
jmap -histo "$PID" | head -25
echo -e "\n--- Thread Count ---"
jstack "$PID" 2>/dev/null | grep -c "^\"" || echo "N/A"
echo -e "\n--- Native Memory (if NMT enabled) ---"
jcmd "$PID" VM.native_memory summary 2>/dev/null \
|| echo "NMT not enabled. Start with -XX:NativeMemoryTracking=summary"
Java: common memory leak patterns and fixes
// PATTERN 1: Unbounded cache (use a bounded cache)
// BAD: static Map<String, Object> cache = new HashMap<>();
// GOOD:
var cache = com.github.benmanes.caffeine.cache.Caffeine.newBuilder()
        .maximumSize(10_000)
        .expireAfterWrite(java.time.Duration.ofHours(1))
        .build();

// PATTERN 2: Unclosed resources (use try-with-resources)
try (var is = new java.io.FileInputStream(file)) {
    // process stream
} // auto-closed

// PATTERN 3: Large collections (process in batches)
int page = 0;
List<Record> batch = repo.findByPage(page++, 1000);
while (!batch.isEmpty()) { // stop before processing the trailing empty page
    processBatch(batch);
    batch = repo.findByPage(page++, 1000);
}
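Where adding a caching library is not an option, the bounded-cache idea from Pattern 1 can be sketched with the standard library alone. The BoundedCache class below is a hypothetical illustration built on LinkedHashMap's removeEldestEntry hook; it is not thread-safe and has no time-based expiry, so it is a simpler stand-in rather than an equivalent of the Caffeine configuration shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache: evicts the least recently used entry once full.
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true keeps entries in LRU order
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; true removes the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCache<String, Integer> cache = new BoundedCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes most recently used
        cache.put("c", 3); // evicts "b", the least recently used
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

For concurrent access, one option is to wrap the instance with Collections.synchronizedMap.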
Anti-Patterns
Wrong: Catching OutOfMemoryError and continuing
// BAD — JVM state may be corrupted after OOM [src1]
try {
byte[] data = new byte[Integer.MAX_VALUE];
} catch (OutOfMemoryError e) {
System.out.println("Not enough memory, retrying...");
byte[] data = new byte[1024 * 1024]; // unreliable
}
Correct: Crash fast and analyze the heap dump
// GOOD — let the JVM crash; capture dump; fix root cause [src1]
// JVM flags: -XX:+HeapDumpOnOutOfMemoryError -XX:+ExitOnOutOfMemoryError
// The heap dump gives you the evidence to fix the actual problem.
Wrong: Blindly increasing -Xmx without analysis
# BAD — just delays the crash if there is a memory leak [src4, src6]
# Monday: java -Xmx2g -jar app.jar # OOM after 6 hours
# Tuesday: java -Xmx4g -jar app.jar # OOM after 12 hours
# Wednesday: java -Xmx8g -jar app.jar # OOM after 24 hours — still leaking!
Correct: Analyze heap dump first, then right-size
# GOOD — capture evidence, then decide [src4, src6]
# 1. Enable dump: -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/
# 2. Reproduce OOM
# 3. Open dump in Eclipse MAT → Leak Suspects
# 4. Leak found? → fix code, keep original -Xmx
# 5. No leak? → increase -Xmx to actual usage + 25% headroom
Wrong: Using fixed -Xmx in containers
# BAD — ignores container limits; may get OOM-killed [src4]
docker run -m 2g myapp java -Xmx8g -jar app.jar
# Result: Linux OOM Killer terminates JVM
Correct: Use container-aware memory settings
# GOOD — JVM respects container memory limits [src4]
docker run -m 2g myapp java \
-XX:+UseContainerSupport \
-XX:MaxRAMPercentage=75.0 \
-jar app.jar
# JVM calculates: 2GB * 75% = 1.5GB max heap
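The percentage arithmetic can be checked against what the JVM actually chose. In this sketch, ContainerHeapMath and maxHeapBytes are hypothetical names; the comparison value comes from Runtime.maxMemory(), which may differ slightly from the computed figure because the JVM aligns heap sizes and some collectors exclude part of the heap from the reported maximum.

```java
// Hypothetical helper: expected max heap for a container limit and MaxRAMPercentage.
final class ContainerHeapMath {

    // Example: a 2 GB limit at 75.0 percent gives 1.5 GB.
    static long maxHeapBytes(long containerLimitBytes, double maxRamPercentage) {
        return (long) (containerLimitBytes * (maxRamPercentage / 100.0));
    }

    public static void main(String[] args) {
        long limit = 2L * 1024 * 1024 * 1024; // docker run -m 2g
        long expected = maxHeapBytes(limit, 75.0);
        System.out.println("Expected max heap: " + (expected >> 20) + " MB");
        System.out.println("Actual max heap:   "
                + (Runtime.getRuntime().maxMemory() >> 20) + " MB");
    }
}
```

The first line prints 1536 MB for the 2 GB example; a large gap between the two lines inside a container suggests the JVM is not seeing the container limit.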
Common Pitfalls
- PermGen vs Metaspace confusion: Java 7 uses -XX:MaxPermSize; Java 8+ uses -XX:MaxMetaspaceSize. Java 8 accepts -XX:MaxPermSize but ignores it with only a startup warning, so the setting has no effect. Check the Java version first. [src1, src2]
- Heap dump fills disk: Dumps are roughly the size of the used heap. A full 16GB heap generates a ~16GB file. Ensure the -XX:HeapDumpPath location has sufficient space. [src6]
- NMT overhead: -XX:NativeMemoryTracking=detail adds 5-10% memory overhead. Use summary mode in production. [src1]
- Thread stack memory is invisible to heap metrics: Each thread uses -Xss memory (default 512KB-1MB). 2000 threads = 1-2GB of native memory, not counted in heap. [src2]
- G1GC humongous allocations: Objects of at least half a G1 region are "humongous" and can cause premature OOM (the region size is 1-32MB, derived from the heap size unless set explicitly). Split large allocations or increase -XX:G1HeapRegionSize. [src4]
- finalize() delays GC: Objects with finalize() require two GC cycles to be reclaimed. Remove finalize() and use try-with-resources or Cleaner instead. [src1, src5]
- String.intern() memory leak: Java 7+ stores interned strings in the heap, so aggressive String.intern() on unbounded user input can exhaust it. [src3]
Diagnostic Commands
# === Identify the Java process ===
jps -lv # list Java processes with JVM flags
jcmd -l # alternative (modern JDKs)
# === Heap overview ===
jcmd <pid> GC.heap_info # current heap usage (JDK 9+)
jmap -heap <pid> # heap summary (JDK 8)
# === Heap dump ===
jcmd <pid> GC.heap_dump /tmp/heap.hprof # recommended (JDK 9+)
jmap -dump:format=b,file=/tmp/heap.hprof <pid> # JDK 8+
# === Object histogram (quick, no full dump) ===
jmap -histo <pid> | head -30 # top classes by total bytes
jmap -histo:live <pid> | head -30 # forces GC first
# === Thread analysis ===
jstack <pid> > /tmp/threads.txt # full thread dump
jstack <pid> | grep -c "^\"" # thread count
# === GC stats ===
jstat -gcutil <pid> 1000 10 # GC stats every 1s, 10 times
# Columns (% used): S0 S1 E O M CCS = Survivor 0/1, Eden, Old, Metaspace, Compressed class space
# === Native memory (requires -XX:NativeMemoryTracking=summary) ===
jcmd <pid> VM.native_memory summary # heap, metaspace, threads, code cache
# === JVM flags ===
jcmd <pid> VM.flags # all active JVM flags
jinfo -flags <pid> # alternative
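Several of these commands have an in-process counterpart in the java.lang.management API, which helps when the JDK tools are not installed in the runtime image. The sketch below (MemoryPoolReport is a hypothetical class name) lists every memory pool with its usage; pool names such as "Metaspace" or "G1 Old Gen" depend on the JVM and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: one line per JVM memory pool (heap and non-heap).
final class MemoryPoolReport {

    static String report() {
        StringBuilder sb = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            // getMax() is -1 when the pool has no defined upper bound.
            String max = usage.getMax() < 0 ? "unbounded" : (usage.getMax() >> 20) + " MB";
            sb.append(String.format("%-36s %-16s used=%d MB max=%s%n",
                    pool.getName(), pool.getType(), usage.getUsed() >> 20, max));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```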
Decision Logic
Use these if/then rules for agent-driven recommendations. Each rule maps a user-described situation to a concrete next step.
If error message contains "Java heap space" or "GC overhead limit exceeded"
Capture a heap dump (-XX:+HeapDumpOnOutOfMemoryError or jcmd <pid> GC.heap_dump) and open it in Eclipse MAT. Run the Leak Suspects report first — if a single object/collection dominates the heap, fix the leak in code; if many objects share the heap, increase -Xmx. [src1, src6]
If error message contains "Metaspace" and it grows on every redeploy
Diagnose as a classloader leak. Application servers (Tomcat, JBoss) often leak ClassLoader instances on hot redeploy. Restart the JVM, then audit static caches, thread locals, and JDBC driver registration. Increasing -XX:MaxMetaspaceSize only delays the crash. [src2]
If error is "Unable to create new native thread" on JDK 21+
Check whether the workload should be using virtual threads (Project Loom) instead of platform threads. Virtual threads cost ~256 bytes vs ~1 MB for a platform thread, eliminating this class of OOM at scale. If platform threads are required, raise ulimit -u and reduce -Xss to 512k. [src9]
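To judge whether platform threads are the problem, it helps to compare the live thread count with the stack memory those threads can reserve. This is a rough sketch with hypothetical names (ThreadFootprint, reservedStackBytes); the 1 MB and 512 KB stack sizes are the figures quoted in this guide, and the result is an upper bound on reserved address space rather than resident memory.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: estimates native memory reserved for thread stacks.
final class ThreadFootprint {

    // Upper bound: every thread reserves one full stack of stackSizeBytes.
    static long reservedStackBytes(int threads, long stackSizeBytes) {
        return threads * stackSizeBytes;
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        int live = threads.getThreadCount();
        int peak = threads.getPeakThreadCount();
        System.out.println("Live threads: " + live + ", peak: " + peak);
        System.out.println("Stack reservation at 1 MB:   "
                + (reservedStackBytes(live, 1L << 20) >> 20) + " MB");
        System.out.println("Stack reservation at 512 KB: "
                + (reservedStackBytes(live, 512L << 10) >> 20) + " MB");
    }
}
```

A live count that keeps rising between samples while load is flat points to a thread leak rather than an OS limit that is simply too low.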
If error is "Direct buffer memory" in a Netty or Spring WebFlux application
Investigate PooledByteBufAllocator retention — the most common cause is ByteBuf instances escaping the request scope without ReferenceCountUtil.release(). Enable NMT (-XX:NativeMemoryTracking=summary) and run jcmd <pid> VM.native_memory summary to confirm direct memory growth. [src2, src9]
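Direct memory growth can also be sampled in-process through the standard BufferPoolMXBean. The sketch below uses a hypothetical DirectBufferUsage class and assumes the pool is named "direct", as on HotSpot. It counts buffers allocated through ByteBuffer.allocateDirect; memory that a framework allocates through other native paths may not show up here, so treat it as a complement to NMT rather than a replacement.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper: bytes currently held by NIO direct buffers.
final class DirectBufferUsage {

    // Returns -1 if this JVM exposes no pool named "direct".
    static long directBytesUsed() {
        for (BufferPoolMXBean pool
                : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = directBytesUsed();
        ByteBuffer buffer = ByteBuffer.allocateDirect(8 << 20); // 8 MB off-heap
        long after = directBytesUsed();
        System.out.println("Direct memory before: " + before + " bytes");
        System.out.println("Direct memory after:  " + after + " bytes");
        System.out.println("Buffer capacity: " + buffer.capacity()); // keeps buffer reachable
    }
}
```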
If running in Docker/Kubernetes and the container restarts with exit code 137
This is OS-level OOMKilled, not a JVM OOM — no heap dump will be generated. Set container memory to ~1.43× the desired max heap (off-heap accounts for ~30% of OOMs). Configure the JVM with -XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0. Set resources.requests.memory close to resources.limits.memory. [src4, src9]
If on Java 25+ and using ZGC
ZGC is generational-only in Java 25 (non-generational mode removed). Use -XX:+UseZGC (no longer need -XX:+ZGenerational). For OOM diagnosis, enable JFR (-XX:StartFlightRecording=settings=default,maxsize=100M,maxage=1h) — sub-2% overhead and captures allocation hotspots without a full heap dump. Ensure the heap has enough headroom for the live set plus concurrent-GC working space. [src8]
If you need low-overhead production profiling instead of heap dumps
Use async-profiler (-e alloc for allocation profiling) or JFR. Both have <2% CPU overhead and surface allocation hotspots over time. Heap dumps remain authoritative for post-mortem leak analysis, but JFR/async-profiler are now the preferred live-investigation tools. [src9]
Version History & Compatibility
| Java Version | Memory Model Change | Key Flags |
|---|---|---|
| Java 7 and earlier | PermGen for class metadata | -XX:MaxPermSize=256m |
| Java 8 | PermGen replaced by Metaspace (native memory) | -XX:MaxMetaspaceSize=256m |
| Java 9 | Unified GC logging (-Xlog:gc*) | -Xlog:gc*:file=gc.log |
| Java 10 | Container support default on | -XX:MaxRAMPercentage=75.0 |
| Java 11 (LTS) | ZGC experimental; Epsilon GC | -XX:+UseZGC (experimental) |
| Java 15 | ZGC production-ready | -XX:+UseZGC |
| Java 17 (LTS) | Strongly encapsulated JDK internals | --add-opens for reflection |
| Java 21 (LTS) | Virtual threads reduce native thread OOM; generational ZGC available (opt-in) | -XX:+UseZGC -XX:+ZGenerational |
| Java 25 (LTS) | ZGC is generational-only; non-generational mode removed | -XX:+UseZGC (always generational) |
When to Use / When Not to Use
| Use This Guide When | Don't Use When | Use Instead |
|---|---|---|
| java.lang.OutOfMemoryError in logs | java.lang.StackOverflowError | Increase -Xss or fix recursion |
| JVM killed by OOM Killer (dmesg \| grep oom) | Slow GC pauses but no OOM | GC tuning guide (G1GC/ZGC) |
| Heap usage grows monotonically over time | High CPU from GC but heap stable | GC algorithm selection guide |
| Application crashes after hours/days | Immediate crash on startup | Check classpath / dependency issues |
| Container restarts with exit code 137 | Container restarts with exit code 1 | Application error logs |
Important Caveats
- Heap dump captures a moment in time: The dump only shows objects alive at capture. Transient bursts may not appear. Use GC logs + heap dump together for a complete picture. [src6]
- Eclipse MAT vs VisualVM: MAT is far more powerful for leak analysis (Leak Suspects, dominator tree, OQL). VisualVM is better for live monitoring. Use MAT for post-mortem, VisualVM for live investigation. [src4, src6]
- Container memory != JVM memory: A 4GB container must fit heap + Metaspace + threads + direct buffers + code cache + GC overhead. Set MaxRAMPercentage to 75% of the container limit. [src4]
- ZGC and Shenandoah change OOM behavior: Low-latency collectors may delay OOM but do not prevent it. The heap dump is still the primary diagnostic tool. [src2]
- Kubernetes OOMKilled (exit 137) vs Java OOM: If the pod is killed before the JVM throws OutOfMemoryError, no heap dump is generated. Set resources.requests.memory close to resources.limits.memory. [src4]