How to Debug Kubernetes CrashLoopBackOff

Type: Software Reference Confidence: 0.93 Sources: 8 Verified: 2026-02-23 Freshness: quarterly

TL;DR

A pod in CrashLoopBackOff starts, crashes, and is restarted with an increasing backoff delay. Get the last exit code and events with kubectl describe pod, read the crash output with kubectl logs --previous, then match the exit code against the causes below. Exit 137 usually means OOMKilled; exit 1 is an application error; probe failures show up in the Events section.

Constraints

Quick Reference

| #  | Cause                                  | Likelihood | Key Signal                                 | Fix                                    | Sources    |
|----|----------------------------------------|------------|--------------------------------------------|----------------------------------------|------------|
| 1  | Application error / crash              | ~30%       | Exit code 1; error in logs                 | Fix application code                   | src1, src3 |
| 2  | Missing/wrong env vars or config       | ~20%       | Exit 1; "env var not set"                  | Fix ConfigMap/Secret/env               | src3, src4 |
| 3  | OOMKilled (exit 137)                   | ~15%       | OOMKilled: true                            | Increase resources.limits.memory       | src3, src5 |
| 4  | Liveness probe failure                 | ~10%       | "Liveness probe failed" in events          | Use startupProbe                       | src2, src3 |
| 5  | Command not found (exit 127)           | ~5%        | "command not found" / "exec format error"  | Fix CMD/entrypoint; check architecture | src4, src6 |
| 6  | Missing ConfigMap or Secret            | ~5%        | "CreateContainerConfigError"               | Create missing resource                | src3, src5 |
| 7  | Init container failure                 | ~4%        | Init container crashing                    | kubectl logs -c <init-name>            | src7       |
| 8  | Volume mount failure                   | ~4%        | "MountVolume.SetUp failed"                 | Fix PVC/storage class                  | src3, src5 |
| 9  | Permission denied (exit 126)           | ~3%        | "Permission denied" in logs                | Fix securityContext                    | src4       |
| 10 | Image pull issues then crash           | ~2%        | "ImagePullBackOff"                         | Fix image/registry                     | src1, src3 |
| 11 | Container exits successfully (exit 0)  | ~2%        | Repeated exit 0                            | Use a Job, or keep the process running | src5, src6 |

Decision Tree

START — Pod shows CrashLoopBackOff
├── kubectl describe pod → Check "Last State" Exit Code
│   ├── Exit 0 → App exits but shouldn't → needs long-running process or Job [src6]
│   ├── Exit 1 → App error → check logs: kubectl logs --previous [src1, src3]
│   ├── Exit 126 → Permission denied → fix securityContext [src4]
│   ├── Exit 127 → Command not found → fix image/CMD; check arch [src4, src6]
│   ├── Exit 137 → SIGKILL, usually OOMKilled → increase memory limits [src3, src5]
│   └── Exit 143 → SIGTERM → check probe timing or preStop hook [src2]
├── Check Events section
│   ├── "Liveness probe failed" → fix probe config; use startupProbe [src2, src3]
│   ├── "MountVolume.SetUp failed" → fix PVC [src3]
│   └── "configmap/secret not found" → create resource [src3, src5]
├── Check Init Containers
│   └── Failing? → kubectl logs -c <init-name> [src7]
├── Check Sidecar Containers (K8s 1.29+)
│   └── Sidecar healthy but main crashing? → debug main container separately [src7]
└── kubectl logs --previous → find the error message [src1]
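
The exit-code branch of the tree above can be sketched as a plain lookup table. This is purely illustrative; the names EXIT_CODE_HINTS and diagnose are ours, not part of kubectl or any Kubernetes tooling:

```python
# Hypothetical lookup mirroring the exit-code branch of the decision tree above.
EXIT_CODE_HINTS = {
    0: "App exits but shouldn't: use a Job or keep the process running",
    1: "Application error: check kubectl logs --previous",
    126: "Permission denied: fix securityContext",
    127: "Command not found: fix image/CMD; check architecture",
    137: "SIGKILL, usually OOMKilled: increase memory limits",
    143: "SIGTERM: check probe timing or preStop hook",
}

def diagnose(exit_code, reason=None):
    """Return a first-guess hint for a terminated container."""
    # Exit 137 is SIGKILL; only trust the OOM explanation if the kubelet says so.
    if exit_code == 137 and reason != "OOMKilled":
        return "SIGKILL but not OOMKilled: check node pressure or external kills"
    return EXIT_CODE_HINTS.get(exit_code, f"Unmapped exit code {exit_code}: read the logs")
```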

Step-by-Step Guide

1. Get the pod status and restart count

Identify which pods are in CrashLoopBackOff. [src1, src3]

kubectl get pods -A | grep CrashLoopBackOff
kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[0].restartCount}'

2. Describe the pod for events and state

The single most important debugging command. [src1, src3, src5]

kubectl describe pod <pod> -n <ns>
kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

3. Read the previous container's logs

The *previous* container has the crash output. [src1, src3, src5]

kubectl logs <pod> --previous
kubectl logs <pod> -c <container> --previous
kubectl logs <pod> -c <init-container-name>
kubectl logs <pod> -c <sidecar-name>  # K8s 1.29+

4. Fix OOMKilled (exit code 137)

Container exceeded its memory limit. [src3, src5]

resources:
  requests:
    memory: 256Mi
  limits:
    memory: 512Mi
kubectl top pod <pod>
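
When picking a new limit from observed usage, it helps to convert Kubernetes memory quantities to bytes first. A minimal sketch; parse_memory and suggest_limit are our own helper names, and the 1.5x headroom factor is an assumption, not a Kubernetes default:

```python
# Hypothetical helpers for sizing resources.limits.memory from observed peak usage.
# Handles the common quantity suffixes (Ki/Mi/Gi binary, K/M/G decimal, plain bytes).
_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
          "K": 1000, "M": 1000**2, "G": 1000**3, "": 1}

def parse_memory(quantity):
    """Convert a Kubernetes memory quantity string like '512Mi' to bytes."""
    for suffix in ("Ki", "Mi", "Gi", "K", "M", "G", ""):
        if quantity.endswith(suffix) and (suffix or quantity.isdigit()):
            return int(quantity[: len(quantity) - len(suffix)]) * _UNITS[suffix]
    raise ValueError(f"unrecognized quantity: {quantity}")

def suggest_limit(observed_peak, headroom=1.5):
    """Suggest a new memory limit (in Mi) with headroom above the observed peak."""
    return f"{int(parse_memory(observed_peak) * headroom / 1024**2)}Mi"
```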

5. Fix liveness probe failures

Use startupProbe for slow starters. [src2, src3]

startupProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 10
  failureThreshold: 30    # 5 min startup allowance
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 15
  failureThreshold: 3
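
The startup allowance in the snippet above is simple arithmetic: periodSeconds times failureThreshold. A tiny checker (ours, purely illustrative) makes the budget explicit before you commit probe settings:

```python
def startup_allowance(period_seconds, failure_threshold):
    """Seconds a container may take to start before the startupProbe gives up."""
    return period_seconds * failure_threshold

# The guide's example: periodSeconds=10, failureThreshold=30 -> 300 s (5 minutes).
```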

6. Fix missing ConfigMap / Secret / env vars

Missing configuration causes immediate crashes. [src3, src4, src5]

kubectl get configmap -n <ns>
kubectl get secret -n <ns>
kubectl get pod <pod> -o jsonpath='{.spec.containers[0].env}'
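
Cross-checking references by eye is error-prone. A sketch of the comparison as a pure function over the pod spec JSON (missing_config_refs is our own name; it takes the .spec object from kubectl get pod -o json plus the names returned by the two commands above):

```python
def missing_config_refs(pod_spec, existing_configmaps, existing_secrets):
    """Return (kind, name) pairs referenced by containers but absent from the cluster."""
    missing = []
    for c in pod_spec.get("containers", []):
        # Whole-resource references: envFrom -> configMapRef / secretRef.
        for src in c.get("envFrom", []):
            cm = src.get("configMapRef", {}).get("name")
            if cm and cm not in existing_configmaps:
                missing.append(("ConfigMap", cm))
            sec = src.get("secretRef", {}).get("name")
            if sec and sec not in existing_secrets:
                missing.append(("Secret", sec))
        # Single-key references: env[].valueFrom -> configMapKeyRef / secretKeyRef.
        for env in c.get("env", []):
            ref = env.get("valueFrom", {})
            cm = ref.get("configMapKeyRef", {}).get("name")
            if cm and cm not in existing_configmaps:
                missing.append(("ConfigMap", cm))
            sec = ref.get("secretKeyRef", {}).get("name")
            if sec and sec not in existing_secrets:
                missing.append(("Secret", sec))
    return missing
```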

7. Debug with ephemeral containers

Use when logs alone aren't enough. Ephemeral debug containers have been GA since K8s 1.25. [src1, src4]

kubectl debug -it <pod> --image=busybox --target=<container>
kubectl exec -it <pod> -- /bin/sh   # only works while the container is running
kubectl get events -n <ns> --sort-by='.lastTimestamp' | tail -20

Code Examples

Comprehensive CrashLoopBackOff diagnostic script

#!/bin/bash
POD="$1"; NS="${2:-default}"
if [ -z "$POD" ]; then
    echo "Usage: $0 <pod> [namespace]"
    kubectl get pods -A | grep CrashLoopBackOff; exit 1
fi

echo "=== CrashLoopBackOff Diagnostic: $POD (ns: $NS) ==="
RESTARTS=$(kubectl get pod "$POD" -n "$NS" -o jsonpath='{.status.containerStatuses[0].restartCount}')
EXIT=$(kubectl get pod "$POD" -n "$NS" -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}')
REASON=$(kubectl get pod "$POD" -n "$NS" -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}')
echo "Restarts: $RESTARTS | Exit: $EXIT | Reason: $REASON"

case "$EXIT" in
    0)   echo "Exit 0 — completed (shouldn't for services)" ;;
    1)   echo "Exit 1 — Application error" ;;
    126) echo "Exit 126 — Permission denied" ;;
    127) echo "Exit 127 — Command not found" ;;
    137) echo "Exit 137 — $( [ "$REASON" = "OOMKilled" ] && echo "OOMKilled" || echo "SIGKILL")" ;;
    143) echo "Exit 143 — SIGTERM" ;;
esac

echo "=== Resources ==="
kubectl get pod "$POD" -n "$NS" -o jsonpath='{range .spec.containers[*]}{.name}: req={.resources.requests.memory} lim={.resources.limits.memory}{"\n"}{end}'
kubectl top pod "$POD" -n "$NS" 2>/dev/null || echo "(metrics unavailable)"

echo "=== Events ==="
kubectl get events -n "$NS" --field-selector "involvedObject.name=$POD" --sort-by='.lastTimestamp' | tail -10

echo "=== Previous Logs (30 lines) ==="
kubectl logs "$POD" -n "$NS" --previous --tail=30 2>/dev/null || echo "(none)"

Production-ready pod with proper probes and resources

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      initContainers:
        - name: wait-for-db
          image: busybox:1.36
          command: ['sh', '-c', 'until nc -zv db-service 5432; do sleep 2; done']
          resources:
            limits: { cpu: 50m, memory: 32Mi }
        - name: log-shipper      # Sidecar (K8s 1.29+)
          image: fluent/fluent-bit:2.2
          restartPolicy: Always   # Survives main container restarts
          resources:
            requests: { cpu: 50m, memory: 64Mi }
            limits: { cpu: 100m, memory: 128Mi }
      containers:
        - name: api
          image: myapp:1.2.3
          ports: [{ containerPort: 8080 }]
          envFrom:
            - configMapRef: { name: api-config }
            - secretRef: { name: api-secrets }
          resources:
            requests: { cpu: 100m, memory: 256Mi }
            limits: { cpu: 500m, memory: 512Mi }
          startupProbe:
            httpGet: { path: /health, port: 8080 }
            periodSeconds: 10
            failureThreshold: 30
          livenessProbe:
            httpGet: { path: /health, port: 8080 }
            periodSeconds: 15
            failureThreshold: 3
          readinessProbe:
            httpGet: { path: /ready, port: 8080 }
            periodSeconds: 10
      terminationGracePeriodSeconds: 30

Automated CrashLoopBackOff watcher

#!/usr/bin/env python3
import subprocess, json, time
from datetime import datetime

def get_crash_pods():
    result = subprocess.run(
        "kubectl get pods -A -o json".split(), capture_output=True, text=True)
    if result.returncode != 0: return []
    pods = json.loads(result.stdout)
    crash = []
    for pod in pods.get("items", []):
        for cs in pod.get("status", {}).get("containerStatuses", []):
            w = cs.get("state", {}).get("waiting", {})
            if w.get("reason") == "CrashLoopBackOff":
                last = cs.get("lastState", {}).get("terminated", {})
                crash.append({
                    "name": pod["metadata"]["name"],
                    "ns": pod["metadata"]["namespace"],
                    "restarts": cs.get("restartCount", 0),
                    "exit": last.get("exitCode"),
                    "reason": last.get("reason", "Unknown"),
                })
    return crash

def monitor(interval=30):
    seen = set()
    while True:
        for p in get_crash_pods():
            key = f"{p['ns']}/{p['name']}"
            if key not in seen:
                seen.add(key)
                print(f"[{datetime.now():%H:%M:%S}] ALERT {key} "
                      f"exit={p['exit']} ({p['reason']}) restarts={p['restarts']}")
        time.sleep(interval)

if __name__ == "__main__":
    monitor()

Anti-Patterns

Wrong: No resource limits

# BAD — unlimited resources; OOM kills other pods [src3, src5]
containers:
  - name: app
    image: myapp:latest

Correct: Always set resource requests and limits

# GOOD — predictable; scheduler places properly [src3, src5]
containers:
  - name: app
    image: myapp:1.2.3
    resources:
      requests: { cpu: 100m, memory: 256Mi }
      limits: { cpu: 500m, memory: 512Mi }

Wrong: Aggressive liveness probe on slow-starting app

# BAD — kills Spring Boot during startup [src2, src3]
livenessProbe:
  httpGet: { path: /health, port: 8080 }
  initialDelaySeconds: 5
  periodSeconds: 3
  failureThreshold: 3

Correct: Use startupProbe for slow starters

# GOOD — startupProbe allows up to 5 min init [src2]
startupProbe:
  httpGet: { path: /health, port: 8080 }
  periodSeconds: 10
  failureThreshold: 30
livenessProbe:
  httpGet: { path: /health, port: 8080 }
  periodSeconds: 15

Wrong: Using :latest tag

# BAD — unpredictable; hard to roll back [src3, src6]
containers:
  - name: app
    image: myapp:latest
    imagePullPolicy: Always

Correct: Pin specific image version

# GOOD — reproducible; easy rollback [src3, src6]
containers:
  - name: app
    image: myapp:1.2.3
    imagePullPolicy: IfNotPresent
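
The "pinned tag" rule above is easy to enforce mechanically, e.g. in a CI lint. A minimal sketch; is_pinned is our own function, not part of any admission controller:

```python
def is_pinned(image):
    """True if the image reference has an explicit, non-latest tag or a digest."""
    if "@sha256:" in image:
        return True  # digest-pinned, the strictest form
    # Split on the last colon; a colon inside the registry host
    # (e.g. registry:5000/app) is not a tag separator.
    _, sep, tag = image.rpartition(":")
    if not sep or "/" in tag:
        return False  # no tag at all, i.e. implicit :latest
    return tag != "latest"
```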

Wrong: Sidecar as regular container (K8s 1.29+)

# BAD — sidecar dies with main container [src7]
containers:
  - name: app
    image: myapp:1.2.3
  - name: log-shipper
    image: fluent/fluent-bit:2.2

Correct: Sidecar as init container with restartPolicy: Always

# GOOD — sidecar survives main container restarts [src7, src8]
initContainers:
  - name: log-shipper
    image: fluent/fluent-bit:2.2
    restartPolicy: Always
containers:
  - name: app
    image: myapp:1.2.3

Common Pitfalls

Diagnostic Commands

# === Find CrashLoopBackOff pods ===
kubectl get pods -A | grep CrashLoopBackOff

# === Pod details ===
kubectl describe pod <pod> -n <ns>
kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'

# === Logs ===
kubectl logs <pod> --previous
kubectl logs <pod> -c <container> --previous
kubectl logs <pod> -c <init-container>
kubectl logs <pod> -c <sidecar>

# === Events ===
kubectl get events -n <ns> --sort-by='.lastTimestamp' | tail -20

# === Resources ===
kubectl top pod <pod>
kubectl top nodes

# === Debugging ===
kubectl debug -it <pod> --image=busybox --target=<container>
kubectl exec -it <pod> -- /bin/sh

# === Config ===
kubectl get configmap -n <ns>
kubectl get secret -n <ns>

# === Node health ===
kubectl describe node <node> | grep -A5 "Conditions"

Version History & Compatibility

| Version   | Status        | Key Changes                                                           |
|-----------|---------------|-----------------------------------------------------------------------|
| K8s 1.32+ | Alpha feature | KubeletCrashLoopBackOffMax: configurable max backoff [src8]            |
| K8s 1.29+ | Stable        | Sidecar containers beta, on by default; better init container lifecycle [src1, src7] |
| K8s 1.28  | Stable        | Sidecar containers alpha; improved probe logging [src1]                |
| K8s 1.25  | Stable        | Ephemeral debug containers GA [src1]                                   |
| K8s 1.23  | Stable        | kubectl debug beta [src1]                                              |
| K8s 1.20  | Stable        | startupProbe graduated to stable [src2]                                |
| K8s 1.18  | Stable        | startupProbe beta, for slow-starting containers [src2]                 |
| K8s 1.16  | Stable        | Liveness/readiness probes stable in all workloads [src2]               |
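
Before relying on a technique from this guide, check it against the cluster version. An illustrative gate (our own names; the minimum versions come from the table above, with sidecars usable from 1.29 when the beta gate is on by default):

```python
# Map each debugging feature to the minimum (major, minor) server version.
FEATURE_MIN_VERSION = {
    "startupProbe": (1, 20),
    "ephemeral_debug_containers": (1, 25),
    "sidecar_containers": (1, 29),
}

def supports(server_version, feature):
    """server_version as reported by kubectl, e.g. '1.29' or 'v1.29.3'."""
    parts = server_version.lstrip("v").split(".")
    major, minor = int(parts[0]), int(parts[1])
    return (major, minor) >= FEATURE_MIN_VERSION[feature]
```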

When to Use / When Not to Use

| Use When                    | Don't Use When            | Use Instead                            |
|-----------------------------|---------------------------|----------------------------------------|
| Pod shows CrashLoopBackOff  | Pod stuck in Pending      | Debug scheduling: kubectl describe pod |
| Container keeps restarting  | Pod in ImagePullBackOff   | Fix image name/registry                |
| Exit code is non-zero       | Pod running but not ready | Debug readiness probe                  |
| Events show probe failures  | Container is Evicted      | Check node pressure                    |

Important Caveats

Related Units