How to Debug Kubernetes Pods Stuck in Pending State
How do I debug Kubernetes pods stuck in Pending state?
TL;DR
- Bottom line: A Pending pod has been accepted by the API server but not yet bound to a node, because the scheduler cannot find a suitable one. Most common causes: insufficient CPU/memory (~35%), taints without tolerations (~20%), PVC binding failures (~15%), nodeSelector mismatches (~12%). The Events section in kubectl describe pod almost always tells you why.
- Key tool/command: kubectl describe pod <pod> -n <ns>. The Events section shows the exact scheduler reason.
- Watch out for: Resource requests (not limits) drive scheduling. WaitForFirstConsumer PVCs staying Pending is normal, not an error.
- Works with: All Kubernetes 1.20+. Same concepts for OpenShift, EKS, GKE, AKS, k3s, minikube. K8s 1.35 promoted In-Place Pod Resize to GA. K8s 1.36 (April 2026) promotes Gang Scheduling to Beta and adds Workload-Aware Preemption (alpha), so PodGroups now schedule and preempt as a unit, fixing partial-binding stalls in AI/HPC workloads. [src9]
Constraints
- Scheduling uses resource requests not limits — a pod requesting 4Gi is unschedulable on a node with 3Gi allocatable even if actual usage is low. [src1, src4]
- Never delete a Pending pod to "fix" scheduling. Remove the root cause (taint, quota, missing label) instead. Otherwise the controller recreates the pod and it goes straight back to Pending. [src3]
- WaitForFirstConsumer PVCs are zone-aware: if the PV is in zone-a and the only node with capacity is in zone-b, both the pod and the PVC stay Pending. [src6]
- On managed K8s (EKS/GKE/AKS), wait 1-5 min for the Cluster Autoscaler before reducing requests, because a Pending pod can trigger node scale-up. [src4]
- In-Place Pod Resize (GA in K8s 1.35) can adjust CPU/memory on running pods but cannot fix scheduling — already-Pending pods must still wait for node capacity. [src8]
- RequiredDuringScheduling anti-affinity with more replicas than nodes is permanently unsatisfiable — use PreferredDuringScheduling instead. [src7]
- On K8s 1.36+, a PodGroup using gang scheduling stays Pending until ALL members fit. Partial binding is rejected by design. Check kubectl describe podgroup <name> before debugging individual pods. [src9]
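The first constraint can be made concrete with a small fit check. This is a simplified model, not the scheduler's code. It assumes quantities are already converted to millicores and MiB. The hypothetical helper `fits_on_node` ignores pod overhead, extended resources, and the per-node pod limit, all of which the real NodeResourcesFit plugin also checks.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_m: int   # CPU in millicores
    mem_mi: int  # memory in MiB


def fits_on_node(allocatable: Resources, already_requested: Resources,
                 pod_requests: Resources) -> bool:
    """True if the pod's *requests* fit in what is left of the node's allocatable.

    Limits and actual usage are deliberately not inputs: only requests
    are compared against allocatable.
    """
    free_cpu = allocatable.cpu_m - already_requested.cpu_m
    free_mem = allocatable.mem_mi - already_requested.mem_mi
    return pod_requests.cpu_m <= free_cpu and pod_requests.mem_mi <= free_mem


# The example from the text: 4Gi requested on a node with 3Gi allocatable.
node = Resources(cpu_m=4000, mem_mi=3 * 1024)
print(fits_on_node(node, Resources(0, 0), Resources(cpu_m=100, mem_mi=4 * 1024)))  # False
```

Raising the pod's limits would not change the result, because only requests enter the comparison.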
Quick Reference
| # | Cause | Likelihood | Scheduler Message | Fix |
|---|---|---|---|---|
| 1 | Insufficient CPU/memory | ~35% | "Insufficient cpu/memory" | Reduce requests, add nodes [src1, src4] |
| 2 | Taints without tolerations | ~20% | "had taint pod didn't tolerate" | Add toleration or remove taint [src2, src4] |
| 3 | PVC not bound | ~15% | "unbound PersistentVolumeClaims" | Fix StorageClass, provision PV [src3, src6] |
| 4 | nodeSelector mismatch | ~12% | "didn't match node affinity/selector" | Fix selector or label nodes [src4, src7] |
| 5 | Node affinity mismatch | ~5% | "didn't match node affinity" | Fix affinity or add nodes [src4, src7] |
| 6 | Pod anti-affinity | ~4% | "didn't match anti-affinity" | Reduce replicas or add nodes [src4, src7] |
| 7 | Node cordoned | ~3% | "nodes were unschedulable" | kubectl uncordon <node> [src3, src4] |
| 8 | ResourceQuota exceeded | ~2% | "exceeded quota" | Increase quota [src1, src3] |
| 9 | PodDisruptionBudget blocking | ~2% | "disruption budget violated" | Adjust PDB [src3] |
| 10 | Topology spread constraints | ~1% | "didn't satisfy topology spread" | Relax constraints [src7] |
| 11 | Node condition (DiskPressure/MemoryPressure) | ~1% | "node(s) had condition" | Resolve node pressure [src4, src5] |
| 12 | Scheduler not running | <1% | No events at all | Check kube-scheduler [src1] |
| 13 | PodGroup gang scheduling waiting (K8s 1.36+) | varies (AI/HPC) | "WaitingForGangScheduling" or partial PodGroup binding | kubectl describe podgroup; group waits for ALL members [src9] |
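Real FailedScheduling messages usually pack several reasons into one line, each with a per-reason node count. The sketch below splits such a message and maps each reason onto the table rows by keyword. It assumes the common "0/N nodes are available: ..." wording, which varies between Kubernetes versions. `CAUSES`, `parse_failed_scheduling`, and `classify` are hypothetical names, the keyword list is illustrative rather than exhaustive, and the sample message is illustrative too.

```python
import re

# Keyword -> cause row from the table above, checked in order: the specific
# "anti-affinity" must come before the generic "affinity/selector".
CAUSES = [
    ("insufficient", "Insufficient CPU/memory"),
    ("taint", "Taints without tolerations"),
    ("persistentvolumeclaim", "PVC not bound"),
    ("anti-affinity", "Pod anti-affinity"),
    ("affinity/selector", "nodeSelector / node affinity mismatch"),
    ("unschedulable", "Node cordoned"),
    ("topology spread", "Topology spread constraints"),
]


def parse_failed_scheduling(message: str) -> dict[str, int]:
    """Split '0/N nodes are available: ...' into {reason: node_count}."""
    m = re.match(r"0/(\d+) nodes are available: (.*)", message)
    if not m:
        return {}
    total = int(m.group(1))
    # Drop the trailing preemption summary that newer schedulers append.
    body = m.group(2).split(". preemption:")[0].rstrip(".")
    reasons = {}
    for part in body.split(", "):
        count, _, reason = part.partition(" ")
        if count.isdigit():
            reasons[reason] = int(count)
        else:  # some reasons are reported without a per-node count
            reasons[part] = total
    return reasons


def classify(reason: str) -> str:
    """Map one reason string to a table row by the first matching keyword."""
    lowered = reason.lower()
    for keyword, cause in CAUSES:
        if keyword in lowered:
            return cause
    return "Unknown: read the full event"


msg = ("0/3 nodes are available: 1 Insufficient cpu, 2 node(s) had untolerated taint "
       "{node-role.kubernetes.io/control-plane: }. preemption: 0/3 nodes are available: "
       "1 No preemption victims found for incoming pod, "
       "2 Preemption is not helpful for scheduling.")
for reason, nodes in parse_failed_scheduling(msg).items():
    print(nodes, classify(reason))
# 1 Insufficient CPU/memory
# 2 Taints without tolerations
```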
Decision Tree
START — Pod stuck in Pending
├── kubectl describe pod → Check Events section
│ ├── "Insufficient cpu/memory" → Check requests vs allocatable [src1, src4]
│ │ └── K8s 1.35+? → Consider In-Place Pod Resize to shrink over-requested running pods and free capacity [src8]
│ ├── "taint pod didn't tolerate" → Add toleration or remove taint [src2]
│ ├── "unbound PersistentVolumeClaims" → Fix PVC/StorageClass [src6]
│ ├── "didn't match node affinity/selector" → Fix labels/selectors [src7]
│ ├── "nodes were unschedulable" → kubectl uncordon [src3]
│ ├── "exceeded quota" → Increase ResourceQuota [src1]
│ ├── "didn't satisfy topology spread" → Relax maxSkew or set whenUnsatisfiable: ScheduleAnyway [src7]
│ ├── "node(s) had condition" → Check DiskPressure/MemoryPressure [src4]
│ └── No events → Check kube-scheduler is running [src1]
├── Managed K8s? → Check Cluster Autoscaler events [src4]
└── kubectl get events -A --sort-by='.lastTimestamp'
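The "relax maxSkew" branch is easier to reason about once the skew arithmetic is written out. Under whenUnsatisfiable: DoNotSchedule, a zone is allowed only if (matching pods in that zone + 1) minus the global minimum stays within maxSkew. This is a simplified sketch. It assumes every zone in `counts` has a node the pod could otherwise use. The real PodTopologySpread plugin also filters by node affinity, taints, and minDomains. `allowed_zones` is a hypothetical helper.

```python
def allowed_zones(counts: dict[str, int], max_skew: int) -> list[str]:
    """Zones where one more matching pod keeps skew <= max_skew (DoNotSchedule)."""
    global_min = min(counts.values())
    return [zone for zone, n in counts.items() if (n + 1) - global_min <= max_skew]


# Three replicas already in zone-a, none in zone-b or zone-c.
counts = {"zone-a": 3, "zone-b": 0, "zone-c": 0}
print(allowed_zones(counts, max_skew=1))  # ['zone-b', 'zone-c']
```

If zone-b and zone-c have no free capacity, the only allowed zones cannot take the pod, so it stays Pending. ScheduleAnyway turns the same skew into a scoring preference instead of a filter.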
Step-by-Step Guide
1. Identify the Pending pods
Find which pods are stuck. [src1, src3]
kubectl get pods -A --field-selector=status.phase=Pending
kubectl get pods -o wide | grep Pending
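Not every Pending pod is a scheduling problem. A pod that already has spec.nodeName set was bound to a node and is typically still pulling images or mounting volumes. The minimal sketch below separates the two cases. It assumes you feed it the parsed output of kubectl get pods -A -o json; `unscheduled_pods` is a hypothetical helper. It reports the reason from the pod's PodScheduled condition, which the scheduler sets to Unschedulable when it fails to place the pod. The label "not yet attempted" is an illustrative placeholder.

```python
import json


def unscheduled_pods(pod_list: dict) -> dict[str, str]:
    """Map 'namespace/name' -> PodScheduled reason for Pending pods with no node."""
    result = {}
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        # Pending pods that already have a node are past scheduling (images, volumes).
        if status.get("phase") != "Pending" or pod.get("spec", {}).get("nodeName"):
            continue
        reason = next((c.get("reason", "") for c in status.get("conditions", [])
                       if c.get("type") == "PodScheduled"), "")
        meta = pod["metadata"]
        result[f"{meta['namespace']}/{meta['name']}"] = reason or "not yet attempted"
    return result


sample = json.loads("""{"items": [
  {"metadata": {"namespace": "default", "name": "web-1"}, "spec": {},
   "status": {"phase": "Pending", "conditions": [
     {"type": "PodScheduled", "status": "False", "reason": "Unschedulable"}]}},
  {"metadata": {"namespace": "default", "name": "web-2"},
   "spec": {"nodeName": "worker-1"}, "status": {"phase": "Pending"}}
]}""")
print(unscheduled_pods(sample))  # {'default/web-1': 'Unschedulable'}
```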
2. Read the scheduler events
The most important step — events tell you exactly why. [src1, src3, src4]
kubectl describe pod <pod> -n <ns>
kubectl get events -n <ns> --field-selector involvedObject.name=<pod>
3. Check node resources
Compare requests against allocatable. [src1, src4, src5]
kubectl top nodes
kubectl describe nodes | grep -A5 "Allocatable:"
kubectl describe node <node> | grep -A20 "Allocated resources"
kubectl get pod <pod> -o jsonpath='{.spec.containers[*].resources.requests}'
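The jsonpath query above lists only .spec.containers, but init containers can raise what the scheduler reserves. Under the classic rule, the effective request is the larger of two values: the sum of the app-container requests, or the largest single init-container request. Any pod overhead is then added. The sketch below handles one resource at a time and assumes values in millicores. `effective_request` is hypothetical. Restartable (sidecar) init containers, which are counted differently, are not modelled.

```python
def effective_request(containers: list[int], init_containers: list[int],
                      overhead: int = 0) -> int:
    """Effective pod request for one resource (e.g. CPU millicores).

    Init containers run one at a time before the app containers, so the
    reservation is the larger of the app-container sum and the largest
    single init container. Sidecar (restartable) init containers are NOT modelled.
    """
    app_total = sum(containers)
    init_peak = max(init_containers, default=0)
    return max(app_total, init_peak) + overhead


# Two 250m app containers plus a 1-CPU migration init container:
print(effective_request([250, 250], [1000]))  # 1000
```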
4. Check taints and tolerations
Taints repel pods without matching tolerations. [src2, src4]
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
kubectl get pod <pod> -o jsonpath='{.spec.tolerations}'
kubectl taint node <node> key=value:NoSchedule- # Remove taint
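The matching rules behind "had taint pod didn't tolerate" fit in a few lines. This sketch mirrors the documented semantics:

- the operator defaults to Equal;
- Exists ignores the value;
- an empty key with Exists matches every key;
- an empty effect matches every effect.

It treats only NoSchedule and NoExecute as blocking, since PreferNoSchedule is a soft preference. `tolerates` and `blocking_taints` are hypothetical helpers. They take the same dict shapes that kubectl prints for .spec.taints and .spec.tolerations.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Simplified mirror of the documented toleration-matching rules."""
    op = toleration.get("operator", "Equal")
    key = toleration.get("key", "")
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False  # a set effect must match exactly
    if op == "Exists":
        return key == "" or key == taint["key"]  # empty key + Exists matches any key
    return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")


def blocking_taints(node_taints: list[dict], tolerations: list[dict]) -> list[dict]:
    """Hard taints (NoSchedule/NoExecute) that no toleration covers."""
    return [t for t in node_taints
            if t["effect"] in ("NoSchedule", "NoExecute")
            and not any(tolerates(tol, t) for tol in tolerations)]


taints = [{"key": "dedicated", "value": "api", "effect": "NoSchedule"}]
print(blocking_taints(taints, []))  # the dedicated=api taint blocks the pod
print(blocking_taints(taints, [{"key": "dedicated", "operator": "Equal",
                                "value": "api", "effect": "NoSchedule"}]))  # []
```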
5. Fix PVC binding issues
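With Immediate binding, a PVC must be Bound before its pod can schedule. With WaitForFirstConsumer, binding waits until the scheduler has picked a node for the pod. [src3, src6]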
kubectl get pvc -n <ns>
kubectl describe pvc <name> -n <ns>
kubectl get pv
kubectl get storageclass
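Whether a Pending PVC is a problem depends on its StorageClass's volumeBindingMode, which defaults to Immediate when unset. The small decision helper below assumes you have already read three things: the PVC phase, the binding mode from kubectl get storageclass, and whether a pod using the claim has been scheduled. `diagnose_pvc` and its messages are hypothetical.

```python
def diagnose_pvc(phase: str, binding_mode: str = "Immediate",
                 consumer_scheduled: bool = False) -> str:
    """Classify a PVC's state; binding_mode is the StorageClass volumeBindingMode."""
    if phase == "Bound":
        return "ok: bound"
    if phase != "Pending":
        return f"problem: PVC is {phase}, inspect the PV it was bound to"
    if binding_mode == "WaitForFirstConsumer" and not consumer_scheduled:
        # Delayed binding: nothing is bound or provisioned until a consumer gets a node.
        return "expected: waiting for a consuming pod to be scheduled"
    # Immediate binding, or a consumer already has a node: binding should have happened.
    return "problem: check the provisioner, StorageClass name, capacity, access modes, zone"


print(diagnose_pvc("Pending", "WaitForFirstConsumer"))
print(diagnose_pvc("Pending", "Immediate"))
```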
6. Fix nodeSelector and affinity
Selectors must match actual node labels. [src4, src7]
kubectl get pod <pod> -o jsonpath='{.spec.nodeSelector}'
kubectl get nodes --show-labels
kubectl label node <node> disktype=ssd
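A nodeSelector matches only nodes whose labels contain every listed key with exactly that value. An empty result means the pod can never schedule. The sketch below uses hypothetical node names and assumes the labels come from kubectl get nodes --show-labels or the node JSON. `matching_nodes` is a hypothetical helper.

```python
def matching_nodes(node_selector: dict[str, str],
                   nodes: dict[str, dict[str, str]]) -> list[str]:
    """Names of nodes whose labels contain every key=value in node_selector."""
    return [name for name, labels in nodes.items()
            if all(labels.get(k) == v for k, v in node_selector.items())]


nodes = {
    "worker-1": {"kubernetes.io/os": "linux", "disktype": "hdd"},
    "worker-2": {"kubernetes.io/os": "linux"},
}
print(matching_nodes({"disktype": "ssd"}, nodes))  # [] -> pod stays Pending
```

Labelling worker-2 with disktype=ssd would make it match. Relabelling worker-1 would need kubectl label --overwrite, since it already carries disktype=hdd.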
7. Check ResourceQuota and node cordon
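Cordoned nodes block scheduling. An exhausted ResourceQuota works differently: the API server rejects new pods at admission, so they never appear as Pending. Look for FailedCreate "exceeded quota" events on the owning ReplicaSet or Job. [src1, src3]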
kubectl get resourcequota -n <ns>
kubectl get nodes # "SchedulingDisabled" = cordoned
kubectl uncordon <node>
kubectl describe limitrange -n <ns>
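ResourceQuota admission is essentially a per-resource sum check: used plus the new pod's request must stay within hard. The sketch below assumes quantities are pre-converted to millicores and MiB. `quota_rejects` is hypothetical. One more rule applies once a namespace has a requests.cpu or requests.memory quota: pods that specify no such request, and get none from a LimitRange, are rejected as well.

```python
def quota_rejects(hard: dict[str, int], used: dict[str, int],
                  pod: dict[str, int]) -> list[str]:
    """Quota resources that admitting this pod would push over the hard limit."""
    return [res for res, limit in hard.items()
            if used.get(res, 0) + pod.get(res, 0) > limit]


hard = {"requests.cpu": 4000, "requests.memory": 8192, "pods": 10}
used = {"requests.cpu": 3800, "requests.memory": 4096, "pods": 6}
new_pod = {"requests.cpu": 250, "requests.memory": 256, "pods": 1}
print(quota_rejects(hard, used, new_pod))  # ['requests.cpu']
```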
Code Examples
Comprehensive Pending pod diagnostic script
#!/bin/bash
POD="$1"; NS="${2:-default}"
if [ -z "$POD" ]; then
echo "Usage: $0 <pod> [ns]"
kubectl get pods -A --field-selector=status.phase=Pending; exit 1
fi
echo "=== Pending Pod Diagnostic: $POD (ns: $NS) ==="
kubectl get pod "$POD" -n "$NS" -o wide
echo "=== Resource Requests ==="
kubectl get pod "$POD" -n "$NS" -o jsonpath='{range .spec.containers[*]} {.name}: cpu={.resources.requests.cpu} mem={.resources.requests.memory}{"\n"}{end}'
echo "=== Node Selector ==="
kubectl get pod "$POD" -n "$NS" -o jsonpath='{.spec.nodeSelector}'; echo
echo "=== Tolerations ==="
kubectl get pod "$POD" -n "$NS" -o jsonpath='{range .spec.tolerations[*]} {.key}={.value}:{.effect}{"\n"}{end}'
echo "=== Node Resources ==="
kubectl get nodes -o custom-columns='NAME:.metadata.name,CPU:.status.allocatable.cpu,MEM:.status.allocatable.memory,TAINTS:.spec.taints[*].key'
echo "=== PVCs ==="
PVCS=$(kubectl get pod "$POD" -n "$NS" -o jsonpath='{.spec.volumes[*].persistentVolumeClaim.claimName}')
for PVC in $PVCS; do
kubectl get pvc "$PVC" -n "$NS" 2>/dev/null
done
echo "=== Scheduler Events ==="
kubectl get events -n "$NS" --field-selector "involvedObject.name=$POD" --sort-by='.lastTimestamp' | tail -10
# Auto-diagnosis from the most recent FailedScheduling event
EVENTS=$(kubectl get events -n "$NS" --field-selector "involvedObject.name=$POD,reason=FailedScheduling" \
  --sort-by='.lastTimestamp' -o jsonpath='{.items[-1:].message}' 2>/dev/null)
if [ -z "$EVENTS" ]; then echo "DIAGNOSIS: No FailedScheduling events; if the pod has no node assigned, check kube-scheduler"
elif echo "$EVENTS" | grep -qi "insufficient"; then echo "DIAGNOSIS: Insufficient resources"
elif echo "$EVENTS" | grep -qi "taint"; then echo "DIAGNOSIS: Taint/toleration mismatch"
elif echo "$EVENTS" | grep -qi "PersistentVolumeClaim"; then echo "DIAGNOSIS: PVC not bound"
elif echo "$EVENTS" | grep -qi "topology spread"; then echo "DIAGNOSIS: Topology spread constraints"
elif echo "$EVENTS" | grep -qiE "affinity|selector"; then echo "DIAGNOSIS: Selector/affinity mismatch"
elif echo "$EVENTS" | grep -qi "unschedulable"; then echo "DIAGNOSIS: Node cordoned"
elif echo "$EVENTS" | grep -qi "condition"; then echo "DIAGNOSIS: Node condition issue"
else echo "DIAGNOSIS: Unrecognized; read the full event message above"
fi
Production-ready Deployment with explicit scheduling config
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api
          image: myapp:1.2.3
          resources:
            requests: { cpu: 250m, memory: 256Mi }
            limits: { cpu: 500m, memory: 512Mi }
      nodeSelector:
        kubernetes.io/os: linux
      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "api"
          effect: "NoSchedule"
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 80
              preference:
                matchExpressions:
                  - key: disktype
                    operator: In
                    values: ["ssd"]
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: api-server
                topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: api-server
Cluster capacity audit script
#!/usr/bin/env python3
"""Print free (allocatable minus requested) CPU/memory and taints per node."""
import json
import subprocess

# Binary and decimal suffixes used by Kubernetes memory quantities
# (two-letter binary suffixes are listed first so "Mi" is not read as "M").
MEM_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
                "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}

def run_kubectl(cmd):
    result = subprocess.run(["kubectl", *cmd.split(), "-o", "json"],
                            capture_output=True, text=True)
    return json.loads(result.stdout) if result.returncode == 0 else None

def parse_cpu(value):
    """CPU quantity -> millicores ("250m" -> 250, "2" -> 2000)."""
    value = str(value or "0")
    if value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)

def parse_memory(value):
    """Memory quantity -> bytes ("256Mi" -> 268435456, "16213216Ki" -> bytes)."""
    value = str(value or "0")
    for suffix, factor in MEM_SUFFIXES.items():
        if value.endswith(suffix):
            return int(float(value[:-len(suffix)]) * factor)
    return int(float(value))  # plain bytes

def audit():
    nodes = run_kubectl("get nodes")
    pods = run_kubectl("get pods -A")
    if not nodes or not pods:
        print("Cannot access cluster")
        return
    node_req = {}
    for pod in pods.get("items", []):
        node = pod.get("spec", {}).get("nodeName")
        # Every non-terminated pod bound to a node holds its requests.
        if not node or pod["status"].get("phase") in ("Succeeded", "Failed"):
            continue
        req = node_req.setdefault(node, {"cpu": 0, "mem": 0})
        # Simplification: init containers and pod overhead are ignored.
        for c in pod["spec"].get("containers", []):
            r = c.get("resources", {}).get("requests", {})
            req["cpu"] += parse_cpu(r.get("cpu"))
            req["mem"] += parse_memory(r.get("memory"))
    for node in nodes["items"]:
        name = node["metadata"]["name"]
        alloc = node["status"]["allocatable"]
        taints = [t["key"] for t in node["spec"].get("taints", [])]
        req = node_req.get(name, {"cpu": 0, "mem": 0})
        free_cpu = parse_cpu(alloc["cpu"]) - req["cpu"]
        free_mem = (parse_memory(alloc["memory"]) - req["mem"]) / 1024**2
        print(f"{name}: CPU free={free_cpu}m Mem free={free_mem:.0f}Mi "
              f"Taints={taints or 'none'}")

if __name__ == "__main__":
    audit()
Anti-Patterns
Wrong: Requesting more resources than any node has
# BAD — no node has 64Gi allocatable [src1, src4]
resources:
  requests:
    cpu: "16"
    memory: 64Gi
Correct: Size requests based on node capacity
# GOOD — fits within typical node [src1, src4]
resources:
requests: { cpu: 500m, memory: 512Mi }
limits: { cpu: "1", memory: 1Gi }
Wrong: nodeSelector for non-existent labels
# BAD — label doesn't exist on any node [src4, src7]
nodeSelector:
  gpu-type: a100
Correct: Verify labels exist first
# GOOD — check labels, then set selector [src4, src7]
kubectl get nodes --show-labels | grep gpu-type
kubectl label node worker-1 gpu-type=a100
Wrong: No tolerations for tainted nodes
# BAD — all nodes tainted, no toleration [src2, src4]
spec:
  containers:
    - name: app
      image: myapp
  # No tolerations!
Correct: Add matching tolerations
# GOOD — toleration matches taint [src2, src4]
spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "special"
      effect: "NoSchedule"
Wrong: RequiredDuringScheduling anti-affinity with too many replicas
# BAD — 5 replicas with required anti-affinity on 3-node cluster [src7]
spec:
  replicas: 5
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: myapp
              topologyKey: kubernetes.io/hostname
# 2 pods will be Pending forever
Correct: Use preferredDuringScheduling anti-affinity
# GOOD — preferred anti-affinity allows co-location as fallback [src7]
spec:
  replicas: 5
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: myapp
                topologyKey: kubernetes.io/hostname
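The difference between the two anti-affinity variants shows up in a toy placement loop. This sketch places identical replicas one at a time and ignores resources entirely. It assumes that required hostname anti-affinity allows at most one replica per node. It models preferred anti-affinity as "pick the least-loaded node", a stand-in for the real scheduler's weighted scoring across many plugins. `place` is a hypothetical helper.

```python
def place(replicas: int, nodes: list[str], required: bool) -> tuple[dict[str, int], int]:
    """Greedy placement of identical replicas; returns (replicas per node, Pending count)."""
    counts = {n: 0 for n in nodes}
    pending = 0
    for _ in range(replicas):
        # Required anti-affinity (hostname key): only nodes with no replica yet qualify.
        candidates = [n for n in nodes if counts[n] == 0] if required else nodes
        if not candidates:
            pending += 1
            continue
        # Preferred anti-affinity only affects scoring: take the least-loaded node.
        target = min(candidates, key=lambda n: counts[n])
        counts[target] += 1
    return counts, pending


print(place(5, ["node-1", "node-2", "node-3"], required=True))
# ({'node-1': 1, 'node-2': 1, 'node-3': 1}, 2)
print(place(5, ["node-1", "node-2", "node-3"], required=False))
# ({'node-1': 2, 'node-2': 2, 'node-3': 1}, 0)
```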
Common Pitfalls
- Requests vs limits confusion: Scheduling uses requests, not limits. Right-size requests based on kubectl top data. [src1, src4]
- WaitForFirstConsumer PVCs: These stay Pending until a pod that uses them is scheduled. This is by design, not a bug. [src6]
- Control-plane taints: Control-plane (formerly "master") nodes typically carry a NoSchedule taint by default (node-role.kubernetes.io/control-plane on kubeadm clusters). On single-node clusters, remove the taint. [src2, src4]
- Hidden resource consumers: DaemonSets and kube-system pods reduce schedulable capacity. Check kubectl describe node. [src4, src5]
- Anti-affinity deadlocks: RequiredDuringScheduling anti-affinity with more replicas than nodes is unsatisfiable. Use Preferred instead. [src7]
- ResourceQuota blocks silently: An exhausted namespace quota makes the API server reject new pods at admission, so they never reach Pending. The owning ReplicaSet or Job reports FailedCreate events instead. Check kubectl describe quota. [src1, src3]
- maxUnavailable percentage rounding: For small Deployments (<4 replicas), a 25% maxUnavailable rounds down to 0, so a rolling update can only progress through surge pods. If the surge pod is stuck Pending for lack of capacity, the rollout stalls. Use absolute values. [src4]
- Node conditions blocking scheduling: Nodes with DiskPressure, MemoryPressure, or Ready=False are excluded from scheduling. Check with kubectl describe node. [src4, src5]
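The rounding pitfall is easiest to see with the numbers worked out. Kubernetes rounds a percentage maxSurge up and a percentage maxUnavailable down. The sketch below reproduces only that arithmetic for the default 25%/25%; `rolling_update_budget` is a hypothetical helper.

```python
import math


def rolling_update_budget(replicas: int, max_surge_pct: int = 25,
                          max_unavailable_pct: int = 25) -> tuple[int, int]:
    """(maxSurge, maxUnavailable) in absolute pods: surge rounds up, unavailable rounds down."""
    surge = math.ceil(replicas * max_surge_pct / 100)
    unavailable = math.floor(replicas * max_unavailable_pct / 100)
    return surge, unavailable


for n in (1, 3, 4, 10):
    print(n, rolling_update_budget(n))
# 1 (1, 0)
# 3 (1, 0)
# 4 (1, 1)
# 10 (3, 2)
```

With 3 replicas, the rollout may take down zero old pods until the single surge pod is Running. A Pending surge pod therefore halts progress.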
Diagnostic Commands
# === Find Pending pods ===
kubectl get pods -A --field-selector=status.phase=Pending
# === Pod details ===
kubectl describe pod <pod> -n <ns>
# === Node resources ===
kubectl top nodes
kubectl describe nodes | grep -A5 "Allocatable:"
# === Taints ===
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
# === Labels ===
kubectl get nodes --show-labels
# === PVC ===
kubectl get pvc -n <ns>
kubectl get pv
kubectl get storageclass
# === Quotas ===
kubectl get resourcequota -n <ns>
kubectl describe limitrange -n <ns>
# === Schedulability ===
kubectl get nodes # SchedulingDisabled = cordoned
kubectl uncordon <node>
# === Node conditions ===
kubectl describe node <node> | grep -A5 "Conditions:"
# === Cluster Autoscaler (managed K8s) ===
kubectl get events -n kube-system | grep cluster-autoscaler
kubectl get events -n <ns> --field-selector reason=NotTriggerScaleUp
Version History & Compatibility
| Version | Behavior | Key Changes |
|---|---|---|
| K8s 1.36 (2026-04) | Current | Gang Scheduling Beta (PodGroup API); Workload-Aware Preemption alpha; topology-aware workload scheduling alpha (KEP-5732); DRA firstAvailable device requests; ResourceClaim sharing across PodGroup members [src9] |
| K8s 1.35 (2025-12) | Stable | In-Place Pod Resize GA; gang scheduling alpha; mutable PV node affinity [src8] |
| K8s 1.34 (2025-08) | Stable | Async scheduler API; nominatedNodeName for more pods [src1] |
| K8s 1.32 (2024-12) | Stable | QueueingHint beta (faster Pending pod requeue) [src1] |
| K8s 1.29 (2023-12) | Stable | Improved scheduling hints; sidecar containers beta (GA in 1.33) [src1] |
| K8s 1.27 (2023) | Stable | In-place resource resize alpha [src8] |
| K8s 1.24 (2022) | Stable | Non-graceful node shutdown; PV topology [src6] |
| K8s 1.19 (2020) | TopologySpread GA | PodTopologySpreadConstraints GA [src7] |
| K8s 1.18 (2020) | WaitForFirstConsumer | More StorageClasses default to delayed binding [src6] |
When to Use / When Not to Use
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Pod shows Pending status | Pod shows CrashLoopBackOff | Debug container crash (logs, exit code) |
| Events mention scheduling failures | Pod shows ContainerCreating | Wait; or debug image pull / volume |
| No node selected for pod | Pod Running but not Ready | Debug readiness probe |
| PVC stuck in Pending | Pod Evicted | Check node pressure conditions |
| Cluster Autoscaler not scaling up | Pod OOMKilled | Debug memory limits (not scheduling) |
Important Caveats
- The scheduler considers resource requests, not actual usage. Use the Vertical Pod Autoscaler (VPA) to right-size requests automatically. Its InPlaceOrRecreate update mode, which builds on In-Place Pod Resize (GA in K8s 1.35), is now beta. [src1, src8]
- Managed K8s (EKS, GKE, AKS) may auto-scale nodes, and a Pending pod can trigger scale-up in 1-5 minutes. Monitor NotTriggerScaleUp events from the Cluster Autoscaler. [src4]
- WaitForFirstConsumer PVCs are zone-aware: a PV in zone-a plus a node in zone-b means both stay Pending. [src6]
- Check ResourceQuota and LimitRange. They silently inject defaults or reject pods. [src1, src3]
- Higher-priority pods can preempt lower-priority ones. Check PriorityClass objects. In K8s 1.35, In-Place Pod Resize also respects priority for deferred resizes. [src1, src8]
- In multi-tenant clusters, a namespace's quota may be full even if the cluster has capacity. [src3]
- K8s 1.36+ gang scheduling (PodGroup, Beta) and Workload-Aware Preemption (alpha) change the unit of analysis: groups of related pods now schedule and preempt together. If you're on 1.36+ and a job's pods are stuck Pending without obvious resource issues, run kubectl describe podgroup <name> first. The group may be waiting for ALL members to fit before binding any. Workload-Aware Preemption also means an entire lower-priority group is preempted as one, rather than individual pods being picked off. [src9]
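The all-or-nothing idea behind gang scheduling can be illustrated with a toy placer. This is a conceptual sketch only. It is neither kube-scheduler's implementation nor the PodGroup API. It assumes first-fit placement on CPU alone, and `gang_place` is a hypothetical helper. The key property is that a failed attempt reserves nothing. Pod-by-pod scheduling, by contrast, would bind the members that fit and leave the rest Pending.

```python
from typing import Optional


def gang_place(member_cpu_m: list[int], free_cpu_m: dict[str, int]) -> Optional[dict[int, str]]:
    """Return {member index: node} if EVERY member fits, else None (bind nobody)."""
    free = dict(free_cpu_m)  # work on a copy so a failed attempt reserves nothing
    plan = {}
    for i, need in enumerate(member_cpu_m):
        # First-fit on CPU only; a real scheduler scores nodes with many plugins.
        node = next((n for n, f in free.items() if f >= need), None)
        if node is None:
            return None
        free[node] -= need
        plan[i] = node
    return plan


nodes = {"n1": 4000, "n2": 3000}
print(gang_place([2000] * 4, nodes))  # None: 3 would fit pod-by-pod, the 4th would not
print(gang_place([2000] * 3, nodes))  # {0: 'n1', 1: 'n1', 2: 'n2'}
```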