`kubectl describe pod <pod> -n <ns>` always tells you why: the Events section shows the exact scheduler reason. Two notes up front: a WaitForFirstConsumer PVC staying Pending until a pod consumes it is normal, not an error; and WaitForFirstConsumer PVCs are zone-aware, so a PV in zone-a with the only available node in zone-b leaves both stuck Pending. [src6]

| # | Cause | Likelihood | Scheduler Message | Fix |
|---|---|---|---|---|
| 1 | Insufficient CPU/memory | ~35% | "Insufficient cpu/memory" | Reduce requests, add nodes [src1, src4] |
| 2 | Taints without tolerations | ~20% | "had taint pod didn't tolerate" | Add toleration or remove taint [src2, src4] |
| 3 | PVC not bound | ~15% | "unbound PersistentVolumeClaims" | Fix StorageClass, provision PV [src3, src6] |
| 4 | nodeSelector mismatch | ~12% | "didn't match node affinity/selector" | Fix selector or label nodes [src4, src7] |
| 5 | Node affinity mismatch | ~5% | "didn't match node affinity" | Fix affinity or add nodes [src4, src7] |
| 6 | Pod anti-affinity | ~4% | "didn't match anti-affinity" | Reduce replicas or add nodes [src4, src7] |
| 7 | Node cordoned | ~3% | "nodes were unschedulable" | kubectl uncordon <node> [src3, src4] |
| 8 | ResourceQuota exceeded | ~2% | "exceeded quota" | Increase quota [src1, src3] |
| 9 | PodDisruptionBudget blocking | ~2% | "disruption budget violated" | Adjust PDB [src3] |
| 10 | Topology spread constraints | ~1% | "didn't satisfy topology spread" | Relax constraints [src7] |
| 11 | Node condition (DiskPressure/MemoryPressure) | ~1% | "node(s) had condition" | Resolve node pressure [src4, src5] |
| 12 | Scheduler not running | <1% | No events at all | Check kube-scheduler [src1] |
START — Pod stuck in Pending
├── kubectl describe pod → Check Events section
│ ├── "Insufficient cpu/memory" → Check requests vs allocatable [src1, src4]
│ │ └── K8s 1.35+? → Consider In-Place Pod Resize for running pods [src8]
│ ├── "taint pod didn't tolerate" → Add toleration or remove taint [src2]
│ ├── "unbound PersistentVolumeClaims" → Fix PVC/StorageClass [src6]
│ ├── "didn't match node affinity/selector" → Fix labels/selectors [src7]
│ ├── "nodes were unschedulable" → kubectl uncordon [src3]
│ ├── "exceeded quota" → Increase ResourceQuota [src1]
│ ├── "didn't satisfy topology spread" → Relax maxSkew or ScheduleAnyway [src7]
│ ├── "node(s) had condition" → Check DiskPressure/MemoryPressure [src4]
│ └── No events → Check kube-scheduler is running [src1]
├── Managed K8s? → Check Cluster Autoscaler events [src4]
└── kubectl get events -A --sort-by='.lastTimestamp'
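The branching above can be sketched as a small classifier over the last scheduler event message. This is an illustrative sketch, not part of kubectl; the substrings follow real kube-scheduler messages as listed in the table above.

```python
# Map a scheduler event message to a probable cause, mirroring the
# decision tree above. First matching substring wins.
RULES = [
    ("insufficient", "Insufficient resources: reduce requests or add nodes"),
    ("taint", "Taint/toleration mismatch: add a toleration or remove the taint"),
    ("persistentvolumeclaim", "PVC not bound: fix the StorageClass or provision a PV"),
    ("affinity", "Selector/affinity mismatch: fix labels or selectors"),
    ("selector", "Selector/affinity mismatch: fix labels or selectors"),
    ("unschedulable", "Node cordoned: kubectl uncordon <node>"),
    ("quota", "ResourceQuota exceeded: increase the quota"),
    ("topology spread", "Topology spread unsatisfied: relax maxSkew or use ScheduleAnyway"),
    ("condition", "Node condition (DiskPressure/MemoryPressure): resolve node pressure"),
]

def diagnose(message: str) -> str:
    msg = message.lower()
    for needle, advice in RULES:
        if needle in msg:
            return advice
    # An empty message means no events at all.
    return "No match: check that kube-scheduler is running" if not msg else "Unknown: read the full event"

print(diagnose("0/3 nodes are available: 3 Insufficient cpu."))
```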
Find which pods are stuck. [src1, src3]
kubectl get pods -A --field-selector=status.phase=Pending
kubectl get pods -o wide | grep Pending
The most important step — events tell you exactly why. [src1, src3, src4]
kubectl describe pod <pod> -n <ns>
kubectl get events -n <ns> --field-selector involvedObject.name=<pod>
Compare requests against allocatable. [src1, src4, src5]
kubectl top nodes
kubectl describe nodes | grep -A5 "Allocatable:"
kubectl describe node <node> | grep -A20 "Allocated resources"
kubectl get pod <pod> -o jsonpath='{.spec.containers[*].resources.requests}'
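The comparison the scheduler makes can be sketched locally: parse the quantity strings (`250m`, `512Mi`, …) and check the request against what remains allocatable. The numbers below are hypothetical; this is a minimal sketch, not the scheduler's actual fit logic.

```python
# Parse Kubernetes resource quantities and check whether a pod's requests
# fit into a node's remaining allocatable capacity.
def cpu_millis(q: str) -> int:
    # "250m" -> 250 millicores; "2" -> 2000 millicores
    return int(q[:-1]) if q.endswith("m") else int(float(q) * 1000)

def mem_bytes(q: str) -> int:
    units = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
    for suffix, factor in units.items():
        if q.endswith(suffix):
            return int(q[:-2]) * factor
    return int(q)  # plain bytes

def fits(request: dict, free: dict) -> bool:
    return (cpu_millis(request["cpu"]) <= cpu_millis(free["cpu"])
            and mem_bytes(request["memory"]) <= mem_bytes(free["memory"]))

print(fits({"cpu": "250m", "memory": "256Mi"}, {"cpu": "1", "memory": "512Mi"}))  # True
print(fits({"cpu": "16", "memory": "64Gi"}, {"cpu": "4", "memory": "16Gi"}))     # False
```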
Taints repel pods without matching tolerations. [src2, src4]
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key
kubectl get pod <pod> -o jsonpath='{.spec.tolerations}'
kubectl taint node <node> key=value:NoSchedule-   # trailing "-" removes the taint
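The matching rule is mechanical: a pod is repelled by a node if any NoSchedule taint has no matching toleration. A minimal sketch of that rule (illustrative, not the scheduler's actual code; the `dedicated=api` taint is a made-up example):

```python
# Does one toleration match one taint, per the Equal/Exists rules?
def tolerates(toleration: dict, taint: dict) -> bool:
    # An empty toleration effect matches any taint effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    op = toleration.get("operator", "Equal")
    if op == "Exists":
        # An empty key with Exists tolerates every taint.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value") == taint.get("value"))

# Schedulable only if every NoSchedule taint is tolerated.
def schedulable(tolerations: list, taints: list) -> bool:
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints if taint["effect"] == "NoSchedule"
    )

taint = {"key": "dedicated", "value": "api", "effect": "NoSchedule"}
print(schedulable([], [taint]))                                          # False
print(schedulable([{"key": "dedicated", "operator": "Equal",
                    "value": "api", "effect": "NoSchedule"}], [taint]))  # True
```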
PVCs must bind before pods can schedule. [src3, src6]
kubectl get pvc -n <ns>
kubectl describe pvc <name> -n <ns>
kubectl get pv
kubectl get storageclass
Selectors must match actual node labels. [src4, src7]
kubectl get pod <pod> -o jsonpath='{.spec.nodeSelector}'
kubectl get nodes --show-labels
kubectl label node <node> disktype=ssd
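nodeSelector is a hard requirement: every key/value pair must appear verbatim in the node's labels. A one-line sketch of the match, with hypothetical labels:

```python
# nodeSelector matches a node only if it is a subset of the node's labels.
def selector_matches(node_selector: dict, node_labels: dict) -> bool:
    return all(node_labels.get(k) == v for k, v in node_selector.items())

labels = {"kubernetes.io/os": "linux", "disktype": "ssd"}
print(selector_matches({"disktype": "ssd"}, labels))   # True
print(selector_matches({"disktype": "nvme"}, labels))  # False
```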
Quotas and cordoned nodes block scheduling. [src1, src3]
kubectl get resourcequota -n <ns>
kubectl get nodes # "SchedulingDisabled" = cordoned
kubectl uncordon <node>
kubectl describe limitrange -n <ns>
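The quota arithmetic is simple: a pod is rejected when used plus requested exceeds the hard limit. A sketch with hypothetical numbers in CPU millicores:

```python
# ResourceQuota check: admission rejects the pod when used + requested > hard.
def quota_allows(hard_m: int, used_m: int, request_m: int) -> bool:
    return used_m + request_m <= hard_m

# Quota of 4 CPUs with 3.8 already used: a 250m request is rejected,
# a 200m request squeaks in.
print(quota_allows(4000, 3800, 250))  # False
print(quota_allows(4000, 3800, 200))  # True
```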
#!/bin/bash
POD="$1"; NS="${2:-default}"
if [ -z "$POD" ]; then
  echo "Usage: $0 <pod> [ns]"
  kubectl get pods -A --field-selector=status.phase=Pending
  exit 1
fi
echo "=== Pending Pod Diagnostic: $POD (ns: $NS) ==="
kubectl get pod "$POD" -n "$NS" -o wide
echo "=== Resource Requests ==="
kubectl get pod "$POD" -n "$NS" -o jsonpath='{range .spec.containers[*]} {.name}: cpu={.resources.requests.cpu} mem={.resources.requests.memory}{"\n"}{end}'
echo "=== Node Selector ==="
kubectl get pod "$POD" -n "$NS" -o jsonpath='{.spec.nodeSelector}'
echo "=== Tolerations ==="
kubectl get pod "$POD" -n "$NS" -o jsonpath='{range .spec.tolerations[*]} {.key}={.value}:{.effect}{"\n"}{end}'
echo "=== Node Resources ==="
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEM:.status.allocatable.memory,TAINTS:.spec.taints[*].key
echo "=== PVCs ==="
PVCS=$(kubectl get pod "$POD" -n "$NS" -o jsonpath='{.spec.volumes[*].persistentVolumeClaim.claimName}')
for PVC in $PVCS; do
  kubectl get pvc "$PVC" -n "$NS" 2>/dev/null
done
echo "=== Scheduler Events ==="
kubectl get events -n "$NS" --field-selector "involvedObject.name=$POD" --sort-by='.lastTimestamp' | tail -10
# Auto-diagnosis (kubectl's JSONPath supports the [-1:] slice, not [-1])
EVENTS=$(kubectl get events -n "$NS" --field-selector "involvedObject.name=$POD" -o jsonpath='{.items[-1:].message}')
if echo "$EVENTS" | grep -qi "insufficient"; then echo "DIAGNOSIS: Insufficient resources"
elif echo "$EVENTS" | grep -qi "taint"; then echo "DIAGNOSIS: Taint/toleration mismatch"
elif echo "$EVENTS" | grep -qi "PersistentVolumeClaim"; then echo "DIAGNOSIS: PVC not bound"
elif echo "$EVENTS" | grep -qi "affinity\|selector"; then echo "DIAGNOSIS: Selector/affinity mismatch"
elif echo "$EVENTS" | grep -qi "unschedulable"; then echo "DIAGNOSIS: Node cordoned"
elif echo "$EVENTS" | grep -qi "condition"; then echo "DIAGNOSIS: Node condition issue"
elif [ -z "$EVENTS" ]; then echo "DIAGNOSIS: No events, check kube-scheduler"
fi
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
      - name: api
        image: myapp:1.2.3
        resources:
          requests: { cpu: 250m, memory: 256Mi }
          limits: { cpu: 500m, memory: 512Mi }
      nodeSelector:
        kubernetes.io/os: linux
      tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "api"
        effect: "NoSchedule"
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 80
            preference:
              matchExpressions:
              - key: disktype
                operator: In
                values: ["ssd"]
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: api-server
              topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: api-server
#!/usr/bin/env python3
import subprocess, json

def run_kubectl(cmd):
    result = subprocess.run(f"kubectl {cmd} -o json".split(), capture_output=True, text=True)
    return json.loads(result.stdout) if result.returncode == 0 else None

def parse_resource(value):
    """CPU -> millicores; memory -> bytes. Node allocatable memory is
    usually reported in Ki, so that suffix must be handled too."""
    if not value:
        return 0
    value = str(value)
    if value.endswith("m"):
        return int(value[:-1])
    if value.endswith("Ki"):
        return int(value[:-2]) * 1024
    if value.endswith("Mi"):
        return int(value[:-2]) * 1024**2
    if value.endswith("Gi"):
        return int(value[:-2]) * 1024**3
    try:
        return int(float(value) * 1000)  # whole CPUs -> millicores
    except ValueError:
        return 0

def audit():
    nodes = run_kubectl("get nodes")
    pods = run_kubectl("get pods -A")
    if not nodes or not pods:
        print("Cannot access cluster")
        return
    # Sum the requests of Running pods per node.
    node_req = {}
    for pod in pods.get("items", []):
        node = pod.get("spec", {}).get("nodeName")
        if not node or pod["status"].get("phase") != "Running":
            continue
        node_req.setdefault(node, {"cpu": 0, "mem": 0})
        for c in pod["spec"].get("containers", []):
            r = c.get("resources", {}).get("requests", {})
            node_req[node]["cpu"] += parse_resource(r.get("cpu", "0"))
            node_req[node]["mem"] += parse_resource(r.get("memory", "0"))
    for node in nodes["items"]:
        name = node["metadata"]["name"]
        alloc = node["status"]["allocatable"]
        taints = [t["key"] for t in node["spec"].get("taints", [])]
        req = node_req.get(name, {"cpu": 0, "mem": 0})
        print(f"{name}: CPU free={parse_resource(alloc['cpu'])-req['cpu']}m "
              f"Mem free={((parse_resource(alloc['memory'])-req['mem'])/1024**2):.0f}Mi "
              f"Taints={taints or 'none'}")

if __name__ == "__main__":
    audit()
# BAD — no node has 64Gi allocatable [src1, src4]
resources:
  requests:
    cpu: "16"
    memory: 64Gi
# GOOD — fits within a typical node [src1, src4]
resources:
  requests: { cpu: 500m, memory: 512Mi }
  limits: { cpu: "1", memory: 1Gi }
# BAD — label doesn't exist on any node [src4, src7]
nodeSelector:
  gpu-type: a100
# GOOD — check labels, then set selector [src4, src7]
kubectl get nodes --show-labels | grep gpu-type
kubectl label node worker-1 gpu-type=a100
# BAD — all nodes tainted, no toleration [src2, src4]
spec:
  containers:
  - name: app
    image: myapp
  # No tolerations!
# GOOD — toleration matches taint [src2, src4]
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "special"
    effect: "NoSchedule"
# BAD — 5 replicas with required anti-affinity on a 3-node cluster [src7]
spec:
  replicas: 5
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: myapp
            topologyKey: kubernetes.io/hostname
# 2 pods will be Pending forever
# GOOD — preferred anti-affinity allows co-location as a fallback [src7]
spec:
  replicas: 5
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: myapp
              topologyKey: kubernetes.io/hostname
- The scheduler decides on requests, not on actual usage from `kubectl top` data. [src1, src4]
- Control-plane nodes carry a NoSchedule taint by default. On single-node clusters, remove the taint. [src2, src4]
- Allocatable capacity and per-pod allocations: `kubectl describe node`. [src4, src5]
- Quota usage versus limits: `kubectl describe quota`. [src1, src3]
- Node conditions (DiskPressure, MemoryPressure): `kubectl describe node`. [src4, src5]
# === Find Pending pods ===
kubectl get pods -A --field-selector=status.phase=Pending
# === Pod details ===
kubectl describe pod <pod> -n <ns>
# === Node resources ===
kubectl top nodes
kubectl describe nodes | grep -A5 "Allocatable:"
# === Taints ===
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key
# === Labels ===
kubectl get nodes --show-labels
# === PVC ===
kubectl get pvc -n <ns>
kubectl get pv
kubectl get storageclass
# === Quotas ===
kubectl get resourcequota -n <ns>
kubectl describe limitrange -n <ns>
# === Schedulability ===
kubectl get nodes # SchedulingDisabled = cordoned
kubectl uncordon <node>
# === Node conditions ===
kubectl describe node <node> | grep -A5 "Conditions:"
# === Cluster Autoscaler (managed K8s) ===
kubectl get events -n kube-system | grep cluster-autoscaler
| Version | Behavior | Key Changes |
|---|---|---|
| K8s 1.35 (2025-12) | Current | In-Place Pod Resize GA; gang scheduling alpha; mutable PV node affinity [src8] |
| K8s 1.34 (2025-08) | Stable | Async scheduler API; nominatedNodeName for more pods [src1] |
| K8s 1.32 (2025-04) | Stable | QueueingHint beta (faster Pending pod requeue) [src1] |
| K8s 1.29+ (2024) | Stable | Improved scheduling hints; sidecar containers GA [src1] |
| K8s 1.27 (2023) | Stable | In-place resource resize alpha [src8] |
| K8s 1.24 (2022) | Stable | Non-graceful node shutdown; PV topology [src6] |
| K8s 1.19 (2020) | TopologySpread GA | PodTopologySpreadConstraints GA [src7] |
| K8s 1.18 (2020) | WaitForFirstConsumer | More StorageClasses default to delayed binding [src6] |
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Pod shows Pending status | Pod shows CrashLoopBackOff | Debug container crash (logs, exit code) |
| Events mention scheduling failures | Pod shows ContainerCreating | Wait; or debug image pull / volume |
| No node selected for pod | Pod Running but not Ready | Debug readiness probe |
| PVC stuck in Pending | Pod Evicted | Check node pressure conditions |
| Cluster Autoscaler not scaling up | Pod OOMKilled | Debug memory limits (not scheduling) |
- On managed Kubernetes, look for NotTriggerScaleUp events from the Cluster Autoscaler. [src4]
- WaitForFirstConsumer PVCs are zone-aware. A PV in zone-a plus a node only in zone-b means both stay Pending. [src6]
- Check ResourceQuota and LimitRange objects: they silently inject defaults or block pods. [src1, src3]
- Check PriorityClass objects when pods compete for capacity. In K8s 1.35, In-Place Pod Resize also respects priority for deferred resizes. [src1, src8]