Kubernetes Security Checklist: Cluster Hardening Guide
What is the Kubernetes security checklist?
TL;DR
- Bottom line: Kubernetes security requires defense-in-depth across 8 domains: RBAC, network policies, pod security standards, secrets encryption, image scanning, API server hardening, etcd protection, and audit logging -- no single control is sufficient.
- Key tool/command: kubectl auth can-i --list --as=system:serviceaccount:ns:sa to audit RBAC permissions; kube-bench run to check CIS Benchmark compliance.
- Watch out for: Default Kubernetes allows all pod-to-pod communication and runs containers as root -- you must explicitly apply NetworkPolicies and Pod Security Standards to every namespace.
- Works with: Kubernetes 1.28+ (Pod Security Admission has been GA since 1.25). CIS Benchmark covers 1.29-1.31. NSA/CISA guide applies to all versions. Falco 0.37+.
Constraints
- NEVER run workloads as root unless absolutely required -- enforce runAsNonRoot: true and drop ALL capabilities
- NEVER leave etcd unencrypted or accessible without mTLS -- write access to etcd equals root on the entire cluster
- NEVER use the system:masters group for regular authentication -- reserve for break-glass emergency access only
- ALWAYS apply default-deny NetworkPolicies in every namespace before deploying workloads
- NEVER store secrets in plaintext in manifests or environment variables -- use encryption at rest and external secret managers
- ALWAYS run the latest supported Kubernetes minor version -- only the 3 most recent minors receive security patches
Quick Reference
| # | Security Domain | Risk Level | Key Control | Verification Command |
|---|---|---|---|---|
| 1 | RBAC | Critical | Least-privilege Roles; no system:masters for users | kubectl auth can-i --list --as=<user> |
| 2 | Network Policies | Critical | Default-deny ingress/egress in every namespace | kubectl get networkpolicy -A |
| 3 | Pod Security Standards | Critical | Enforce restricted or baseline per namespace | kubectl get ns -L pod-security.kubernetes.io/enforce |
| 4 | Secrets Management | Critical | Encryption at rest; external secret manager (Vault, AWS SM) | kubectl get secrets -o yaml (check no plaintext) |
| 5 | Image Security | High | Scan images in CI; allow only signed/trusted images | trivy image <image:tag> |
| 6 | API Server | Critical | Disable anonymous auth; enable audit logging; use OIDC | curl -k https://<api>:6443/healthz (should require auth) |
| 7 | etcd | Critical | mTLS; isolate behind firewall; encrypt at rest | etcdctl endpoint health --cluster |
| 8 | Audit Logging | High | Enable audit policy; protect log storage | --audit-policy-file flag on kube-apiserver |
| 9 | Runtime Security | High | Deploy Falco or equivalent; detect anomalous syscalls | kubectl logs -n falco -l app.kubernetes.io/name=falco |
| 10 | Resource Limits | Medium | Set CPU/memory requests and limits on all pods | kubectl describe resourcequota -A |
| 11 | Node Security | High | Seccomp + AppArmor/SELinux; isolate sensitive workloads | kubectl get pod <pod> -n <ns> -o jsonpath='{.spec.securityContext}' |
| 12 | Service Mesh / mTLS | Medium | Encrypt all in-cluster traffic with mutual TLS | istioctl analyze or mesh-specific check |
Decision Tree
START: What is your primary Kubernetes security concern?
├── Setting up a new cluster from scratch?
│ ├── YES → Follow full checklist: Steps 1-8 below
│ └── NO ↓
├── Hardening an existing production cluster?
│ ├── YES → Run kube-bench first (Step 1), then fix gaps by priority
│ └── NO ↓
├── Investigating a specific security domain?
│ ├── RBAC → Step 2 (RBAC) + Anti-Pattern #1
│ ├── Network → Step 3 (NetworkPolicies)
│ ├── Pod Security → Step 4 (Pod Security Standards)
│ ├── Secrets → Step 5 (Secrets Management)
│ ├── Images → Step 6 (Image Security)
│ ├── Runtime → Step 7 (Runtime Monitoring)
│ └── Audit → Step 8 (Audit Logging)
├── Preparing for compliance audit (CIS/SOC2/PCI)?
│ ├── YES → Run kube-bench + review CIS Benchmark sections
│ └── NO ↓
└── DEFAULT → Start with kube-bench scan, address Critical findings first
Step-by-Step Guide
1. Run CIS Benchmark assessment with kube-bench
Establish a security baseline by running kube-bench against the CIS Kubernetes Benchmark. [src4]
# Run kube-bench as a Kubernetes Job
kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
kubectl logs job/kube-bench
Verify: Review output for [FAIL] items -- these are CIS Benchmark violations. Expected: zero FAIL items for a hardened cluster.
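One way to turn this Verify step into a pass/fail gate is to count the [FAIL] lines in the kube-bench output. The sketch below is a minimal, hypothetical helper (kube_bench_gate is not part of kube-bench). It assumes failing checks are printed on lines that begin with [FAIL] and that the allowed number of failures is passed as the first argument, defaulting to zero.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads kube-bench output on stdin, counts lines that
# start with [FAIL], and returns non-zero when the count exceeds a threshold.
kube_bench_gate() {
  local max_fail="${1:-0}"
  local fails
  # grep -c prints 0 but exits 1 when nothing matches, so mask its exit code.
  fails="$(grep -c '^\[FAIL\]' || true)"
  echo "kube-bench FAIL count: ${fails} (allowed: ${max_fail})"
  [ "${fails}" -le "${max_fail}" ]
}
```

In a pipeline this could be used as kubectl logs job/kube-bench | kube_bench_gate 0, so that the CI step fails while any check is still failing. Raising the threshold temporarily is one way to adopt the gate on a cluster that is not yet fully remediated.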
2. Configure least-privilege RBAC
Implement RBAC with the principle of least privilege. Use namespace-scoped Roles wherever possible. [src1]
# Role: Read-only access to pods in a specific namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: pod-reader
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: production
  name: read-pods-binding
subjects:
  - kind: Group
    name: "developers"
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
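Verify: kubectl auth can-i create pods --as=developer1 --as-group=developers -n production → expected: no (the binding targets the developers group, so impersonate the group as well as the user)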
3. Apply default-deny NetworkPolicies
Default Kubernetes allows all pod-to-pod communication. Apply default-deny in every namespace. [src2]
# Default deny all ingress and egress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
Verify: kubectl get networkpolicy -n production → should list default-deny-all.
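Because the policy has to exist in every namespace, it can help to render it from a small template instead of copying the manifest by hand. The following is a minimal sketch under stated assumptions: render_default_deny is a hypothetical helper name, and the policy name default-deny-all is reused from the manifest above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a default-deny NetworkPolicy for one namespace.
render_default_deny() {
  local ns="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: ${ns}
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
EOF
}

# Usage sketch (assumes kubectl is configured; skipping kube-* namespaces is
# an assumption about which namespaces you want to leave alone):
#   kubectl get ns -o name | sed 's|^namespace/||' | grep -v '^kube-' |
#     while read -r ns; do render_default_deny "$ns" | kubectl apply -f -; done
```

Rendering to stdout keeps the helper free of side effects, so the output can be reviewed or committed before anything is applied. Workloads in a namespace lose connectivity as soon as the policy lands, so the allow rules they need should be ready first.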
4. Enforce Pod Security Standards
Apply Pod Security Standards at the namespace level. Use restricted for security-critical workloads. [src5]
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
Verify: kubectl run test --image=nginx --privileged -n production → expected: rejected by Pod Security Admission.
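Before switching a namespace to enforce, it is usually worth previewing which existing pods would violate the chosen level. A server-side dry run of the label change still runs the Pod Security checks and prints warnings, without saving the label. The sketch below wraps that command. The function name pss_dry_run and the KUBECTL override variable are hypothetical conveniences, not part of kubectl.

```shell
#!/usr/bin/env bash
# KUBECTL is a hypothetical override hook so the command can be swapped out
# (for example with `echo`) when testing; it defaults to the real kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# Preview how a Pod Security level would treat existing pods in every
# namespace. --dry-run=server evaluates the change without persisting it.
pss_dry_run() {
  local level="${1:-restricted}"
  "${KUBECTL}" label --dry-run=server --overwrite ns --all \
    "pod-security.kubernetes.io/enforce=${level}"
}
```

Running pss_dry_run baseline first and pss_dry_run restricted afterwards gives a rough picture of how much remediation each level would require.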
5. Secure secrets with encryption at rest
Enable Kubernetes secret encryption at rest and use external secret managers for production. [src3]
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-32-byte-key>
      - identity: {}
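Verify: Read a secret straight from etcd, e.g. etcdctl get /registry/secrets/<namespace>/<name> | hexdump -C -- the stored value should start with the prefix k8s:enc:aescbc:v1:key1 rather than plaintext. Secrets created before encryption was enabled stay unencrypted until they are rewritten (kubectl get secrets -A -o json | kubectl replace -f -).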
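The <base64-encoded-32-byte-key> placeholder in the configuration has to be filled with 32 random bytes, base64-encoded. A minimal sketch of generating such a value follows. The function name generate_encryption_key is hypothetical, and reading from /dev/urandom assumes a Linux or similar host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit 32 random bytes, base64-encoded, suitable for the
# `secret` field of an aescbc key in an EncryptionConfiguration.
generate_encryption_key() {
  head -c 32 /dev/urandom | base64
}
```

The generated key is as sensitive as the secrets it protects, so it should be kept out of version control and shell history. Keep the file holding the EncryptionConfiguration readable only by the API server's user.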
6. Implement image scanning in CI/CD
Scan container images for vulnerabilities before deployment. Block images with critical CVEs. [src2]
# Scan with Trivy -- exit code 1 fails CI build
trivy image --severity CRITICAL,HIGH --exit-code 1 myregistry/myapp:latest
Verify: trivy image myapp:latest --severity CRITICAL → expected: 0 critical vulnerabilities.
7. Deploy runtime security monitoring (Falco)
Install Falco to detect anomalous runtime behavior: shell execution, unexpected file access, privilege escalation. [src8]
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm install falco falcosecurity/falco \
--namespace falco --create-namespace \
--set falcosidekick.enabled=true
Verify: kubectl logs -n falco -l app.kubernetes.io/name=falco → should show Falco running.
8. Enable Kubernetes audit logging
Configure API server audit logging to track all authentication, authorization, and resource operations. [src1]
apiVersion: audit.k8s.io/v1
kind: Policy
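rules:
  # Log secrets and configmaps at Metadata only: RequestResponse would write
  # secret values into the audit log.
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  - level: RequestResponse
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
  # Catch-all: everything else at Metadata level
  - level: Metadata
    omitStages:
      - RequestReceived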
Verify: cat /var/log/kubernetes/audit.log | jq '.verb' | head → should show recent API operations.
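The policy only takes effect when kube-apiserver is started with --audit-policy-file and a log backend such as --audit-log-path. The sketch below checks a manifest for both flags. check_audit_flags is a hypothetical helper, and the kubeadm-style path in the usage comment is an assumption. Clusters that ship events through --audit-webhook-config-file instead would need a different check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which audit-related kube-apiserver flags are
# missing from a manifest file. Returns 0 when both flags are present.
check_audit_flags() {
  local manifest="$1"
  local missing=0
  local flag
  for flag in --audit-policy-file --audit-log-path; do
    # `--` stops grep from treating the pattern as an option.
    if ! grep -q -- "${flag}=" "${manifest}"; then
      echo "missing: ${flag}"
      missing=1
    fi
  done
  return "${missing}"
}

# Usage sketch (path assumed for kubeadm-style control planes):
#   check_audit_flags /etc/kubernetes/manifests/kube-apiserver.yaml
```

On managed services the control plane manifest is not accessible, so audit logging there is typically switched on through the provider's own settings rather than verified this way.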
Code Examples
YAML: Complete RBAC Configuration for a Microservice
# Input: Microservice needing ConfigMap and Secret access in its namespace
# Output: Least-privilege RBAC with separate service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: order-service
  namespace: orders
automountServiceAccountToken: false
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: orders
  name: order-service-role
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["order-config"]
    verbs: ["get", "watch"]
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["order-db-creds"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: orders
  name: order-service-binding
subjects:
  - kind: ServiceAccount
    name: order-service
    namespace: orders
roleRef:
  kind: Role
  name: order-service-role
  apiGroup: rbac.authorization.k8s.io
YAML: Comprehensive NetworkPolicy for Multi-Tier App
# Input: Three-tier app (frontend, backend, database)
# Output: NetworkPolicies allowing only required paths
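apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      tier: backend
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Only the frontend tier may reach the backend, and only on TCP 8080
    - from:
        - podSelector:
            matchLabels:
              tier: frontend
      ports:
        - protocol: TCP
          port: 8080
  egress:
    # The backend may reach the database tier on TCP 5432
    - to:
        - podSelector:
            matchLabels:
              tier: database
      ports:
        - protocol: TCP
          port: 5432
    # Allow DNS lookups to kube-dns pods in any namespace
    # (TCP 53 is included because large responses fall back to TCP)
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53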
YAML: Falco Custom Runtime Security Rules
# Input: Kubernetes cluster with security monitoring
# Output: Falco rules detecting common attack patterns
- rule: Shell Spawned in Container
  desc: Detect shell execution inside a container
  condition: >
    spawned_process and container and
    proc.name in (bash, sh, zsh, csh, ksh)
  output: >
    Shell spawned (user=%user.name container=%container.name
    image=%container.image.repository pod=%k8s.pod.name
    ns=%k8s.ns.name shell=%proc.name)
  priority: WARNING
  tags: [container, shell, mitre_execution]
- rule: Read Sensitive File in Container
  desc: Detect reading of sensitive files
  condition: >
    open_read and container and
    (fd.name startswith /etc/shadow or
    fd.name startswith /root/.ssh)
  output: >
    Sensitive file read (file=%fd.name container=%container.name
    pod=%k8s.pod.name)
  priority: ERROR
  tags: [container, filesystem, mitre_credential_access]
Bash: Automated Security Audit Script
#!/usr/bin/env bash
# Input: kubectl access to a Kubernetes cluster
# Output: Security posture report
set -euo pipefail
echo "=== Kubernetes Security Audit ==="
echo "Cluster: $(kubectl config current-context)"
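# Pods that do not set runAsNonRoot: true at the pod level
# (container-level securityContext is not inspected here)
echo "--- Pods Without Pod-Level runAsNonRoot ---"
# `|| true`: head closing the pipe early would otherwise abort the script under pipefail
kubectl get pods -A -o json | \
  jq -r '.items[] | select(.spec.securityContext.runAsNonRoot != true) |
    "\(.metadata.namespace)/\(.metadata.name)"' | head -20 || true
# Namespaces without PSS enforcement
echo "--- Namespaces Without PSS ---"
# `|| true`: grep exits 1 when every namespace is filtered out
kubectl get ns -o json | \
  jq -r '.items[] | select(.metadata.labels["pod-security.kubernetes.io/enforce"] == null) |
    .metadata.name' | grep -v "^kube-" || true
echo "=== Audit Complete ==="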
Anti-Patterns
Wrong: Granting cluster-admin to application service accounts
# BAD -- cluster-admin gives full control over the entire cluster
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: app-admin
subjects:
  - kind: ServiceAccount
    name: my-app
    namespace: default
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
Correct: Scoped Role with minimum required permissions
# GOOD -- namespace-scoped Role with only needed permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: my-app-ns
  name: my-app-role
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["my-app-config"]
    verbs: ["get", "watch"]
Wrong: No NetworkPolicy (default allow-all)
# BAD -- no NetworkPolicy means any pod can reach any other pod
# This is the default Kubernetes behavior
# (No manifest needed -- this is what you get by doing nothing)
Correct: Default-deny with explicit allow rules
# GOOD -- default deny everything, then allow specific traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
Wrong: Running containers as root with all capabilities
# BAD -- running as root with privilege escalation enabled
apiVersion: v1
kind: Pod
metadata:
  name: insecure-pod
spec:
  containers:
    - name: app
      image: myapp:latest
      securityContext:
        privileged: true
        runAsUser: 0
Correct: Non-root with dropped capabilities and read-only filesystem
# GOOD -- restricted security context
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: myapp:1.0.0
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
Wrong: Storing secrets in plaintext environment variables
# BAD -- plaintext secrets in manifest, exposed via env vars
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: app
      env:
        - name: DB_PASSWORD
          value: "super-secret-password"
Correct: External secret manager with volume mount
# GOOD -- secrets managed externally, mounted as read-only files
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: app
      volumeMounts:
        - name: db-creds
          mountPath: /etc/secrets
          readOnly: true
  volumes:
    - name: db-creds
      secret:
        secretName: db-credentials
Common Pitfalls
- Auto-mounted service account tokens: Every pod gets a service account token by default, expanding blast radius if compromised. Fix: Set automountServiceAccountToken: false on ServiceAccounts and Pods. [src1]
- Wildcard RBAC rules: Using resources: ["*"] or verbs: ["*"] grants far more access than intended. Fix: Enumerate specific resources and verbs; use resourceNames to restrict to specific objects. [src2]
- Missing egress NetworkPolicies: Teams apply ingress rules but forget egress, allowing compromised pods to exfiltrate data. Fix: Apply both ingress AND egress default-deny, then explicitly allow DNS (UDP and TCP 53). [src3]
- Pod Security Standards only in warn mode: Setting pod-security.kubernetes.io/warn without enforce returns warnings for violations but does not block non-compliant pods. Fix: Use enforce mode. [src5]
- etcd accessible without mTLS: Exposing etcd without mutual TLS allows anyone with network access to read all cluster data. Fix: Configure --client-cert-auth=true (client connections) and --peer-client-cert-auth=true (peer connections) on etcd. [src2]
- Stale RBAC bindings for departed team members: Users who leave retain cluster access. Fix: Implement periodic access reviews; use OIDC with IdP for automatic deprovisioning. [src1]
- Cloud metadata API exposure: Pods can access 169.254.169.254 to steal instance credentials. Fix: Block metadata API via NetworkPolicy egress rules. [src3]
- Running kube-bench without acting on results: Teams scan but fail to remediate findings. Fix: Integrate kube-bench into CI/CD with threshold-based gating. [src4]
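For the cloud metadata pitfall above, one possible egress rule allows all destinations except 169.254.169.254. The sketch below renders such a policy. render_metadata_block and the policy name allow-egress-except-metadata are hypothetical. It also assumes the namespace has no default-deny egress policy yet: NetworkPolicies are additive, so in a namespace that already denies egress by default the metadata endpoint is blocked unless another policy allows it, and applying this one would re-open all other egress.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a NetworkPolicy that permits egress everywhere
# except the cloud metadata endpoint, for every pod in one namespace.
render_metadata_block() {
  local ns="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-except-metadata
  namespace: ${ns}
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32
EOF
}

# Usage sketch (assumes kubectl is configured):
#   render_metadata_block production | kubectl apply -f -
```

Pods that use hostNetwork are generally not covered by NetworkPolicies, so node-level controls are still needed for those workloads.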
Diagnostic Commands
# Run CIS Benchmark check
kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
kubectl logs job/kube-bench
# Audit RBAC permissions for a service account
kubectl auth can-i --list --as=system:serviceaccount:production:my-app -n production
# Check PSS enforcement across namespaces
kubectl get ns -o json | jq -r '.items[] | "\(.metadata.name): \(.metadata.labels["pod-security.kubernetes.io/enforce"] // "NONE")"'
# List all NetworkPolicies
kubectl get networkpolicy -A
# Check for pods running as root
kubectl get pods -A -o json | jq -r '.items[] | select(.spec.securityContext.runAsNonRoot != true) | "\(.metadata.namespace)/\(.metadata.name)"'
# Scan cluster images for vulnerabilities
trivy k8s --report summary cluster
# Check Falco runtime alerts
kubectl logs -n falco -l app.kubernetes.io/name=falco --tail=50
# List all ClusterRoleBindings to cluster-admin
kubectl get clusterrolebindings -o json | jq -r '.items[] | select(.roleRef.name == "cluster-admin") | "\(.metadata.name): \(.subjects[]?.name)"'
Version History & Compatibility
| Kubernetes Version | Status | Key Security Features |
|---|---|---|
| 1.34-1.35 (2025) | Current | User namespaces GA; finer-grained authorization; pod mTLS certificates (beta) |
| 1.30-1.33 (2024-2025) | Supported | Validating Admission Policy GA; bound SA token improvements; recursive read-only mounts |
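| 1.28-1.29 (2023-2024) | EOL upstream (Kubernetes has no upstream LTS; extended support only on some managed platforms) | Pod Security Admission GA (since 1.25); seccomp RuntimeDefault GA (since 1.27) |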
| 1.25 (2022) | EOL | PodSecurityPolicy removed -- must migrate to Pod Security Standards |
| Tool | Version | Purpose |
|---|---|---|
| kube-bench | 0.8.x | CIS Benchmark compliance scanning |
| Falco | 0.37+ | Runtime threat detection |
| Trivy | 0.50+ | Image + cluster vulnerability scanning |
| OPA/Gatekeeper | 3.16+ | Policy enforcement via admission controller |
| Kyverno | 1.12+ | Kubernetes-native policy management |
When to Use / When Not to Use
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Deploying any production Kubernetes cluster | Running local dev cluster (minikube, kind) with no sensitive data | Basic kubectl access controls only |
| Preparing for SOC 2, PCI DSS, or HIPAA compliance | Using serverless containers (Fargate, Cloud Run) without K8s | Cloud provider's native security controls |
| Multi-tenant clusters sharing infrastructure | Single-tenant cluster with full trust between all workloads | Simplified RBAC + network policies may suffice |
| Clusters exposed to the internet (ingress controllers) | Air-gapped clusters with no external access | Focus on internal controls |
Important Caveats
- CIS Benchmark recommendations vary between self-managed clusters (kubeadm) and managed services (EKS, GKE, AKS) -- many control plane settings are not configurable on managed platforms
- NetworkPolicies require a CNI plugin that supports them (Calico, Cilium, Weave Net) -- the default kubenet CNI does NOT enforce NetworkPolicies
- Pod Security Standards enforcement only prevents creation of non-compliant pods -- it does not retroactively terminate existing non-compliant pods
- Kubernetes secrets are only base64-encoded by default, NOT encrypted -- encryption at rest must be explicitly configured via EncryptionConfiguration
- Falco and similar runtime tools add CPU overhead (~1-5%) on each node -- plan capacity accordingly
- RBAC rules cannot select objects by label -- a rule applies to every object of the resource type within its scope unless narrowed with resourceNames, and resourceNames cannot restrict create requests