desiredReplicas = ceil(currentReplicas * (currentMetric / desiredMetric))

kubectl autoscale deployment <name> --cpu-percent=50 --min=2 --max=10

| # | Metric Type | API Group | Source | Target Types | Example |
|---|---|---|---|---|---|
| 1 | Resource | metrics.k8s.io | metrics-server | Utilization, AverageValue | CPU at 50% utilization |
| 2 | ContainerResource | metrics.k8s.io | metrics-server | Utilization, AverageValue | CPU of specific container |
| 3 | Pods | custom.metrics.k8s.io | prometheus-adapter | AverageValue | requests_per_second per pod |
| 4 | Object | custom.metrics.k8s.io | prometheus-adapter | Value, AverageValue | Ingress requests_per_second |
| 5 | External | external.metrics.k8s.io | cloud provider adapter | Value, AverageValue | SQS queue depth from CloudWatch |
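The replica formula can be sanity-checked in a few lines. This sketch also applies the controller's default 10% tolerance band; the function name and sample values are illustrative, not part of any Kubernetes API:

```python
import math

def desired_replicas(current_replicas, current_metric, desired_metric,
                     tolerance=0.1):
    """HPA core formula: scale by the ratio of current to target metric.

    If the ratio is within the tolerance band around 1.0, the HPA
    skips scaling and keeps the current replica count.
    """
    ratio = current_metric / desired_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # within tolerance: no change
    return math.ceil(current_replicas * ratio)

# 4 replicas at 80% CPU against a 50% target -> ceil(4 * 1.6) = 7
print(desired_replicas(4, 80, 50))  # 7
# 4 replicas at 52% CPU: ratio 1.04 is inside the 10% band -> stay at 4
print(desired_replicas(4, 52, 50))  # 4
```

Note that ceil() rounds up, so the HPA errs toward over-provisioning rather than under-provisioning.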

| Parameter | Default (Scale Up) | Default (Scale Down) | Description |
|---|---|---|---|
| stabilizationWindowSeconds | 0 | 300 (5 min) | Window to pick highest/lowest recommendation |
| policies[].type | Percent / Pods | Percent | Scale by percentage or absolute pod count |
| policies[].value | 100% or 4 pods | 100% | Maximum change per period |
| policies[].periodSeconds | 15 | 15 | Window over which the policy's limit applies |
| selectPolicy | Max | Max | Pick largest (Max) or smallest (Min) of policies |

| Parameter | Default | Flag | Description |
|---|---|---|---|
| Sync period | 15s | --horizontal-pod-autoscaler-sync-period | How often HPA evaluates metrics |
| Tolerance | 0.1 (10%) | --horizontal-pod-autoscaler-tolerance | Ratio must differ from 1.0 by this much to trigger scaling |
| Downscale stabilization | 300s | Via behavior.scaleDown.stabilizationWindowSeconds | Prevents rapid scale-down oscillation |
| Initial readiness delay | 30s | --horizontal-pod-autoscaler-initial-readiness-delay | Ignore unready pods for this duration |
| CPU init period | 5m | --horizontal-pod-autoscaler-cpu-initialization-period | Ignore CPU metrics during startup |
START: What type of autoscaling do you need?
|
+-- Need to scale replica count based on CPU/memory?
| +-- YES --> Use HPA with Resource metrics (see Step 1-2)
| +-- NO |
|
+-- Need to scale on application-specific metrics (RPS, queue length)?
| +-- YES --> Is the metric available inside the cluster (Prometheus)?
| | +-- YES --> Use HPA + prometheus-adapter (see Step 3)
| | +-- NO --> Use HPA + external metrics adapter (see Step 4)
| +-- NO |
|
+-- Need to scale to zero replicas when idle?
| +-- YES --> Use KEDA (extends HPA with scale-to-zero)
| +-- NO |
|
+-- Need to right-size pod resource requests, not add replicas?
| +-- YES --> Use VPA (Vertical Pod Autoscaler), not HPA
| +-- NO |
|
+-- Workload is stateful (database, cache)?
| +-- YES --> Prefer VPA or manual scaling; HPA risks split-brain
| +-- NO |
|
+-- Need cluster node scaling when pods are unschedulable?
| +-- YES --> Use Cluster Autoscaler alongside HPA
| +-- NO |
|
+-- DEFAULT --> Start with HPA on CPU at 50-70% target
metrics-server collects CPU and memory usage from kubelets and exposes them via the metrics.k8s.io API. Required for any resource-based HPA. [src1]
```bash
# Install metrics-server via kubectl
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# For local clusters (minikube, kind) -- disable TLS verification
kubectl patch deployment metrics-server -n kube-system \
  --type='json' \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
```
Verify: kubectl top nodes -- should show CPU and memory usage for each node.
Deploy an HPA targeting a Deployment at 50% average CPU utilization. [src2]
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```
Verify: kubectl get hpa my-app-hpa -- TARGETS column should show current/target percentages.
Install prometheus-adapter to expose Prometheus metrics via the custom.metrics.k8s.io API. [src3]
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --set prometheus.url=http://prometheus-server.monitoring.svc \
  --set prometheus.port=9090
```
Verify: kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq . -- should list custom metrics.
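prometheus-adapter only exposes series that match its discovery rules, so a rules entry is usually needed before any custom metric appears. A minimal sketch for a per-pod requests-per-second metric; the series name `http_requests_total`, the label layout, and the 2m rate window are assumptions about your instrumentation:

```yaml
# prometheus-adapter Helm values snippet (illustrative rule)
rules:
  custom:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

The `name` stanza renames the counter to `http_requests_per_second`, which is the metric name an HPA Pods metric can then reference.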
Fine-tune scale-up aggressiveness and scale-down conservatism to prevent flapping. [src5]
```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 30
    policies:
      - type: Percent
        value: 100
        periodSeconds: 60
      - type: Pods
        value: 4
        periodSeconds: 60
    selectPolicy: Max
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
      - type: Percent
        value: 10
        periodSeconds: 120
    selectPolicy: Min
```
Verify: kubectl describe hpa my-app-hpa -- check Events section for scaling decisions.
Scale on application-specific metrics like requests per second after setting up prometheus-adapter. [src3]
```yaml
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
```
Verify: kubectl get hpa my-app-hpa-custom -- both metrics should show current values.
When you need scale-to-zero or 60+ external event source integrations beyond what HPA offers natively. [src4]
```bash
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace
```
Verify: kubectl get pods -n keda -- the keda-operator and keda-operator-metrics-apiserver pods should be Running. (kubectl get scaledobject shows READY: True only after a ScaledObject is created.)
```yaml
# Input: Deployment "api-server" running in production
# Output: HPA scaling on CPU utilization and AWS SQS queue depth
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 100
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: External
      external:
        metric:
          name: sqs_queue_messages_visible
          selector:
            matchLabels:
              queue: api-tasks
        target:
          type: AverageValue
          averageValue: "20"
```
```python
# Input: Kubernetes cluster with kubeconfig configured
# Output: Creates an HPA programmatically
from kubernetes import client, config  # kubernetes==29.0.0

config.load_kube_config()
autoscaling_v2 = client.AutoscalingV2Api()

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="my-app-hpa"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="my-app"),
        min_replicas=2,
        max_replicas=20,
        metrics=[client.V2MetricSpec(
            type="Resource",
            resource=client.V2ResourceMetricSource(
                name="cpu",
                target=client.V2MetricTarget(
                    type="Utilization", average_utilization=50)))]))

# Submit the object -- without this call nothing is created in the cluster
autoscaling_v2.create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa)
```
```yaml
# Input: RabbitMQ queue with messages to process
# Output: Scales consumer from 0 to 30 based on queue depth
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-consumer
spec:
  scaleTargetRef:
    name: consumer-deployment
  minReplicaCount: 0
  maxReplicaCount: 30
  pollingInterval: 15
  cooldownPeriod: 300
  triggers:
    - type: rabbitmq
      metadata:
        host: amqp://guest:[email protected]:5672/
        queueName: task-queue
        queueLength: "5"
```
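The inline AMQP URL embeds credentials in the ScaledObject. KEDA's TriggerAuthentication can pull the connection string from a Secret instead; a sketch, with the Secret and resource names being illustrative:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-conn
type: Opaque
stringData:
  host: amqp://guest:[email protected]:5672/
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-trigger-auth
spec:
  secretTargetRef:
    - parameter: host
      name: rabbitmq-conn
      key: host
```

The rabbitmq trigger then drops the inline host from its metadata and adds authenticationRef: {name: rabbitmq-trigger-auth} instead.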
```yaml
# BAD -- HPA cannot calculate utilization without resource requests
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: app
          image: my-app:latest
          # No resources.requests defined!
          # Result: HPA shows <unknown>/50% and never scales
```

```yaml
# GOOD -- resource requests let HPA calculate utilization
containers:
  - name: app
    image: my-app:latest
    resources:
      requests:
        cpu: 200m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
```
```yaml
# BAD -- v1 only supports CPU, no scaling behavior
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
spec:
  targetCPUUtilizationPercentage: 50
```

```yaml
# GOOD -- v2 with scaling behavior prevents flapping
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
```
# BAD -- HPA and VPA conflict on the same resource metric
# HPA wants more pods; VPA wants to resize same pods
# Result: oscillation, pod restarts, unpredictable behavior
# GOOD -- VPA manages resource requests; HPA scales on custom metric
# VPA: controlledResources: ["cpu", "memory"]
# HPA: metric type Pods, name: http_requests_per_second
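The GOOD split can be written out as a manifest. A minimal sketch, assuming the VPA components (recommender, updater, admission controller) are installed; resource names are illustrative:

```yaml
# VPA owns cpu/memory requests; a separate HPA (not shown) scales
# replicas on a custom Pods metric such as http_requests_per_second
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        controlledResources: ["cpu", "memory"]
```

Because the two autoscalers now act on disjoint signals (requests vs. replica count), the oscillation described above cannot occur.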
- Flapping (rapid up/down oscillation). Fix: behavior.scaleDown.stabilizationWindowSeconds: 300 and conservative scale-down policies. [src5]
- Inaccurate resource requests skew utilization: with requests.cpu: 50m but an app that needs 200m idle, HPA sees 400% utilization and over-provisions. Fix: Use VPA recommendations to set accurate requests first. [src6]
- Missing custom metrics surface as a FailedGetPodsMetric error. Fix: Verify prometheus-adapter is running and metrics are exposed via kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1". [src3]
- Set minReplicas: 2 minimum for production workloads. [src6]

```bash
# Check if metrics-server is running
kubectl get deployment metrics-server -n kube-system

# View current resource usage for pods
kubectl top pods -n <namespace>

# View HPA status (current metrics vs targets)
kubectl get hpa -n <namespace>

# Detailed HPA status with events and conditions
kubectl describe hpa <hpa-name> -n <namespace>

# Verify custom metrics API is registered
kubectl get apiservice | grep metrics

# List available custom metrics
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq '.resources[].name'

# Watch HPA scaling events in real time
kubectl get events --field-selector reason=SuccessfulRescale --watch

# Check KEDA ScaledObject status
kubectl get scaledobject -n <namespace>
```

| API Version | K8s Version | Status | Key Changes |
|---|---|---|---|
| autoscaling/v2 | 1.23+ (GA) | Current, recommended | Multiple metrics, custom/external, scaling behavior, container resources |
| autoscaling/v2beta2 | 1.12-1.25 | Removed in 1.26 | Identical to v2; migrate to autoscaling/v2 |
| autoscaling/v2beta1 | 1.8-1.24 | Removed in 1.25 | Early multi-metric support |
| autoscaling/v1 | 1.2+ | Supported but limited | CPU-only, no scaling behavior, no custom metrics |
| Container resources | 1.20 (alpha), 1.27 (beta), 1.30 (GA) | Current | Per-container metric targeting |
| KEDA 2.x | Any | CNCF graduated | 60+ scalers, scale-to-zero, ScaledObject/ScaledJob |
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Stateless web services need to handle variable HTTP traffic | Workload is stateful with complex state management (databases) | Manual scaling or operator-based scaling |
| CPU/memory utilization correlates with load | Load pattern is purely event-driven (queue consumers, cron jobs) | KEDA with event-source triggers |
| Need to scale replicas 1-N based on resource metrics | Need to scale from 0 to N (cold-start workloads) | KEDA (supports scale-to-zero) |
| Prometheus already collects application metrics in-cluster | Need to right-size pod resource requests/limits | Vertical Pod Autoscaler (VPA) |
| Multiple metrics should drive scaling decisions | Single-event trigger with complex business logic | Custom controller or KEDA with multiple triggers |