Kubernetes Horizontal Pod Autoscaler (HPA): Complete Reference

Type: Software Reference · Confidence: 0.94 · Sources: 7 · Verified: 2026-02-27 · Freshness: 2026-02-27

TL;DR

Constraints

Quick Reference

HPA Metric Types

# | Metric Type | API Group | Source | Target Types | Example
--|-------------|-----------|--------|--------------|--------
1 | Resource | metrics.k8s.io | metrics-server | Utilization, AverageValue | CPU at 50% utilization
2 | ContainerResource | metrics.k8s.io | metrics-server | Utilization, AverageValue | CPU of a specific container
3 | Pods | custom.metrics.k8s.io | prometheus-adapter | AverageValue | requests_per_second per pod
4 | Object | custom.metrics.k8s.io | prometheus-adapter | Value, AverageValue | Ingress requests_per_second
5 | External | external.metrics.k8s.io | cloud provider adapter | Value, AverageValue | SQS queue depth from CloudWatch
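Row 2 (ContainerResource) targets a single container's usage instead of the sum across all containers in the pod, which matters for pods with sidecars. A minimal sketch of the metrics stanza (the container name `app` is illustrative):

```yaml
metrics:
- type: ContainerResource
  containerResource:
    name: cpu
    container: app        # only this container's usage drives scaling
    target:
      type: Utilization
      averageUtilization: 60
```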

Scaling Behavior Configuration

Parameter | Default (Scale Up) | Default (Scale Down) | Description
----------|--------------------|-----------------------|------------
stabilizationWindowSeconds | 0 | 300 (5 min) | Window over which the highest (up) / lowest (down) recommendation is picked
policies[].type | Percent / Pods | Percent | Scale by percentage or absolute pod count
policies[].value | 100% or 4 pods | 100% | Maximum change per period
policies[].periodSeconds | 15 | 15 | How often a policy can trigger
selectPolicy | Max | Max | Pick the largest (Max) or smallest (Min) change among policies

HPA Algorithm Parameters

Parameter | Default | Flag / Field | Description
----------|---------|--------------|------------
Sync period | 15s | --horizontal-pod-autoscaler-sync-period | How often the HPA controller evaluates metrics
Tolerance | 0.1 (10%) | --horizontal-pod-autoscaler-tolerance | Ratio must differ from 1.0 by more than this to trigger scaling
Downscale stabilization | 300s | behavior.scaleDown.stabilizationWindowSeconds | Prevents rapid scale-down oscillation
Initial readiness delay | 30s | --horizontal-pod-autoscaler-initial-readiness-delay | Ignore unready pods for this duration
CPU init period | 5m | --horizontal-pod-autoscaler-cpu-initialization-period | Ignore CPU metrics during pod startup
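Each sync period, the controller applies the standard HPA formula from the Kubernetes docs: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipping the change when the ratio is within the tolerance band around 1.0. A minimal sketch of that calculation:

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Core HPA recommendation: ceil(replicas * current/target),
    suppressed when the ratio is within the tolerance band."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas          # within tolerance: no scaling
    return math.ceil(current_replicas * ratio)

# 4 pods at 80% CPU against a 50% target: ceil(4 * 1.6) = 7
print(desired_replicas(4, 80, 50))       # 7
# 4 pods at 52% against 50%: ratio 1.04, inside the 10% tolerance
print(desired_replicas(4, 52, 50))       # 4
```

This is why a pod running slightly above target (e.g. 52% vs 50%) never triggers a scale-up: the default 10% tolerance absorbs it.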

Decision Tree

START: What type of autoscaling do you need?
|
+-- Need to scale replica count based on CPU/memory?
|   +-- YES --> Use HPA with Resource metrics (see Steps 1-2)
|   +-- NO --> continue below
|
+-- Need to scale on application-specific metrics (RPS, queue length)?
|   +-- YES --> Is the metric available inside the cluster (Prometheus)?
|   |   +-- YES --> Use HPA + prometheus-adapter (see Step 3)
|   |   +-- NO --> Use HPA + an external metrics adapter (see Code Examples)
|   +-- NO --> continue below
|
+-- Need to scale to zero replicas when idle?
|   +-- YES --> Use KEDA (extends HPA with scale-to-zero; see Step 6)
|   +-- NO --> continue below
|
+-- Need to right-size pod resource requests, not add replicas?
|   +-- YES --> Use VPA (Vertical Pod Autoscaler), not HPA
|   +-- NO --> continue below
|
+-- Workload is stateful (database, cache)?
|   +-- YES --> Prefer VPA or manual scaling; HPA risks split-brain
|   +-- NO --> continue below
|
+-- Need cluster node scaling when pods are unschedulable?
|   +-- YES --> Use Cluster Autoscaler alongside HPA
|   +-- NO --> continue below
|
+-- DEFAULT --> Start with HPA on CPU at a 50-70% target

Step-by-Step Guide

1. Install metrics-server

metrics-server collects CPU and memory usage from kubelets and exposes them via the metrics.k8s.io API. Required for any resource-based HPA. [src1]

# Install metrics-server via kubectl
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# For local clusters (minikube, kind) -- disable TLS verification
kubectl patch deployment metrics-server -n kube-system \
  --type='json' \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'

Verify: kubectl top nodes -- should show CPU and memory usage for each node.

2. Create a basic HPA with CPU target

Deploy an HPA targeting a Deployment at 50% average CPU utilization. [src2]

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

Verify: kubectl get hpa my-app-hpa -- TARGETS column should show current/target percentages.

3. Set up custom metrics with prometheus-adapter

Install prometheus-adapter to expose Prometheus metrics via the custom.metrics.k8s.io API. [src3]

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --set prometheus.url=http://prometheus-server.monitoring.svc \
  --set prometheus.port=9090

Verify: kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq . -- should list custom metrics.
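prometheus-adapter only exposes metrics that match its configured rules, so a rule is needed before Step 5 can find `http_requests_per_second`. A hedged sketch of a chart values fragment that would derive that metric from an assumed `http_requests_total` counter (the series and label names are assumptions about your Prometheus setup):

```yaml
rules:
  custom:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

The `metricsQuery` turns the raw counter into a per-pod rate; without a matching rule, the custom metrics API list from the verify command above stays empty.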

4. Configure scaling behavior (stabilization)

Fine-tune scale-up aggressiveness and scale-down conservatism to prevent flapping. [src5]

behavior:
  scaleUp:
    stabilizationWindowSeconds: 30
    policies:
    - type: Percent
      value: 100
      periodSeconds: 60
    - type: Pods
      value: 4
      periodSeconds: 60
    selectPolicy: Max
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 10
      periodSeconds: 120
    selectPolicy: Min

Verify: kubectl describe hpa my-app-hpa -- check Events section for scaling decisions.

5. Create HPA with custom metrics

Scale on application-specific metrics like requests per second after setting up prometheus-adapter. [src3]

metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "100"

Verify: kubectl get hpa my-app-hpa-custom -- each configured metric should show a current value.
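The Object type from the quick-reference table targets one metric on a single object rather than a per-pod average, e.g. total request rate on an Ingress. A sketch of that stanza (the Ingress name `main-route` and the metric name are illustrative):

```yaml
metrics:
- type: Object
  object:
    metric:
      name: requests_per_second
    describedObject:
      apiVersion: networking.k8s.io/v1
      kind: Ingress
      name: main-route
    target:
      type: Value
      value: "2000"    # scale when total RPS through the Ingress exceeds this
```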

6. Use KEDA for event-driven autoscaling

Use KEDA when you need scale-to-zero or one of its 60+ event-source integrations beyond what HPA offers natively. [src4]

helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace

Verify: kubectl get scaledobject -- once a ScaledObject has been applied (see Code Examples), READY should be True.

Code Examples

YAML: Multi-Metric HPA with External Metrics

# Input:  Deployment "api-server" running in production
# Output: HPA scaling on CPU, memory, and AWS SQS queue depth

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: External
    external:
      metric:
        name: sqs_queue_messages_visible
        selector:
          matchLabels:
            queue: api-tasks
      target:
        type: AverageValue
        averageValue: "20"

Python: Kubernetes Client HPA Management

# Input:  Kubernetes cluster with kubeconfig configured
# Output: Creates/updates an HPA programmatically

from kubernetes import client, config  # kubernetes==29.0.0
config.load_kube_config()
autoscaling_v2 = client.AutoscalingV2Api()

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="my-app-hpa"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="my-app"),
        min_replicas=2, max_replicas=20,
        metrics=[client.V2MetricSpec(type="Resource",
            resource=client.V2ResourceMetricSource(name="cpu",
                target=client.V2MetricTarget(
                    type="Utilization", average_utilization=50)))]))

# Submit it; use replace_namespaced_horizontal_pod_autoscaler to update
autoscaling_v2.create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa)

YAML: KEDA ScaledObject for Queue-Based Scaling

# Input:  RabbitMQ queue with messages to process
# Output: Scales consumer from 0 to 30 based on queue depth

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-consumer
spec:
  scaleTargetRef:
    name: consumer-deployment
  minReplicaCount: 0
  maxReplicaCount: 30
  pollingInterval: 15
  cooldownPeriod: 300
  triggers:
  - type: rabbitmq
    metadata:
      # Placeholder URL; credentials inline for illustration only
      host: amqp://<user>:<password>@<rabbitmq-host>:5672/
      queueName: task-queue
      queueLength: "5"
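In production the AMQP credentials are better supplied through a KEDA TriggerAuthentication that references a Secret, rather than inline in the trigger metadata. A hedged sketch (the Secret name and key are illustrative); the trigger then points at it via `authenticationRef: {name: rabbitmq-auth}`:

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-auth
spec:
  secretTargetRef:
  - parameter: host          # fills the scaler's host parameter
    name: rabbitmq-secret    # hypothetical Secret holding the AMQP URL
    key: amqp-url
```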

Anti-Patterns

Wrong: HPA without resource requests

# BAD -- HPA cannot calculate utilization without resource requests
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
      - name: app
        image: my-app:latest
        # No resources.requests defined!
# Result: HPA shows <unknown>/50% and never scales

Correct: Always define resource requests

# GOOD -- resource requests let HPA calculate utilization
containers:
- name: app
  image: my-app:latest
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi

Wrong: Using autoscaling/v1 for production

# BAD -- v1 only supports CPU, no scaling behavior
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
spec:
  targetCPUUtilizationPercentage: 50

Correct: Use autoscaling/v2 with behavior controls

# GOOD -- v2 with scaling behavior prevents flapping
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300

Wrong: HPA and VPA both targeting CPU on same pods

# BAD -- HPA and VPA conflict on the same resource metric
# HPA wants more pods; VPA wants to resize same pods
# Result: oscillation, pod restarts, unpredictable behavior

Correct: VPA for sizing, HPA on custom metric

# GOOD -- VPA manages resource requests; HPA scales on custom metric
# VPA: controlledResources: ["cpu", "memory"]
# HPA: metric type Pods, name: http_requests_per_second
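The comment sketch above can be written as a concrete VPA manifest (fields per the autoscaler project's autoscaling.k8s.io/v1 API; the Deployment name is illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"       # VPA evicts and resizes pods itself
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      controlledResources: ["cpu", "memory"]
```

With VPA owning CPU and memory requests, the paired HPA should scale only on a custom metric such as `http_requests_per_second` to avoid the conflict.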

Common Pitfalls

Diagnostic Commands

# Check if metrics-server is running
kubectl get deployment metrics-server -n kube-system

# View current resource usage for pods
kubectl top pods -n <namespace>

# View HPA status (current metrics vs targets)
kubectl get hpa -n <namespace>

# Detailed HPA status with events and conditions
kubectl describe hpa <hpa-name> -n <namespace>

# Verify custom metrics API is registered
kubectl get apiservice | grep metrics

# List available custom metrics
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq '.resources[].name'

# Watch HPA scaling events in real time
kubectl get events --field-selector reason=SuccessfulRescale --watch

# Check KEDA ScaledObject status
kubectl get scaledobject -n <namespace>

Version History & Compatibility

API Version | K8s Versions | Status | Key Changes
------------|--------------|--------|------------
autoscaling/v2 | 1.23+ (GA) | Current, recommended | Multiple metrics, custom/external metrics, scaling behavior, container resources
autoscaling/v2beta2 | 1.12-1.25 | Removed in 1.26 | Near-identical to v2; migrate to autoscaling/v2
autoscaling/v2beta1 | 1.8-1.24 | Removed in 1.25 | Early multi-metric support
autoscaling/v1 | 1.2+ | Supported but limited | CPU-only, no scaling behavior, no custom metrics
ContainerResource metrics | 1.20 (alpha), 1.27 (beta), 1.30 (GA) | Current | Per-container metric targeting
KEDA 2.x | Any recent | CNCF graduated | 60+ scalers, scale-to-zero, ScaledObject/ScaledJob

When to Use / When Not to Use

Use When | Don't Use When | Use Instead
---------|----------------|------------
Stateless web services need to handle variable HTTP traffic | Workload is stateful with complex state management (databases) | Manual scaling or operator-based scaling
CPU/memory utilization correlates with load | Load pattern is purely event-driven (queue consumers, cron jobs) | KEDA with event-source triggers
Need to scale replicas 1-N based on resource metrics | Need to scale from 0 to N (cold-start workloads) | KEDA (supports scale-to-zero)
Prometheus already collects application metrics in-cluster | Need to right-size pod resource requests/limits | Vertical Pod Autoscaler (VPA)
Multiple metrics should drive scaling decisions | Single-event trigger with complex business logic | Custom controller or KEDA with multiple triggers

Important Caveats

Related Units