desiredReplicas = ceil(currentReplicas * (currentMetric / desiredMetric))

kubectl autoscale deployment <name> --cpu-percent=50 --min=2 --max=10

| # | Metric Type | API Group | Source | Target Types | Example |
|---|---|---|---|---|---|
| 1 | Resource | metrics.k8s.io | metrics-server | Utilization, AverageValue | CPU at 50% utilization |
| 2 | ContainerResource | metrics.k8s.io | metrics-server | Utilization, AverageValue | CPU of specific container |
| 3 | Pods | custom.metrics.k8s.io | prometheus-adapter | AverageValue | requests_per_second per pod |
| 4 | Object | custom.metrics.k8s.io | prometheus-adapter | Value, AverageValue | Ingress requests_per_second |
| 5 | External | external.metrics.k8s.io | cloud provider adapter | Value, AverageValue | SQS queue depth from CloudWatch |
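The replica formula can be sanity-checked in a few lines. This sketch also applies the controller's default 10% tolerance band; the function name and sample values are illustrative, not part of any Kubernetes API:

```python
import math

def desired_replicas(current_replicas, current_metric, desired_metric,
                     tolerance=0.1):
    """HPA core formula: scale by the ratio of current to target metric.

    If the ratio is within the tolerance band around 1.0, the HPA
    skips scaling and keeps the current replica count.
    """
    ratio = current_metric / desired_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # within tolerance: no change
    return math.ceil(current_replicas * ratio)

# 4 replicas at 80% CPU against a 50% target -> ceil(4 * 1.6) = 7
print(desired_replicas(4, 80, 50))  # 7
# 4 replicas at 52% CPU: ratio 1.04 is inside the 10% band -> stay at 4
print(desired_replicas(4, 52, 50))  # 4
```

Note that ceil() rounds up, so the HPA errs toward over-provisioning rather than under-provisioning.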

| Parameter | Default (Scale Up) | Default (Scale Down) | Description |
|---|---|---|---|
| stabilizationWindowSeconds | 0 | 300 (5 min) | Window to pick highest/lowest recommendation |
| policies[].type | Percent / Pods | Percent | Scale by percentage or absolute pod count |
| policies[].value | 100% or 4 pods | 100% | Maximum change per period |
| policies[].periodSeconds | 15 | 15 | Window over which the policy's limit applies |
| selectPolicy | Max | Max | Pick largest (Max) or smallest (Min) of policies |

| Parameter | Default | Flag | Description |
|---|---|---|---|
| Sync period | 15s | --horizontal-pod-autoscaler-sync-period | How often HPA evaluates metrics |
| Tolerance | 0.1 (10%) | --horizontal-pod-autoscaler-tolerance | Ratio must differ from 1.0 by this much to trigger scaling |
| Downscale stabilization | 300s | Via behavior.scaleDown.stabilizationWindowSeconds | Prevents rapid scale-down oscillation |
| Initial readiness delay | 30s | --horizontal-pod-autoscaler-initial-readiness-delay | Ignore unready pods for this duration |
| CPU init period | 5m | --horizontal-pod-autoscaler-cpu-initialization-period | Ignore CPU metrics during startup |
START: What type of autoscaling do you need?
|
+-- Need to scale replica count based on CPU/memory?
| +-- YES --> Use HPA with Resource metrics (see Step 1-2)
| +-- NO |
|
+-- Need to scale on application-specific metrics (RPS, queue length)?
| +-- YES --> Is the metric available inside the cluster (Prometheus)?
| | +-- YES --> Use HPA + prometheus-adapter (see Step 3)
| | +-- NO --> Use HPA + external metrics adapter (see Step 4)
| +-- NO |
|
+-- Need to scale to zero replicas when idle?
| +-- YES --> Use KEDA (extends HPA with scale-to-zero)
| +-- NO |
|
+-- Need to right-size pod resource requests, not add replicas?
| +-- YES --> Use VPA (Vertical Pod Autoscaler), not HPA
| +-- NO |
|
+-- Workload is stateful (database, cache)?
| +-- YES --> Prefer VPA or manual scaling; HPA risks split-brain
| +-- NO |
|
+-- Need cluster node scaling when pods are unschedulable?
| +-- YES --> Use Cluster Autoscaler alongside HPA
| +-- NO |
|
+-- DEFAULT --> Start with HPA on CPU at 50-70% target
metrics-server collects CPU and memory usage from kubelets and exposes them via the metrics.k8s.io API. Required for any resource-based HPA. [src1]
```bash
# Install metrics-server via kubectl
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# For local clusters (minikube, kind) -- disable TLS verification
kubectl patch deployment metrics-server -n kube-system \
  --type='json' \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
```
Verify: kubectl top nodes -- should show CPU and memory usage for each node.
Deploy an HPA targeting a Deployment at 50% average CPU utilization. [src2]
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```
Verify: kubectl get hpa my-app-hpa -- TARGETS column should show current/target percentages.
Install prometheus-adapter to expose Prometheus metrics via the custom.metrics.k8s.io API. [src3]
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --set prometheus.url=http://prometheus-server.monitoring.svc \
  --set prometheus.port=9090
```
Verify: kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq . -- should list custom metrics.
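prometheus-adapter only exposes series that match its discovery rules, so a rules entry is usually needed before any custom metric appears. A minimal sketch for a per-pod requests-per-second metric; the series name `http_requests_total`, the label layout, and the 2m rate window are assumptions about your instrumentation:

```yaml
# prometheus-adapter Helm values snippet (illustrative rule)
rules:
  custom:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

The `name` stanza renames the counter to `http_requests_per_second`, which is the metric name an HPA Pods metric can then reference.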
Fine-tune scale-up aggressiveness and scale-down conservatism to prevent flapping. [src5]
```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 30
    policies:
      - type: Percent
        value: 100
        periodSeconds: 60
      - type: Pods
        value: 4
        periodSeconds: 60
    selectPolicy: Max
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
      - type: Percent
        value: 10
        periodSeconds: 120
    selectPolicy: Min
```
Verify: kubectl describe hpa my-app-hpa -- check Events section for scaling decisions.
Scale on application-specific metrics like requests per second after setting up prometheus-adapter. [src3]
```yaml
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
```
Verify: kubectl get hpa my-app-hpa-custom -- both metrics should show current values.
When you need scale-to-zero or 60+ external event source integrations beyond what HPA offers natively. [src4]
```bash
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace
```
Verify: kubectl get pods -n keda -- the keda-operator and keda-operator-metrics-apiserver pods should be Running. (kubectl get scaledobject shows READY: True only after a ScaledObject is created.)
```yaml
# Input: Deployment "api-server" running in production
# Output: HPA scaling on CPU utilization and AWS SQS queue depth
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 100
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: External
      external:
        metric:
          name: sqs_queue_messages_visible
          selector:
            matchLabels:
              queue: api-tasks
        target:
          type: AverageValue
          averageValue: "20"
```
```python
# Input: Kubernetes cluster with kubeconfig configured
# Output: Creates an HPA programmatically
from kubernetes import client, config  # kubernetes==29.0.0

config.load_kube_config()
autoscaling_v2 = client.AutoscalingV2Api()

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="my-app-hpa"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="my-app"),
        min_replicas=2,
        max_replicas=20,
        metrics=[client.V2MetricSpec(
            type="Resource",
            resource=client.V2ResourceMetricSource(
                name="cpu",
                target=client.V2MetricTarget(
                    type="Utilization", average_utilization=50)))]))

# Submit the object -- without this call nothing is created in the cluster
autoscaling_v2.create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa)
```
```yaml
# Input: RabbitMQ queue with messages to process
# Output: Scales consumer from 0 to 30 based on queue depth
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-consumer
spec:
  scaleTargetRef:
    name: consumer-deployment
  minReplicaCount: 0
  maxReplicaCount: 30
  pollingInterval: 15
  cooldownPeriod: 300
  triggers:
    - type: rabbitmq
      metadata:
        host: amqp://guest:[email protected]:5672/
        queueName: task-queue
        queueLength: "5"
```
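The inline AMQP URL embeds credentials in the ScaledObject. KEDA's TriggerAuthentication can pull the connection string from a Secret instead; a sketch, with the Secret and resource names being illustrative:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-conn
type: Opaque
stringData:
  host: amqp://guest:[email protected]:5672/
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-trigger-auth
spec:
  secretTargetRef:
    - parameter: host
      name: rabbitmq-conn
      key: host
```

The rabbitmq trigger then drops the inline host from its metadata and adds authenticationRef: {name: rabbitmq-trigger-auth} instead.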
```yaml
# BAD -- HPA cannot calculate utilization without resource requests
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: app
          image: my-app:latest
          # No resources.requests defined!
          # Result: HPA shows <unknown>/50% and never scales
```

```yaml
# GOOD -- resource requests let HPA calculate utilization
containers:
  - name: app
    image: my-app:latest
    resources:
      requests:
        cpu: 200m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
```
```yaml
# BAD -- v1 only supports CPU, no scaling behavior
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
spec:
  targetCPUUtilizationPercentage: 50
```

```yaml
# GOOD -- v2 with scaling behavior prevents flapping
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
```
# BAD -- HPA and VPA conflict on the same resource metric
# HPA wants more pods; VPA wants to resize same pods
# Result: oscillation, pod restarts, unpredictable behavior
# GOOD -- VPA manages resource requests; HPA scales on custom metric
# VPA: controlledResources: ["cpu", "memory"]
# HPA: metric type Pods, name: http_requests_per_second
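The GOOD split can be written out as a manifest. A minimal sketch, assuming the VPA components (recommender, updater, admission controller) are installed; resource names are illustrative:

```yaml
# VPA owns cpu/memory requests; a separate HPA (not shown) scales
# replicas on a custom Pods metric such as http_requests_per_second
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        controlledResources: ["cpu", "memory"]
```

Because the two autoscalers now act on disjoint signals (requests vs. replica count), the oscillation described above cannot occur.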
- Flapping (rapid up/down oscillation). Fix: behavior.scaleDown.stabilizationWindowSeconds: 300 and conservative scale-down policies. [src5]
- Inaccurate resource requests skew utilization: with requests.cpu: 50m but an app that needs 200m idle, HPA sees 400% utilization and over-provisions. Fix: Use VPA recommendations to set accurate requests first. [src6]
- Missing custom metrics surface as a FailedGetPodsMetric error. Fix: Verify prometheus-adapter is running and metrics are exposed via kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1". [src3]
- Set minReplicas: 2 minimum for production workloads. [src6]

```bash
# Check if metrics-server is running
kubectl get deployment metrics-server -n kube-system

# View current resource usage for pods
kubectl top pods -n <namespace>

# View HPA status (current metrics vs targets)
kubectl get hpa -n <namespace>

# Detailed HPA status with events and conditions
kubectl describe hpa <hpa-name> -n <namespace>

# Verify custom metrics API is registered
kubectl get apiservice | grep metrics

# List available custom metrics
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq '.resources[].name'

# Watch HPA scaling events in real time
kubectl get events --field-selector reason=SuccessfulRescale --watch

# Check KEDA ScaledObject status
kubectl get scaledobject -n <namespace>
```

| API Version | K8s Version | Status | Key Changes |
|---|---|---|---|
| autoscaling/v2 | 1.23+ (GA) | Current, recommended | Multiple metrics, custom/external, scaling behavior, container resources |
| autoscaling/v2beta2 | 1.12-1.25 | Removed in 1.26 | Identical to v2; migrate to autoscaling/v2 |
| autoscaling/v2beta1 | 1.8-1.24 | Removed in 1.25 | Early multi-metric support |
| autoscaling/v1 | 1.2+ | Supported but limited | CPU-only, no scaling behavior, no custom metrics |
| Container resources | 1.20 (alpha), 1.27 (beta), 1.30 (GA) | Current | Per-container metric targeting |
| KEDA 2.x | Any | CNCF graduated | 60+ scalers, scale-to-zero, ScaledObject/ScaledJob |
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Stateless web services need to handle variable HTTP traffic | Workload is stateful with complex state management (databases) | Manual scaling or operator-based scaling |
| CPU/memory utilization correlates with load | Load pattern is purely event-driven (queue consumers, cron jobs) | KEDA with event-source triggers |
| Need to scale replicas 1-N based on resource metrics | Need to scale from 0 to N (cold-start workloads) | KEDA (supports scale-to-zero) |
| Prometheus already collects application metrics in-cluster | Need to right-size pod resource requests/limits | Vertical Pod Autoscaler (VPA) |
| Multiple metrics should drive scaling decisions | Single-event trigger with complex business logic | Custom controller or KEDA with multiple triggers |