`kubectl apply -f statefulset.yaml` with `volumeClaimTemplates` and a headless Service (`clusterIP: None`).

| Feature | StatefulSet | Deployment |
|---|---|---|
| Pod naming | Predictable: {name}-0, {name}-1, ... | Random: {name}-{hash} |
| Network identity | Stable DNS via headless Service | Ephemeral, load-balanced |
| Storage | Per-Pod PVC via volumeClaimTemplates | Shared or no persistent storage |
| Scaling order | Sequential (0, 1, 2...) | Parallel |
| Deletion order | Reverse sequential (2, 1, 0) | Parallel |
| Rolling update | Reverse ordinal (highest first) | Configurable (maxSurge) |
| Use for databases | Yes -- primary/replica, stable identity | Only stateless caches |
| Headless Service required | Yes | No |
| PVC cleanup on delete | Manual | N/A |
| Operator | Database | License | HA | Auto-Backup | Monitoring | CNCF |
|---|---|---|---|---|---|---|
| CloudNativePG | PostgreSQL | Apache 2.0 | Streaming replication + auto-failover | Object store (S3, GCS, Azure) + PITR | Prometheus exporter | Sandbox |
| Percona Operator PG | PostgreSQL | Apache 2.0 | Patroni-based HA | S3/GCS/Azure + PITR | PMM integration | No |
| Percona Operator MySQL | MySQL (PXC) | Apache 2.0 | Galera multi-primary | S3/GCS + PITR | PMM integration | No |
| Percona Operator MongoDB | MongoDB | Apache 2.0 | Replica set auto-failover | S3/GCS + PITR | PMM integration | No |
| MongoDB Community | MongoDB | Apache 2.0 | Replica set | Manual | Basic | No |
| Zalando PG Operator | PostgreSQL | MIT | Patroni HA | WAL-G to S3/GCS | Built-in | No |
| Crunchy PGO | PostgreSQL | Apache 2.0 | Patroni HA | pgBackRest | Prometheus | No |
| Component | DNS Format | Example |
|---|---|---|
| Pod | {pod}.{service}.{namespace}.svc.cluster.local | postgres-0.postgres-hl.default.svc.cluster.local |
| Service (headless) | {service}.{namespace}.svc.cluster.local | postgres-hl.default.svc.cluster.local |
| Primary (convention) | {statefulset}-0.{service}.{ns}.svc.cluster.local | postgres-0.postgres-hl.default.svc.cluster.local |
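The DNS names in the table compose mechanically from their components. A minimal sketch using the example names above (and assuming the default `cluster.local` cluster domain):

```shell
# Build the per-Pod FQDN from its components (example names from the table;
# a non-default cluster domain would replace cluster.local)
POD=postgres-0
SVC=postgres-hl
NS=default
FQDN="${POD}.${SVC}.${NS}.svc.cluster.local"
echo "$FQDN"   # postgres-0.postgres-hl.default.svc.cluster.local

# A client inside the cluster could then target the conventional primary, e.g.:
# psql "host=${FQDN} port=5432 user=postgres dbname=appdb"
```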
```
START: Do you need a database on Kubernetes?
├── Can you use a managed database (RDS, Cloud SQL, Azure DB)?
│   ├── YES → Use managed database. Simplest, most reliable option.
│   └── NO (on-prem, air-gapped, cost, or data sovereignty) ↓
├── Team has 3+ engineers with Kubernetes experience?
│   ├── YES → Use a database operator (CloudNativePG, Percona, Crunchy PGO)
│   └── NO ↓
├── Database is PostgreSQL?
│   ├── YES → CloudNativePG (simplest operator, CNCF, strong community)
│   └── NO ↓
├── Database is MySQL?
│   ├── YES → Percona Operator for MySQL (Galera-based HA)
│   └── NO ↓
├── Database is MongoDB?
│   ├── YES → Percona Operator for MongoDB (replica set + sharding)
│   └── NO ↓
├── Need fine-grained control or learning exercise?
│   ├── YES → Manual StatefulSet (see Step-by-Step Guide)
│   └── NO ↓
└── DEFAULT → CloudNativePG for PostgreSQL, Percona for MySQL/MongoDB.
```
The headless Service provides stable DNS names for each Pod. Without it, StatefulSet Pods cannot be individually addressed. [src1]
```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-hl
spec:
  clusterIP: None  # Headless
  selector:
    app: postgres
  ports:
    - port: 5432
      name: postgres
```

Verify: `kubectl get svc postgres-hl` → should show `CLUSTER-IP: None`
Never store passwords in plain text in StatefulSet YAML. Use Kubernetes Secrets. [src5]
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: postgres-secret
type: Opaque
stringData:
  POSTGRES_PASSWORD: "changeme-use-strong-password"
  POSTGRES_USER: "postgres"
  POSTGRES_DB: "appdb"
```

Verify: `kubectl get secret postgres-secret` → Opaque type with 3 data keys
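The same Secret can be created imperatively, which keeps the password out of files checked into version control (a sketch; the literal values are placeholders):

```shell
kubectl create secret generic postgres-secret \
  --from-literal=POSTGRES_USER=postgres \
  --from-literal=POSTGRES_DB=appdb \
  --from-literal=POSTGRES_PASSWORD='changeme-use-strong-password'

# Secret data is base64-encoded, not encrypted -- anyone with read access can decode it:
kubectl get secret postgres-secret -o jsonpath='{.data.POSTGRES_USER}' | base64 -d
```

For production clusters, consider encryption at rest or an external secrets manager on top of this.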
Creates a PostgreSQL instance with persistent storage. Each replica gets its own PVC. [src1] [src5]
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres-hl
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16-alpine
          # ... (see full YAML in Code Examples)
  volumeClaimTemplates:
    - metadata:
        name: postgres-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard
        resources:
          requests:
            storage: 10Gi
```

Verify: `kubectl get statefulset postgres` → READY: 1/1; `kubectl get pvc` → postgres-data-postgres-0 Bound
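Scaling this StatefulSet demonstrates the ordering guarantees from the comparison table (a sketch against the resources above):

```shell
# Scale up: Pods are created sequentially -- postgres-1 only starts
# after postgres-0 is Running and Ready
kubectl scale statefulset postgres --replicas=3
kubectl get pods -l app=postgres -w

# Scale down removes the highest ordinal first (postgres-2, then postgres-1);
# their PVCs remain behind and must be deleted manually
kubectl scale statefulset postgres --replicas=1
```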
For primary/replica setups, use init containers to determine the Pod's role based on its ordinal index. [src2]
```yaml
initContainers:
  - name: init-role
    image: postgres:16-alpine
    # assumes a shared volume (e.g. an emptyDir) is mounted at /config in
    # both this init container and the main container
    command: ['sh', '-c']
    args:
      - |
        ORDINAL=$(echo $HOSTNAME | rev | cut -d'-' -f1 | rev)
        if [ "$ORDINAL" = "0" ]; then
          echo "primary" > /config/role
        else
          echo "replica" > /config/role
        fi
```

Verify: `kubectl exec postgres-0 -- cat /config/role` → primary
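The ordinal extraction can be checked outside the cluster; this sketch simulates the `HOSTNAME` that the kubelet sets inside the Pod:

```shell
# Simulate the Pod hostname (in-cluster it is set automatically, e.g. postgres-2)
HOSTNAME="postgres-2"
# rev/cut/rev takes the LAST '-'-separated field, so StatefulSet names that
# themselves contain dashes (e.g. my-db-2) still yield the correct ordinal
ORDINAL=$(echo "$HOSTNAME" | rev | cut -d'-' -f1 | rev)
echo "$ORDINAL"   # 2
if [ "$ORDINAL" = "0" ]; then echo primary; else echo replica; fi   # replica
```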
Use the database's native backup tool in a CronJob, not filesystem snapshots. [src7]
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure  # required for Job Pods
          containers:
            - name: backup
              image: postgres:16-alpine
              # assumes PGPASSWORD is supplied from postgres-secret and a
              # backup volume is mounted at /backup
              command: ['sh', '-c']
              args:
                - pg_dump -h postgres-0.postgres-hl -U postgres -Fc appdb > /backup/$(date +%Y%m%d).dump
```

Verify: `kubectl get cronjob postgres-backup` → SCHEDULE `0 2 * * *`
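The redirection target in the CronJob expands to a date-stamped filename; this sketch shows the expansion locally (the `/backup` path assumes the mounted backup volume noted above):

```shell
# The CronJob's output path expands to a date-stamped file, e.g. /backup/20250101.dump
FILE="/backup/$(date +%Y%m%d).dump"
echo "$FILE"

# To test the job without waiting for 2 AM, trigger a one-off run:
# kubectl create job --from=cronjob/postgres-backup postgres-backup-manual
```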
For production PostgreSQL, install CloudNativePG and declare a Cluster resource. [src3]
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: app-db
spec:
  instances: 3
  storage:
    size: 20Gi
    storageClass: standard
  backup:
    barmanObjectStore:
      destinationPath: s3://my-bucket/backups
      # object-store credentials (e.g. s3Credentials referencing a Secret)
      # must also be configured for backups to run
```

Verify: `kubectl get cluster app-db` → Phase: Cluster in healthy state
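CloudNativePG also creates Services for the cluster automatically, so applications never address Pods directly (a sketch using the `app-db` cluster above):

```shell
# app-db-rw -- always routes to the current primary (read-write)
# app-db-ro -- load-balances across replicas (read-only)
kubectl get svc app-db-rw app-db-ro

# Because app-db-rw follows the primary, a failover requires no
# connection-string changes in the application
```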
Full script: postgres-statefulset.yaml (48 lines)
```yaml
# Input: Kubernetes cluster with dynamic storage provisioner
# Output: Single PostgreSQL instance with persistent storage
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres-hl
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16-alpine
          ports:
            - containerPort: 5432
          envFrom:
            - secretRef:
                name: postgres-secret
          env:
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 2Gi
          readinessProbe:
            exec:
              command: ["pg_isready", "-U", "postgres"]
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            exec:
              command: ["pg_isready", "-U", "postgres"]
            initialDelaySeconds: 30
            periodSeconds: 15
  volumeClaimTemplates:
    - metadata:
        name: postgres-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard
        resources:
          requests:
            storage: 10Gi
```
```yaml
# Input: Kubernetes cluster with CloudNativePG operator installed
# Output: 3-node PostgreSQL HA cluster with S3 backups + PITR
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: production-db
spec:
  instances: 3
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "512MB"
  storage:
    size: 50Gi
    storageClass: fast-ssd
  backup:
    retentionPolicy: "30d"
    barmanObjectStore:
      destinationPath: s3://backups/production-db
```
```yaml
# BAD -- Deployment gives random Pod names, no stable DNS for replication
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: postgres
          image: postgres:16-alpine
      volumes:
        - name: data
          emptyDir: {}  # Data lost when the Pod is deleted or rescheduled!
```

```yaml
# GOOD -- StatefulSet provides stable identity and persistent storage
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres-hl
  replicas: 3
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```
```yaml
# BAD -- ClusterIP Service load-balances; replicas can't address each other
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  selector:
    app: postgres
  ports:
    - port: 5432
  # Missing clusterIP: None -- creates a load-balanced Service
```

```yaml
# GOOD -- headless Service creates individual DNS records per Pod
apiVersion: v1
kind: Service
metadata:
  name: postgres-hl
spec:
  clusterIP: None
  selector:
    app: postgres
  ports:
    - port: 5432
```
```shell
# BAD -- copying the data directory while the database is running produces a corrupt backup
kubectl cp postgres-0:/var/lib/postgresql/data ./backup/
# WAL files may be inconsistent; you'll get a corrupted restore

# GOOD -- pg_dump creates a consistent logical backup
kubectl exec postgres-0 -- pg_dump -U postgres -Fc appdb > backup.dump

# GOOD -- for physical backup, use pg_basebackup
kubectl exec postgres-0 -- pg_basebackup -D /tmp/backup -Ft -z -P
```
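A backup is only as good as its restore path. A sketch of restoring the custom-format dump produced above into a fresh database (names match the earlier examples; `appdb_restored` is a placeholder):

```shell
# Create an empty target database, then restore the dump into it
kubectl exec postgres-0 -- createdb -U postgres appdb_restored
kubectl exec -i postgres-0 -- pg_restore -U postgres -d appdb_restored < backup.dump
```

Test this periodically; an unverified backup should be treated as no backup.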
```yaml
# BAD -- credentials visible in version control and kubectl describe
env:
  - name: POSTGRES_PASSWORD
    value: "mysecretpassword"
```

```yaml
# GOOD -- reference a Secret for credentials
envFrom:
  - secretRef:
      name: postgres-secret
```
- Scaling a StatefulSet down does not delete its PVCs; after scaling from 3 replicas to 1, remove the orphaned claims manually: `kubectl delete pvc postgres-data-postgres-1 postgres-data-postgres-2`. [src1]
- Set `PGDATA=/var/lib/postgresql/data/pgdata` and mount the volume at `/var/lib/postgresql/data`; PostgreSQL refuses to initialize directly in a mount point (it contains `lost+found`). [src5]
- Use `pg_isready` or `mysqladmin ping` as the readiness probe so traffic only reaches Pods whose database actually accepts connections. [src2]
- Requesting `ReadWriteMany` when the storage provider only supports `ReadWriteOnce` causes PVC binding failures. Fix: check `kubectl get storageclass` and match access modes. [src5]
- Add a PodDisruptionBudget with `minAvailable: 1` for database StatefulSets so voluntary disruptions (e.g. node drains) never evict all replicas at once. [src1]

```shell
# List StatefulSet status and ready replicas
kubectl get statefulset postgres -o wide
# Check PVC status (should all be Bound)
kubectl get pvc -l app=postgres
# View Pod DNS resolution from inside the cluster
kubectl run -it --rm debug --image=busybox -- nslookup postgres-0.postgres-hl
# Check database readiness from Pod
kubectl exec postgres-0 -- pg_isready -U postgres
# View StatefulSet events (useful for debugging stuck rollouts)
kubectl describe statefulset postgres
# Check storage class availability
kubectl get storageclass
# View Pod logs for database startup errors
kubectl logs postgres-0 --tail=50
# Check PV reclaim policy (should be Retain for production)
kubectl get pv -o custom-columns=NAME:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy,STATUS:.status.phase
# Check operator status (CloudNativePG)
kubectl get cluster -A
kubectl get pods -n cnpg-system
```
| K8s Version | Feature | Status | Notes |
|---|---|---|---|
| 1.9+ | StatefulSet API | GA (apps/v1) | Stable since 2017 |
| 1.27+ | PVC auto-deletion | Beta | StatefulSetAutoDeletePVC feature gate (alpha since 1.23) |
| 1.25+ | minReadySeconds | GA | Pod must be ready for N seconds |
| 1.27+ | Start ordinal | Beta | Custom starting ordinal index (alpha in 1.26) |
| 1.27+ | PVC resize for StatefulSets | Stable | Expand PVCs without recreation |
| 1.31+ | Start ordinal | GA | Custom ordinal ranges fully stable |
| 1.31+ | maxUnavailable for RollingUpdate | Beta | Parallel rolling updates |
| Operator | Latest Version | Min K8s | Database Support |
|---|---|---|---|
| CloudNativePG | 1.25.x | 1.27+ | PostgreSQL 12-17 |
| Percona Operator PG | 2.5.x | 1.27+ | PostgreSQL 13-17 |
| Percona Operator MySQL | 1.16.x | 1.26+ | MySQL 8.0 (PXC) |
| Percona Operator MongoDB | 1.18.x | 1.26+ | MongoDB 6.0-8.0 |
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Database requires stable Pod identity for primary/replica topology | Application is stateless (web servers, API gateways) | Deployment |
| Need ordered startup (primary before replicas) | Database can tolerate random Pod names | Deployment with PVC |
| Per-Pod persistent storage is required | All replicas share the same data volume | Deployment + single PVC |
| Running on-prem or air-gapped (no managed DB option) | Cloud provider offers managed database | Managed database service |
| Dev/test environment needing realistic database setup | Production DB needing automated failover and PITR | Database operator (CloudNativePG, Percona) |
| Learning Kubernetes stateful workloads | Team lacks Kubernetes operational expertise | Managed database service |
- PVC auto-deletion (`persistentVolumeClaimRetentionPolicy`) is Beta as of Kubernetes 1.31 -- do not rely on it in production without testing; the default behavior is to retain PVCs
- PVC expansion requires a StorageClass with `allowVolumeExpansion: true` -- shrinking is never supported