Docker Compose: Prometheus + Grafana Monitoring Stack
TL;DR
- Bottom line: A production-ready monitoring stack using Docker Compose combines Prometheus (metrics collection), Grafana (visualization), Node Exporter (host metrics), cAdvisor (container metrics), and Alertmanager (notifications) -- all configured declaratively with persistent storage and auto-provisioned dashboards.
- Key tool/command: docker compose up -d with pinned image versions and named volumes for all stateful services.
- Watch out for: Running Prometheus or Alertmanager without authentication -- both expose unauthenticated web UIs by default on ports 9090 and 9093.
- Works with: Docker Compose v2+, Prometheus 3.x, Grafana 12.x, Node Exporter 1.9.x, cAdvisor 0.49.x, Alertmanager 0.28.x. Linux, macOS, Windows (WSL2).
Constraints
- Pin image versions in production -- never use :latest for Prometheus, Grafana, or exporters
- Never expose Prometheus (9090) or Alertmanager (9093) publicly without authentication -- no auth by default
- cAdvisor requires privileged volume mounts (/var/run, /sys, /var/lib/docker) -- review security implications
- Node Exporter must run with --path.rootfs=/host and host PID namespace for accurate host metrics
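- Grafana provisioned datasources are read-only in the UI unless editable: true is set; provisioned dashboards need allowUiUpdates: true in the dashboard provider before UI changes can be saved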
Quick Reference
Service Configuration Summary
| Service | Image | Ports | Volumes | Key Config |
|---|---|---|---|---|
| Prometheus | prom/prometheus:v3.10.0 | 9090:9090 | prometheus_data:/prometheus, ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml | --storage.tsdb.retention.time=30d |
| Grafana | grafana/grafana:12.4.0 | 3000:3000 | grafana_data:/var/lib/grafana, ./grafana/provisioning:/etc/grafana/provisioning | GF_SECURITY_ADMIN_PASSWORD |
| Node Exporter | prom/node-exporter:v1.9.0 | 9100 (compose network only; not published in the step 2 compose file) | /:/host:ro,rslave | --path.rootfs=/host, PID host |
| cAdvisor | gcr.io/cadvisor/cadvisor:v0.49.1 | 8080 (compose network only; not published in the step 2 compose file) | /var/run:/var/run:ro, /sys:/sys:ro, /var/lib/docker:/var/lib/docker:ro | Privileged mounts required |
| Alertmanager | prom/alertmanager:v0.28.1 | 9093:9093 | alertmanager_data:/alertmanager, ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml | Route + receiver config |
Default Endpoints
| Endpoint | URL | Purpose |
|---|---|---|
| Prometheus UI | http://localhost:9090 | Query, targets, rules, TSDB status |
| Prometheus Targets | http://localhost:9090/targets | Scrape target health check |
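| Grafana UI | http://localhost:3000 | Dashboards (image default login: admin/admin; the step 2 compose file sets the password from GRAFANA_ADMIN_PASSWORD) |
| Alertmanager UI | http://localhost:9093 | Alert status, silences |
| Node Exporter Metrics | http://localhost:9100/metrics | Raw host metrics (reachable from the host only if port 9100 is published; inside the compose network use http://node-exporter:9100/metrics) |
| cAdvisor UI | http://localhost:8080 | Container metrics explorer (reachable from the host only if port 8080 is published) |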
Decision Tree
START: What monitoring do you need?
├── Host metrics only (CPU, RAM, disk, network)?
│ ├── YES → Prometheus + Node Exporter + Grafana (skip cAdvisor)
│ └── NO ↓
├── Docker container metrics only?
│ ├── YES → Prometheus + cAdvisor + Grafana (skip Node Exporter)
│ └── NO ↓
├── Both host + container metrics?
│ ├── YES → Full stack: Prometheus + Node Exporter + cAdvisor + Grafana
│ └── NO ↓
├── Need alerting (email, Slack, PagerDuty)?
│ ├── YES → Add Alertmanager service + alert rules
│ └── NO → Skip Alertmanager
├── Running on Kubernetes?
│ ├── YES → Use kube-prometheus-stack Helm chart instead
│ └── NO ↓
└── DEFAULT → Full stack with all 5 services
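One reading of the decision tree above can be sketched as a small function. The function name, its boolean parameters, and the choice to return an empty list for Kubernetes are assumptions made for illustration; the service names match the compose file in step 2.

```python
def pick_services(host_metrics: bool, container_metrics: bool,
                  alerting: bool, on_kubernetes: bool = False) -> list[str]:
    """Return the compose services suggested by the decision tree."""
    if on_kubernetes:
        # The tree points to the kube-prometheus-stack Helm chart instead.
        return []
    if not host_metrics and not container_metrics:
        # DEFAULT branch: full stack with all five services.
        return ["prometheus", "grafana", "node-exporter", "cadvisor", "alertmanager"]
    services = ["prometheus", "grafana"]
    if host_metrics:
        services.append("node-exporter")
    if container_metrics:
        services.append("cadvisor")
    if alerting:
        services.append("alertmanager")
    return services
```

For example, a host-only setup without alerting maps to Prometheus, Grafana, and Node Exporter.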
Step-by-Step Guide
1. Create the project directory structure
Organize configuration files into service-specific directories for clarity. [src5]
mkdir -p monitoring/{prometheus,grafana/provisioning/datasources,grafana/provisioning/dashboards,alertmanager}
cd monitoring
Verify: find . -type d → should list all subdirectories.
2. Create the Docker Compose file
Define all services with pinned versions, named volumes, health checks, and a shared network. [src1]
# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:v3.10.0
    container_name: prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=30d"
      - "--web.enable-lifecycle"
      - "--storage.tsdb.wal-compression"
    volumes:
      - prometheus_data:/prometheus
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/rules.yml:/etc/prometheus/rules.yml:ro
    ports:
      - "9090:9090"
    restart: unless-stopped
    networks:
      - monitoring
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:9090/-/healthy"]
      interval: 30s
      timeout: 5s
      retries: 3

  grafana:
    image: grafana/grafana:12.4.0
    container_name: grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD:-changeme}
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    ports:
      - "3000:3000"
    restart: unless-stopped
    networks:
      - monitoring
    depends_on:
      prometheus:
        condition: service_healthy

  node-exporter:
    image: prom/node-exporter:v1.9.0
    container_name: node-exporter
    command:
      - "--path.rootfs=/host"
    volumes:
      - "/:/host:ro,rslave"
    pid: host
    restart: unless-stopped
    networks:
      - monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.49.1
    container_name: cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    devices:
      - /dev/kmsg
    privileged: true
    restart: unless-stopped
    networks:
      - monitoring

  alertmanager:
    image: prom/alertmanager:v0.28.1
    container_name: alertmanager
    command:
      - "--config.file=/etc/alertmanager/alertmanager.yml"
      - "--storage.path=/alertmanager"
    volumes:
      - alertmanager_data:/alertmanager
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
    ports:
      - "9093:9093"
    restart: unless-stopped
    networks:
      - monitoring

volumes:
  prometheus_data:
  grafana_data:
  alertmanager_data:

networks:
  monitoring:
    driver: bridge
Verify: docker compose config → should print resolved YAML with no errors.
3. Configure Prometheus scrape targets
Define scrape jobs for all exporters. Use Docker DNS names (service names resolve automatically within the compose network). [src1]
# prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 10s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

rule_files:
  - "rules.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node-exporter"
    static_configs:
      - targets: ["node-exporter:9100"]

  - job_name: "cadvisor"
    scrape_interval: 10s
    static_configs:
      - targets: ["cadvisor:8080"]
Verify: After starting, visit http://localhost:9090/targets → all targets should show UP.
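The same check can be scripted against the Prometheus HTTP API instead of the web UI. The sketch below is a minimal illustration: `fetch_targets`, `unhealthy_targets`, and the `SAMPLE` payload are hypothetical names and data, and the payload is trimmed to the `scrapeUrl`, `health`, and `lastError` fields that `/api/v1/targets` returns for each active target.

```python
import json
from urllib.request import urlopen


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Fetch the targets document from a running Prometheus (not called here)."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as response:
        return json.load(response)


def unhealthy_targets(payload: dict) -> list[tuple[str, str]]:
    """Return (scrapeUrl, lastError) for every active target whose health is not 'up'."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapeUrl", ""), t.get("lastError", ""))
        for t in targets
        if t.get("health") != "up"
    ]


# Hypothetical response, trimmed to the fields read above.
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"scrapeUrl": "http://node-exporter:9100/metrics", "health": "up", "lastError": ""},
            {"scrapeUrl": "http://cadvisor:8080/metrics", "health": "down",
             "lastError": "connection refused"},
        ]
    },
}
```

Against a live stack, `unhealthy_targets(fetch_targets())` should return an empty list once every scrape job is UP.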
4. Create Prometheus alert rules
Define alert conditions for common failure scenarios. [src5]
# prometheus/rules.yml
groups:
  - name: node_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: warning

      - alert: DiskSpaceLow
        expr: (1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) * 100 > 90
        for: 10m
        labels:
          severity: critical

  - name: container_alerts
    rules:
      - alert: ContainerHighCPU
        expr: rate(container_cpu_usage_seconds_total{name=~".+"}[5m]) * 100 > 80
        for: 5m
        labels:
          severity: warning
Verify: http://localhost:9090/rules → rules should appear as loaded.
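To see why the HighCPUUsage expression subtracts from 100, it may help to work through the arithmetic by hand. The sketch below is a simplification: `idle_rate` stands in for PromQL's `rate()` (which also handles counter resets and extrapolation), and the two-core sample values are made up for illustration.

```python
def idle_rate(first: float, last: float, window_seconds: float) -> float:
    """Per-second increase of an idle-seconds counter: a simplified rate()."""
    return (last - first) / window_seconds


def cpu_usage_percent(idle_rates: list[float]) -> float:
    """100 - avg(idle rate) * 100, mirroring the alert expression."""
    return 100 - (sum(idle_rates) / len(idle_rates)) * 100


# Hypothetical 2-core host over a 300 s window:
# core 0 was idle for 30 s, core 1 for 60 s.
rates = [idle_rate(1000.0, 1030.0, 300.0), idle_rate(2000.0, 2060.0, 300.0)]
usage = cpu_usage_percent(rates)

# The alert condition is usage > 80, held for the 5m "for" duration.
would_fire = usage > 80
```

With these numbers the cores were idle 10% and 20% of the time, so average usage is about 85% and the condition holds.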
5. Configure Alertmanager
Set up routing and receivers for notifications. [src5]
# alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ["alertname", "severity"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: "default"

receivers:
  - name: "default"
    webhook_configs:
      - url: "http://example.com/webhook"
        send_resolved: true
Verify: http://localhost:9093/#/status → config should be loaded.
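On the receiving side, the webhook endpoint gets a JSON body containing an `alerts` array, each entry carrying `status`, `labels`, and `annotations`. The sketch below shows one way a receiver might summarize that body; `summarize_webhook` and `SAMPLE_BODY` are hypothetical, and the payload is trimmed to the fields the function reads.

```python
import json


def summarize_webhook(body: str) -> list[str]:
    """Turn an Alertmanager webhook body into one line per alert."""
    payload = json.loads(body)
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        lines.append(
            "{status}: {name} severity={sev} instance={inst}".format(
                status=alert.get("status", "unknown"),
                name=labels.get("alertname", "?"),
                sev=labels.get("severity", "none"),
                inst=labels.get("instance", "?"),
            )
        )
    return lines


# Hypothetical payload, trimmed to the fields read above.
SAMPLE_BODY = json.dumps({
    "version": "4",
    "status": "firing",
    "receiver": "default",
    "alerts": [
        {
            "status": "firing",
            "labels": {
                "alertname": "HighCPUUsage",
                "severity": "warning",
                "instance": "node-exporter:9100",
            },
            "annotations": {"summary": "High CPU usage on node-exporter:9100"},
        }
    ],
})
```

Because `send_resolved: true` is set, the same endpoint also receives entries whose `status` is `resolved`, so a receiver should branch on that field rather than assume every alert is firing.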
6. Provision Grafana datasource and dashboard
Use Grafana provisioning to auto-configure Prometheus as a datasource. [src3]
# grafana/provisioning/datasources/datasource.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false

# grafana/provisioning/dashboards/dashboard.yml
apiVersion: 1
providers:
  - name: "default"
    orgId: 1
    folder: ""
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    options:
      path: /etc/grafana/provisioning/dashboards
      foldersFromFilesStructure: false
Verify: Log into Grafana at http://localhost:3000 → Connections > Data sources (Configuration > Data Sources in older releases) → Prometheus should appear.
7. Start the stack and verify
Launch all services and confirm everything is healthy. [src5]
# Start all services
docker compose up -d
# Check all containers are running
docker compose ps
# View logs for any errors
docker compose logs --tail=50
Verify: docker compose ps → all 5 services should show running.
Code Examples
Grafana Dashboard JSON: Node Exporter Host Overview
{
  "dashboard": {
    "title": "Node Exporter - Host Overview",
    "uid": "node-exporter-host",
    "panels": [
      {
        "title": "CPU Usage %",
        "type": "timeseries",
        "targets": [{
          "expr": "100 - (avg(rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
          "legendFormat": "CPU %"
        }],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
      },
      {
        "title": "Memory Usage %",
        "type": "timeseries",
        "targets": [{
          "expr": "(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100",
          "legendFormat": "Memory %"
        }],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
      }
    ],
    "time": {"from": "now-1h", "to": "now"},
    "refresh": "30s"
  }
}
Prometheus Recording Rules: Pre-compute expensive queries
# prometheus/recording-rules.yml
groups:
  - name: node_recording_rules
    interval: 15s
    rules:
      - record: instance:node_cpu_utilization:ratio
        expr: 1 - avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

      - record: instance:node_memory_utilization:ratio
        expr: 1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)

      - record: instance:node_disk_utilization:ratio
        expr: 1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})
Anti-Patterns
Wrong: Using :latest tags for all images
# BAD -- unpinned versions cause silent breaking changes
services:
  prometheus:
    image: prom/prometheus:latest
  grafana:
    image: grafana/grafana:latest
  node-exporter:
    image: prom/node-exporter:latest
Correct: Pin specific versions
# GOOD -- predictable, reproducible deployments
services:
  prometheus:
    image: prom/prometheus:v3.10.0
  grafana:
    image: grafana/grafana:12.4.0
  node-exporter:
    image: prom/node-exporter:v1.9.0
Wrong: No persistent volumes
# BAD -- all metrics data lost on container restart
services:
  prometheus:
    image: prom/prometheus:v3.10.0
    # No volumes defined
Correct: Named volumes for all stateful services
# GOOD -- data survives container restarts and upgrades
services:
  prometheus:
    image: prom/prometheus:v3.10.0
    volumes:
      - prometheus_data:/prometheus
volumes:
  prometheus_data:
Wrong: Exposing Prometheus publicly without auth
# BAD -- anyone can query/delete metrics
services:
  prometheus:
    ports:
      - "0.0.0.0:9090:9090"
Correct: Bind to localhost or use reverse proxy with auth
# GOOD -- only accessible locally
services:
  prometheus:
    ports:
      - "127.0.0.1:9090:9090"
Wrong: Node Exporter without host PID and rootfs mount
# BAD -- incomplete host metrics
services:
  node-exporter:
    image: prom/node-exporter:v1.9.0
    # Missing pid: host and /:/host volume
Correct: Proper Node Exporter configuration
# GOOD -- accurate host metrics with rootfs remapping
services:
  node-exporter:
    image: prom/node-exporter:v1.9.0
    command:
      - "--path.rootfs=/host"
    volumes:
      - "/:/host:ro,rslave"
    pid: host
Wrong: Hardcoded Grafana admin password
# BAD -- password committed to version control
services:
  grafana:
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=mysecretpassword
Correct: Use environment variable or .env file
# GOOD -- password loaded from .env file (excluded from git)
services:
  grafana:
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD}
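One way to produce the `.env` entry is to generate the password rather than invent one by hand. This is a minimal sketch: the `env_line` helper is hypothetical, and it relies on `secrets.token_urlsafe`, whose output uses only letters, digits, `-`, and `_`, so the value needs no quoting in a `.env` file.

```python
import secrets


def env_line(name: str = "GRAFANA_ADMIN_PASSWORD", nbytes: int = 24) -> str:
    """Build a NAME=value line with a random URL-safe password."""
    return f"{name}={secrets.token_urlsafe(nbytes)}"


if __name__ == "__main__":
    # Append the line to .env in the project directory; keep .env out of git.
    with open(".env", "a", encoding="utf-8") as handle:
        handle.write(env_line() + "\n")
```

Docker Compose reads `.env` from the project directory automatically, so `${GRAFANA_ADMIN_PASSWORD}` resolves without extra flags.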
Common Pitfalls
- Prometheus targets show DOWN: Service names in scrape_configs must match docker-compose service names exactly: node-exporter:9100, not node_exporter:9100 or localhost:9100. Fix: Check docker compose ps for exact service names. [src5]
- Grafana shows "No data": Prometheus datasource URL must use the Docker internal DNS name http://prometheus:9090, not http://localhost:9090. Fix: Use the service name in the datasource URL. [src3]
- cAdvisor crash on newer kernels: cAdvisor may fail with "Failed to start container manager" on Linux 6.x+ kernels. Fix: Add the --docker_only=true flag or update to cAdvisor v0.49+. [src2]
- Prometheus OOM on high cardinality: Too many unique label combinations cause memory exhaustion. Fix: Add metric_relabel_configs to drop high-cardinality labels. Setting --storage.tsdb.retention.size=5GB caps disk usage but does not by itself reduce memory held by active series. [src1]
- Alert rules not loading: The rule_files path in prometheus.yml must match the mounted path inside the container. Fix: Verify that mount paths match rule_files paths. [src5]
- Grafana provisioned dashboards can't be edited: Provisioned dashboards are read-only by default. Fix: Set allowUiUpdates: true in the dashboard provider config (editable: true applies to provisioned datasources), or export and re-provision. [src3]
- Node Exporter shows container filesystem: Missing --path.rootfs=/host flag or volume mount. Fix: Ensure both the volume mount and the flag are set. [src6]
- Alertmanager not receiving alerts: prometheus.yml must reference the Alertmanager service by its Docker DNS name alertmanager:9093, not localhost:9093. Fix: Use Docker service names. [src5]
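Several of the pitfalls above come down to a scrape or alerting target that does not name a compose service. A rough pre-flight check might look like the sketch below; the `unknown_scrape_hosts` helper and its `allowed_local` parameter are hypothetical, and it works on values already read out of the two YAML files rather than parsing them.

```python
def unknown_scrape_hosts(service_names: set[str], scrape_targets: list[str],
                         allowed_local: frozenset[str] = frozenset({"localhost:9090"})) -> list[str]:
    """Return targets whose host is not a compose service name.

    localhost is accepted only for the entries in allowed_local, which by
    default covers Prometheus scraping itself.
    """
    bad = []
    for target in scrape_targets:
        if target in allowed_local:
            continue
        host = target.rsplit(":", 1)[0]
        if host not in service_names:
            bad.append(target)
    return bad


# Service names from the step 2 compose file.
SERVICES = {"prometheus", "grafana", "node-exporter", "cadvisor", "alertmanager"}
```

An empty result means every target resolves through Docker's internal DNS; anything returned is a likely cause of a DOWN target.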
Diagnostic Commands
# Check all service health
docker compose ps
# View Prometheus targets status
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {scrapeUrl, health, lastError}'
# Test Prometheus config validity
docker compose exec prometheus promtool check config /etc/prometheus/prometheus.yml
# Validate alert rules
docker compose exec prometheus promtool check rules /etc/prometheus/rules.yml
# Check Grafana datasource connectivity
curl -s -u admin:changeme http://localhost:3000/api/datasources | jq '.[].name'
# Test Alertmanager config
docker compose exec alertmanager amtool check-config /etc/alertmanager/alertmanager.yml
# Query a specific metric directly
curl -s 'http://localhost:9090/api/v1/query?query=up' | jq '.data.result'
# Reload Prometheus config without restart (requires --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload
# Check container resource usage
docker stats --no-stream
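The `query=up` call above works unescaped only because `up` has no special characters. For selectors with braces and quotes, the query string has to be URL-encoded; the sketch below builds such a URL with the standard library. The `instant_query_url` helper name is an assumption for illustration.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a /api/v1/query URL with the PromQL expression URL-encoded."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Example selector using a job name from prometheus.yml.
URL = instant_query_url("http://localhost:9090", 'up{job="node-exporter"}')
```

The resulting URL can be passed to curl or `urllib.request.urlopen` in place of the hand-written one.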
Version History & Compatibility
| Component | Version | Status | Breaking Changes | Notes |
|---|---|---|---|---|
| Prometheus | 3.10.0 | Current | 3.0 removed deprecated flags, UTF-8 metric names | LTS: 3.5.1 |
| Prometheus | 2.53.x | LTS until 2025-07 | -- | Last 2.x LTS |
| Grafana | 12.4.0 | Current | 12.0 changed auth defaults | Unified alerting is default |
| Node Exporter | 1.9.0 | Current | None | -- |
| cAdvisor | 0.49.1 | Current | 0.47+ requires Linux 5.4+ | Google-maintained |
| Alertmanager | 0.28.1 | Current | 0.27 removed v1 API | v2 API only |
| Docker Compose | v2.x | Current | v1 syntax deprecated | Built into Docker CLI |
When to Use / When Not to Use
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Self-hosted monitoring on VMs or bare metal with Docker | Running on Kubernetes | kube-prometheus-stack Helm chart |
| Need full control over retention, scraping, alerting | Want managed monitoring with zero ops | Grafana Cloud, Datadog, or AWS CloudWatch |
| Dev/staging environment monitoring | Monitoring 1000+ nodes | Thanos or Cortex for horizontal scaling |
| Docker Compose is already your deployment tool | Need log aggregation (not metrics) | Loki + Grafana or ELK stack |
| Budget-conscious -- all components free and open-source | Need APM/tracing | OpenTelemetry + Jaeger or commercial APM |
Important Caveats
- Prometheus stores data locally on a single node -- it is NOT horizontally scalable; for multi-node or long-term storage, add Thanos, Cortex, or Mimir as a remote write target
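- cAdvisor's privileged: true and volume mounts give it broad host access -- in multi-tenant environments, evaluate the security risk
- Grafana dashboard JSON exported from the UI may differ from provisioned JSON format -- always test provisioned dashboards after export. File provisioning expects the bare dashboard object; the {"dashboard": {...}} wrapper used in the Code Examples JSON is the HTTP API request shape and must be removed before the file is placed in the provisioning directory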
- Docker Desktop on macOS and Windows runs containers in a Linux VM, so Node Exporter reports the VM's metrics, not the host OS; use platform-specific collectors for native host metrics
- Prometheus 3.0 introduced breaking changes (removed deprecated flags and feature flags; the OTLP receiver is now enabled with --web.enable-otlp-receiver) -- verify configuration when upgrading from 2.x
- The 15s scrape interval configured here generates 5,760 samples/day per time series (86,400 s / 15 s) -- adjust based on storage capacity and metric cardinality
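The sample-count arithmetic extends to a rough disk estimate. The sketch below assumes about 2 bytes per sample, the upper end of the 1-2 bytes the Prometheus storage documentation cites as an average; the 10,000-series figure and both function names are hypothetical.

```python
def samples_per_day(scrape_interval_seconds: float) -> float:
    """Samples collected per time series per day at a given scrape interval."""
    return 86_400 / scrape_interval_seconds


def estimated_disk_bytes(series: int, scrape_interval_seconds: float,
                         retention_days: int, bytes_per_sample: float = 2.0) -> float:
    """Rough TSDB size: series x samples/day x retention days x bytes/sample."""
    return series * samples_per_day(scrape_interval_seconds) * retention_days * bytes_per_sample


# Hypothetical stack: 10,000 active series, 15s interval, 30d retention.
ESTIMATE_GB = estimated_disk_bytes(10_000, 15, 30) / 1e9
```

For those inputs the estimate is roughly 3.5 GB, which gives a starting point for sizing the prometheus_data volume; real usage varies with churn and compression.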