Bring the stack up with `docker compose up -d`, using pinned image versions and named volumes for all stateful services.

| Service | Image | Ports | Volumes | Key Config |
|---|---|---|---|---|
| Prometheus | prom/prometheus:v3.10.0 | 9090:9090 | prometheus_data:/prometheus, ./prometheus.yml:/etc/prometheus/prometheus.yml | --storage.tsdb.retention.time=30d |
| Grafana | grafana/grafana:12.4.0 | 3000:3000 | grafana_data:/var/lib/grafana, ./grafana/provisioning:/etc/grafana/provisioning | GF_SECURITY_ADMIN_PASSWORD |
| Node Exporter | prom/node-exporter:v1.9.0 | 9100:9100 | /:/host:ro,rslave | --path.rootfs=/host, PID host |
| cAdvisor | gcr.io/cadvisor/cadvisor:v0.49.1 | 8080:8080 | /var/run:/var/run:ro, /sys:/sys:ro, /var/lib/docker:/var/lib/docker:ro | Privileged mounts required |
| Alertmanager | prom/alertmanager:v0.28.1 | 9093:9093 | alertmanager_data:/alertmanager, ./alertmanager.yml:/etc/alertmanager/alertmanager.yml | Route + receiver config |

| Endpoint | URL | Purpose |
|---|---|---|
| Prometheus UI | http://localhost:9090 | Query, targets, rules, TSDB status |
| Prometheus Targets | http://localhost:9090/targets | Scrape target health check |
| Grafana UI | http://localhost:3000 | Dashboards (default: admin / `GRAFANA_ADMIN_PASSWORD`, falls back to `changeme`) |
| Alertmanager UI | http://localhost:9093 | Alert status, silences |
| Node Exporter Metrics | http://localhost:9100/metrics | Raw host metrics |
| cAdvisor UI | http://localhost:8080 | Container metrics explorer |
```text
START: What monitoring do you need?
├── Host metrics only (CPU, RAM, disk, network)?
│   ├── YES → Prometheus + Node Exporter + Grafana (skip cAdvisor)
│   └── NO ↓
├── Docker container metrics only?
│   ├── YES → Prometheus + cAdvisor + Grafana (skip Node Exporter)
│   └── NO ↓
├── Both host + container metrics?
│   ├── YES → Full stack: Prometheus + Node Exporter + cAdvisor + Grafana
│   └── NO ↓
├── Need alerting (email, Slack, PagerDuty)?
│   ├── YES → Add Alertmanager service + alert rules
│   └── NO → Skip Alertmanager
├── Running on Kubernetes?
│   ├── YES → Use kube-prometheus-stack Helm chart instead
│   └── NO ↓
└── DEFAULT → Full stack with all 5 services
```
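If you only need a subset of the stack, Compose profiles let you toggle optional services from the same file. A sketch, assuming the full docker-compose.yml below; the profile names are illustrative:

```yaml
# docker-compose.yml (excerpt) -- assign optional services to profiles
services:
  cadvisor:
    profiles: ["containers"]   # started only with --profile containers
  alertmanager:
    profiles: ["alerting"]     # started only with --profile alerting

# Host-only stack:           docker compose up -d
# With container metrics:    docker compose --profile containers up -d
# Everything:                docker compose --profile containers --profile alerting up -d
```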
Organize configuration files into service-specific directories for clarity. [src5]
```bash
mkdir -p monitoring/{prometheus,grafana/provisioning/datasources,grafana/provisioning/dashboards,alertmanager}
cd monitoring
```

Verify: `find . -type d` → should list all subdirectories.
Define all services with pinned versions, named volumes, health checks, and a shared network. [src1]
```yaml
# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:v3.10.0
    container_name: prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=30d"
      - "--web.enable-lifecycle"
      - "--storage.tsdb.wal-compression"
    volumes:
      - prometheus_data:/prometheus
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/rules.yml:/etc/prometheus/rules.yml:ro
    ports:
      - "9090:9090"
    restart: unless-stopped
    networks:
      - monitoring
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:9090/-/healthy"]
      interval: 30s
      timeout: 5s
      retries: 3

  grafana:
    image: grafana/grafana:12.4.0
    container_name: grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD:-changeme}
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    ports:
      - "3000:3000"
    restart: unless-stopped
    networks:
      - monitoring
    depends_on:
      prometheus:
        condition: service_healthy

  node-exporter:
    image: prom/node-exporter:v1.9.0
    container_name: node-exporter
    command:
      - "--path.rootfs=/host"
    volumes:
      - "/:/host:ro,rslave"
    ports:
      - "9100:9100"
    pid: host
    restart: unless-stopped
    networks:
      - monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.49.1
    container_name: cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    ports:
      - "8080:8080"
    devices:
      - /dev/kmsg
    privileged: true
    restart: unless-stopped
    networks:
      - monitoring

  alertmanager:
    image: prom/alertmanager:v0.28.1
    container_name: alertmanager
    command:
      - "--config.file=/etc/alertmanager/alertmanager.yml"
      - "--storage.path=/alertmanager"
    volumes:
      - alertmanager_data:/alertmanager
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
    ports:
      - "9093:9093"
    restart: unless-stopped
    networks:
      - monitoring

volumes:
  prometheus_data:
  grafana_data:
  alertmanager_data:

networks:
  monitoring:
    driver: bridge
```
Verify: `docker compose config` → should print the resolved YAML with no errors.
Define scrape jobs for all exporters. Use Docker DNS names (service names resolve automatically within the compose network). [src1]
```yaml
# prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 10s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

rule_files:
  - "rules.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node-exporter"
    static_configs:
      - targets: ["node-exporter:9100"]
  - job_name: "cadvisor"
    scrape_interval: 10s
    static_configs:
      - targets: ["cadvisor:8080"]
```
Verify: After starting, visit http://localhost:9090/targets → all targets should show UP.
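The targets check can also be scripted against the `/api/v1/targets` API. A small sketch that flags unhealthy scrape targets; the sample payload below is illustrative (fetch the real one with `curl -s http://localhost:9090/api/v1/targets`):

```python
def unhealthy_targets(payload: dict) -> list:
    """Return scrapeUrls whose health is not 'up' from a /api/v1/targets response."""
    return [
        t["scrapeUrl"]
        for t in payload.get("data", {}).get("activeTargets", [])
        if t.get("health") != "up"
    ]

# Illustrative response shape -- the real API returns more fields per target
sample = {
    "data": {
        "activeTargets": [
            {"scrapeUrl": "http://node-exporter:9100/metrics", "health": "up"},
            {"scrapeUrl": "http://cadvisor:8080/metrics", "health": "down"},
        ]
    }
}
print(unhealthy_targets(sample))  # → ['http://cadvisor:8080/metrics']
```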
Define alert conditions for common failure scenarios. [src5]
```yaml
# prometheus/rules.yml
groups:
  - name: node_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: warning
      - alert: DiskSpaceLow
        expr: (1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) * 100 > 90
        for: 10m
        labels:
          severity: critical
  - name: container_alerts
    rules:
      - alert: ContainerHighCPU
        expr: rate(container_cpu_usage_seconds_total{name=~".+"}[5m]) * 100 > 80
        for: 5m
        labels:
          severity: warning
```
Verify: http://localhost:9090/rules → rules should appear as loaded.
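Alert expressions can also be unit-tested offline with `promtool test rules`. A minimal sketch for `HighMemoryUsage`; the series values and instance label are illustrative:

```yaml
# prometheus/rules_test.yml -- run with: promtool test rules rules_test.yml
rule_files:
  - rules.yml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # 10 GB available out of 100 GB total -> 90% used, above the 85% threshold
      - series: 'node_memory_MemAvailable_bytes{instance="host1"}'
        values: "10000000000x15"
      - series: 'node_memory_MemTotal_bytes{instance="host1"}'
        values: "100000000000x15"
    alert_rule_test:
      - eval_time: 10m
        alertname: HighMemoryUsage
        exp_alerts:
          - exp_labels:
              severity: warning
              instance: host1
```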
Set up routing and receivers for notifications. [src5]
```yaml
# alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ["alertname", "severity"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: "default"

receivers:
  - name: "default"
    webhook_configs:
      - url: "http://example.com/webhook"
        send_resolved: true
```
Verify: http://localhost:9093/#/status → config should be loaded.
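To route critical alerts to Slack, add a receiver and a matching child route. A sketch; the webhook URL and channel name are placeholders:

```yaml
# alertmanager/alertmanager.yml (excerpt)
route:
  receiver: "default"
  routes:
    - matchers:
        - severity="critical"
      receiver: "slack-critical"

receivers:
  - name: "slack-critical"
    slack_configs:
      - api_url: "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder
        channel: "#alerts"
        send_resolved: true
```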
Use Grafana provisioning to auto-configure Prometheus as a datasource. [src3]
```yaml
# grafana/provisioning/datasources/datasource.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false
```

```yaml
# grafana/provisioning/dashboards/dashboard.yml
apiVersion: 1
providers:
  - name: "default"
    orgId: 1
    folder: ""
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    options:
      path: /etc/grafana/provisioning/dashboards
      foldersFromFilesStructure: false
```
Verify: Log into Grafana at http://localhost:3000 → Configuration > Data Sources → Prometheus should appear.
Launch all services and confirm everything is healthy. [src5]
```bash
# Start all services
docker compose up -d

# Check that all containers are running
docker compose ps

# View logs for any errors
docker compose logs --tail=50
```

Verify: `docker compose ps` → all 5 services should show running.
Save a starter dashboard into the provisioned directory, e.g. `grafana/provisioning/dashboards/node-overview.json`. File-provisioned dashboards use the dashboard model at the top level, without the `{"dashboard": ...}` wrapper the HTTP API expects:

```json
{
  "title": "Node Exporter - Host Overview",
  "uid": "node-exporter-host",
  "panels": [
    {
      "title": "CPU Usage %",
      "type": "timeseries",
      "targets": [
        {
          "expr": "100 - (avg(rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
          "legendFormat": "CPU %"
        }
      ],
      "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
    },
    {
      "title": "Memory Usage %",
      "type": "timeseries",
      "targets": [
        {
          "expr": "(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100",
          "legendFormat": "Memory %"
        }
      ],
      "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
    }
  ],
  "time": {"from": "now-1h", "to": "now"},
  "refresh": "30s"
}
```
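Grafana lays panels on a 24-column grid. When generating dashboard JSON programmatically, a small helper can compute `gridPos` values like the ones above (a sketch, not a Grafana API):

```python
def grid_positions(n_panels: int, width: int = 12, height: int = 8) -> list:
    """Assign gridPos for n panels, filling left to right on Grafana's 24-column grid."""
    per_row = 24 // width  # panels that fit side by side
    return [
        {"h": height, "w": width, "x": (i % per_row) * width, "y": (i // per_row) * height}
        for i in range(n_panels)
    ]

print(grid_positions(3))
# → [{'h': 8, 'w': 12, 'x': 0, 'y': 0}, {'h': 8, 'w': 12, 'x': 12, 'y': 0},
#    {'h': 8, 'w': 12, 'x': 0, 'y': 8}]
```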
Precompute frequently queried expressions with recording rules. Remember to list the file under `rule_files` in prometheus.yml and mount it alongside rules.yml:

```yaml
# prometheus/recording-rules.yml
groups:
  - name: node_recording_rules
    interval: 15s
    rules:
      - record: instance:node_cpu_utilization:ratio
        expr: 1 - avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
      - record: instance:node_memory_utilization:ratio
        expr: 1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)
      - record: instance:node_disk_utilization:ratio
        expr: 1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})
```
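The recorded series can then shorten alert expressions; for example, the `HighCPUUsage` rule above could be rewritten against the precomputed ratio (a sketch):

```yaml
# prometheus/rules.yml (excerpt) -- same threshold, simpler expression
- alert: HighCPUUsage
  expr: instance:node_cpu_utilization:ratio * 100 > 80
  for: 5m
  labels:
    severity: warning
```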
```yaml
# BAD -- unpinned versions cause silent breaking changes
services:
  prometheus:
    image: prom/prometheus:latest
  grafana:
    image: grafana/grafana:latest
  node-exporter:
    image: prom/node-exporter:latest

# GOOD -- predictable, reproducible deployments
services:
  prometheus:
    image: prom/prometheus:v3.10.0
  grafana:
    image: grafana/grafana:12.4.0
  node-exporter:
    image: prom/node-exporter:v1.9.0
```

```yaml
# BAD -- all metrics data lost on container restart
services:
  prometheus:
    image: prom/prometheus:v3.10.0
    # No volumes defined

# GOOD -- data survives container restarts and upgrades
services:
  prometheus:
    image: prom/prometheus:v3.10.0
    volumes:
      - prometheus_data:/prometheus
volumes:
  prometheus_data:
```

```yaml
# BAD -- anyone can query/delete metrics
services:
  prometheus:
    ports:
      - "0.0.0.0:9090:9090"

# GOOD -- only accessible locally
services:
  prometheus:
    ports:
      - "127.0.0.1:9090:9090"
```

```yaml
# BAD -- incomplete host metrics
services:
  node-exporter:
    image: prom/node-exporter:v1.9.0
    # Missing pid: host and /:/host volume

# GOOD -- accurate host metrics with rootfs remapping
services:
  node-exporter:
    image: prom/node-exporter:v1.9.0
    command:
      - "--path.rootfs=/host"
    volumes:
      - "/:/host:ro,rslave"
    pid: host
```

```yaml
# BAD -- password committed to version control
services:
  grafana:
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=mysecretpassword

# GOOD -- password loaded from .env file (excluded from git)
services:
  grafana:
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD}
```
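Docker Compose reads the variable from a `.env` file next to docker-compose.yml; keep that file out of version control. A sketch (the value is a placeholder):

```bash
# .env -- add this file to .gitignore
GRAFANA_ADMIN_PASSWORD=use-a-long-random-password
```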
- Node Exporter target DOWN: the scrape target must be `node-exporter:9100`, not `node_exporter:9100` or `localhost:9100`. Fix: check `docker compose ps` for exact service names. [src5]
- Grafana cannot reach Prometheus: the datasource URL must be `http://prometheus:9090`, not `http://localhost:9090`. Fix: use the service name in the datasource URL. [src3]
- cAdvisor fails with "Failed to start container manager" on Linux 6.x+ kernels. Fix: add the `--docker_only=true` flag or update to cAdvisor v0.49+. [src2]
- Prometheus storage growing too fast: use `metric_relabel_configs` to drop high-cardinality labels, or set `--storage.tsdb.retention.size=5GB`. [src1]
- Alert rules not loading: the `rule_files` path in prometheus.yml must match the mounted path inside the container. Fix: verify mount paths match `rule_files` paths. [src5]
- Provisioned datasource cannot be edited: set `editable: true` in the datasource config, or export and re-provision. [src3]
- Node Exporter reports container metrics instead of host metrics: missing `--path.rootfs=/host` flag or volume mount. Fix: ensure both the volume mount and the flag are set. [src6]
- Prometheus cannot reach Alertmanager: target `alertmanager:9093`, not `localhost:9093`. Fix: use Docker service names. [src5]

```bash
# Check all service health
docker compose ps

# View Prometheus targets status
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {scrapeUrl, health, lastError}'

# Test Prometheus config validity
docker compose exec prometheus promtool check config /etc/prometheus/prometheus.yml

# Validate alert rules
docker compose exec prometheus promtool check rules /etc/prometheus/rules.yml

# Check Grafana datasource connectivity
curl -s -u admin:changeme http://localhost:3000/api/datasources | jq '.[].name'

# Test Alertmanager config
docker compose exec alertmanager amtool check-config /etc/alertmanager/alertmanager.yml

# Query a specific metric directly
curl -s 'http://localhost:9090/api/v1/query?query=up' | jq '.data.result'

# Reload Prometheus config without restart (requires --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload

# Check container resource usage
docker stats --no-stream
```
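To size `--storage.tsdb.retention.time` against available disk, a rough capacity estimate is ingested samples times bytes per sample. A sketch; the 1-2 bytes/sample figure is Prometheus's usual rule of thumb, and the series counts below are illustrative:

```python
def estimated_disk_bytes(active_series: int, scrape_interval_s: float,
                         retention_days: float, bytes_per_sample: float = 2.0) -> float:
    """Rough TSDB disk estimate: samples ingested per second x bytes per sample x retention."""
    samples_per_sec = active_series / scrape_interval_s
    return samples_per_sec * bytes_per_sample * retention_days * 86400

# e.g. 50k active series scraped every 15s, kept for 30 days
print(round(estimated_disk_bytes(50_000, 15, 30) / 1e9, 1), "GB")  # → 17.3 GB
```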
| Component | Version | Status | Breaking Changes | Notes |
|---|---|---|---|---|
| Prometheus | 3.10.0 | Current | 3.0 removed deprecated flags, UTF-8 metric names | LTS: 3.5.1 |
| Prometheus | 2.54.x | LTS until 2025-07 | -- | Last 2.x LTS |
| Grafana | 12.4.0 | Current | 12.0 changed auth defaults | Unified alerting is default |
| Node Exporter | 1.9.0 | Current | None | -- |
| cAdvisor | 0.49.1 | Current | 0.47+ requires Linux 5.4+ | Google-maintained |
| Alertmanager | 0.28.1 | Current | 0.27 removed v1 API | v2 API only |
| Docker Compose | v2.x | Current | v1 syntax deprecated | Built into Docker CLI |
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Self-hosted monitoring on VMs or bare metal with Docker | Running on Kubernetes | kube-prometheus-stack Helm chart |
| Need full control over retention, scraping, alerting | Want managed monitoring with zero ops | Grafana Cloud, Datadog, or AWS CloudWatch |
| Dev/staging environment monitoring | Monitoring 1000+ nodes | Thanos or Cortex for horizontal scaling |
| Docker Compose is already your deployment tool | Need log aggregation (not metrics) | Loki + Grafana or ELK stack |
| Budget-conscious -- all components free and open-source | Need APM/tracing | OpenTelemetry + Jaeger or commercial APM |
cAdvisor's `privileged: true` setting and host volume mounts give it broad access to the host; in multi-tenant environments, evaluate the security risk before deploying it.