Docker Compose: Prometheus + Grafana Monitoring Stack

Type: Software Reference | Confidence: 0.94 | Sources: 7 | Verified: 2026-02-27 | Freshness: 2026-02-27

TL;DR

Constraints

Quick Reference

Service Configuration Summary

| Service | Image | Ports | Volumes | Key Config |
|---|---|---|---|---|
| Prometheus | prom/prometheus:v3.10.0 | 9090:9090 | prometheus_data:/prometheus, ./prometheus.yml:/etc/prometheus/prometheus.yml | --storage.tsdb.retention.time=30d |
| Grafana | grafana/grafana:12.4.0 | 3000:3000 | grafana_data:/var/lib/grafana, ./grafana/provisioning:/etc/grafana/provisioning | GF_SECURITY_ADMIN_PASSWORD |
| Node Exporter | prom/node-exporter:v1.9.0 | 9100:9100 | /:/host:ro,rslave | --path.rootfs=/host, PID host |
| cAdvisor | gcr.io/cadvisor/cadvisor:v0.49.1 | 8080:8080 | /var/run:/var/run:ro, /sys:/sys:ro, /var/lib/docker:/var/lib/docker:ro | Privileged mounts required |
| Alertmanager | prom/alertmanager:v0.28.1 | 9093:9093 | alertmanager_data:/alertmanager, ./alertmanager.yml:/etc/alertmanager/alertmanager.yml | Route + receiver config |

Default Endpoints

| Endpoint | URL | Purpose |
|---|---|---|
| Prometheus UI | http://localhost:9090 | Query, targets, rules, TSDB status |
| Prometheus Targets | http://localhost:9090/targets | Scrape target health check |
| Grafana UI | http://localhost:3000 | Dashboards (login: admin / value of GRAFANA_ADMIN_PASSWORD, default changeme) |
| Alertmanager UI | http://localhost:9093 | Alert status, silences |
| Node Exporter Metrics | http://localhost:9100/metrics | Raw host metrics |
| cAdvisor UI | http://localhost:8080 | Container metrics explorer |

Decision Tree

START: What monitoring do you need?
├── Host metrics only (CPU, RAM, disk, network)?
│   ├── YES → Prometheus + Node Exporter + Grafana (skip cAdvisor)
│   └── NO ↓
├── Docker container metrics only?
│   ├── YES → Prometheus + cAdvisor + Grafana (skip Node Exporter)
│   └── NO ↓
├── Both host + container metrics?
│   ├── YES → Full stack: Prometheus + Node Exporter + cAdvisor + Grafana
│   └── NO ↓
├── Need alerting (email, Slack, PagerDuty)?
│   ├── YES → Add Alertmanager service + alert rules
│   └── NO → Skip Alertmanager
├── Running on Kubernetes?
│   ├── YES → Use kube-prometheus-stack Helm chart instead
│   └── NO ↓
└── DEFAULT → Full stack with all 5 services
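The branches above can be expressed in a single compose file using Docker Compose profiles, so the optional services only start when their branch applies. A sketch — the profile names `host`, `containers`, and `alerting` are illustrative choices, not part of the stack itself:

```yaml
# docker-compose.yml (excerpt) -- assign optional services to profiles
services:
  node-exporter:
    profiles: ["host"]        # started only with: docker compose --profile host up -d
  cadvisor:
    profiles: ["containers"]  # started only with: docker compose --profile containers up -d
  alertmanager:
    profiles: ["alerting"]
```

Services without a `profiles` key (Prometheus, Grafana) always start; multiple `--profile` flags can be combined for the full stack.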

Step-by-Step Guide

1. Create the project directory structure

Organize configuration files into service-specific directories for clarity. [src5]

mkdir -p monitoring/{prometheus,grafana/provisioning/datasources,grafana/provisioning/dashboards,alertmanager}
cd monitoring

Verify: find . -type d → should list all subdirectories.

2. Create the Docker Compose file

Define all services with pinned versions, named volumes, health checks, and a shared network. [src1]

# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:v3.10.0
    container_name: prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=30d"
      - "--web.enable-lifecycle"
      - "--storage.tsdb.wal-compression"
    volumes:
      - prometheus_data:/prometheus
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/rules.yml:/etc/prometheus/rules.yml:ro
    ports:
      - "9090:9090"
    restart: unless-stopped
    networks:
      - monitoring
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:9090/-/healthy"]
      interval: 30s
      timeout: 5s
      retries: 3

  grafana:
    image: grafana/grafana:12.4.0
    container_name: grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD:-changeme}
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    ports:
      - "3000:3000"
    restart: unless-stopped
    networks:
      - monitoring
    depends_on:
      prometheus:
        condition: service_healthy

  node-exporter:
    image: prom/node-exporter:v1.9.0
    container_name: node-exporter
    command:
      - "--path.rootfs=/host"
    volumes:
      - "/:/host:ro,rslave"
    pid: host
    ports:
      - "9100:9100"
    restart: unless-stopped
    networks:
      - monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.49.1
    container_name: cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    devices:
      - /dev/kmsg
    privileged: true
    ports:
      - "8080:8080"
    restart: unless-stopped
    networks:
      - monitoring

  alertmanager:
    image: prom/alertmanager:v0.28.1
    container_name: alertmanager
    command:
      - "--config.file=/etc/alertmanager/alertmanager.yml"
      - "--storage.path=/alertmanager"
    volumes:
      - alertmanager_data:/alertmanager
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
    ports:
      - "9093:9093"
    restart: unless-stopped
    networks:
      - monitoring

volumes:
  prometheus_data:
  grafana_data:
  alertmanager_data:

networks:
  monitoring:
    driver: bridge

Verify: docker compose config → should print resolved YAML with no errors.
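The compose file reads GRAFANA_ADMIN_PASSWORD from the environment. The usual way to supply it is a .env file next to docker-compose.yml, excluded from version control; a minimal sketch with a placeholder value:

```bash
# .env -- picked up automatically by docker compose; add this file to .gitignore
GRAFANA_ADMIN_PASSWORD=replace-with-a-strong-password
```

If the variable is unset, the `${GRAFANA_ADMIN_PASSWORD:-changeme}` default in the compose file applies — fine for local testing, not for anything reachable by others.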

3. Configure Prometheus scrape targets

Define scrape jobs for all exporters. Use Docker DNS names (service names resolve automatically within the compose network). [src1]

# prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 10s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

rule_files:
  - "rules.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node-exporter"
    static_configs:
      - targets: ["node-exporter:9100"]

  - job_name: "cadvisor"
    scrape_interval: 10s
    static_configs:
      - targets: ["cadvisor:8080"]

Verify: After starting, visit http://localhost:9090/targets → all targets should show UP.
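Targets outside the compose network (e.g. a second machine running Node Exporter) are added the same way, with labels to tell the series apart. A sketch — 192.0.2.10 is a placeholder address:

```yaml
# prometheus/prometheus.yml (excerpt) -- scrape an external host
  - job_name: "node-exporter-remote"
    static_configs:
      - targets: ["192.0.2.10:9100"]
        labels:
          env: "production"  # attached to every series scraped from this target
```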

4. Create Prometheus alert rules

Define alert conditions for common failure scenarios. [src5]

# prometheus/rules.yml
groups:
  - name: node_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: warning

      - alert: DiskSpaceLow
        expr: (1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) * 100 > 90
        for: 10m
        labels:
          severity: critical

  - name: container_alerts
    rules:
      - alert: ContainerHighCPU
        expr: rate(container_cpu_usage_seconds_total{name=~".+"}[5m]) * 100 > 80
        for: 5m
        labels:
          severity: warning

Verify: http://localhost:9090/rules → rules should appear as loaded.
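Rules can also be unit-tested offline with `promtool test rules`, feeding synthetic series and asserting which alerts fire. A sketch for the HighMemoryUsage alert above — the file name tests.yml is an assumption:

```yaml
# prometheus/tests.yml -- run with: promtool test rules tests.yml
rule_files:
  - rules.yml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # 90% of memory used for the whole window -> alert should fire after 5m
      - series: 'node_memory_MemTotal_bytes{instance="host1"}'
        values: "100x15"
      - series: 'node_memory_MemAvailable_bytes{instance="host1"}'
        values: "10x15"
    alert_rule_test:
      - eval_time: 10m
        alertname: HighMemoryUsage
        exp_alerts:
          - exp_labels:
              severity: warning
              instance: host1
```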

5. Configure Alertmanager

Set up routing and receivers for notifications. [src5]

# alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ["alertname", "severity"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: "default"

receivers:
  - name: "default"
    webhook_configs:
      - url: "http://example.com/webhook"
        send_resolved: true

Verify: http://localhost:9093/#/status → config should be loaded.
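The webhook receiver above is a placeholder; for real notifications, swap in a concrete integration. A sketch of a Slack receiver — the api_url is the placeholder you would replace with your Slack incoming-webhook URL:

```yaml
# alertmanager/alertmanager.yml (excerpt) -- Slack instead of generic webhook
receivers:
  - name: "slack-ops"
    slack_configs:
      - api_url: "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder
        channel: "#alerts"
        send_resolved: true
        title: "{{ .CommonLabels.alertname }} ({{ .Status }})"
```

Point `route.receiver` at "slack-ops" (or add a child route) for the change to take effect.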

6. Provision Grafana datasource and dashboard

Use Grafana provisioning to auto-configure Prometheus as a datasource. [src3]

# grafana/provisioning/datasources/datasource.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false

# grafana/provisioning/dashboards/dashboard.yml
apiVersion: 1

providers:
  - name: "default"
    orgId: 1
    folder: ""
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    options:
      path: /etc/grafana/provisioning/dashboards
      foldersFromFilesStructure: false

Verify: Log into Grafana at http://localhost:3000 → Configuration > Data Sources → Prometheus should appear.

7. Start the stack and verify

Launch all services and confirm everything is healthy. [src5]

# Start all services
docker compose up -d

# Check all containers are running
docker compose ps

# View logs for any errors
docker compose logs --tail=50

Verify: docker compose ps → all 5 services should show running.

Code Examples

Grafana Dashboard JSON: Node Exporter Host Overview

{
  "dashboard": {
    "title": "Node Exporter - Host Overview",
    "uid": "node-exporter-host",
    "panels": [
      {
        "title": "CPU Usage %",
        "type": "timeseries",
        "targets": [{
          "expr": "100 - (avg(rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
          "legendFormat": "CPU %"
        }],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
      },
      {
        "title": "Memory Usage %",
        "type": "timeseries",
        "targets": [{
          "expr": "(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100",
          "legendFormat": "Memory %"
        }],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
      }
    ],
    "time": {"from": "now-1h", "to": "now"},
    "refresh": "30s"
  }
}

Prometheus Recording Rules: Pre-compute expensive queries

# prometheus/recording-rules.yml
groups:
  - name: node_recording_rules
    interval: 15s
    rules:
      - record: instance:node_cpu_utilization:ratio
        expr: 1 - avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

      - record: instance:node_memory_utilization:ratio
        expr: 1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)

      - record: instance:node_disk_utilization:ratio
        expr: 1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})
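Alert expressions can then reference the recorded series instead of re-evaluating the raw query on every cycle; a sketch rewriting the HighCPUUsage alert on top of the first recording rule above:

```yaml
# prometheus/rules.yml (excerpt) -- same condition as HighCPUUsage, cheaper to evaluate
      - alert: HighCPUUsage
        expr: instance:node_cpu_utilization:ratio * 100 > 80
        for: 5m
        labels:
          severity: warning
```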

Anti-Patterns

Wrong: Using :latest tags for all images

# BAD -- unpinned versions cause silent breaking changes
services:
  prometheus:
    image: prom/prometheus:latest
  grafana:
    image: grafana/grafana:latest
  node-exporter:
    image: prom/node-exporter:latest

Correct: Pin specific versions

# GOOD -- predictable, reproducible deployments
services:
  prometheus:
    image: prom/prometheus:v3.10.0
  grafana:
    image: grafana/grafana:12.4.0
  node-exporter:
    image: prom/node-exporter:v1.9.0

Wrong: No persistent volumes

# BAD -- all metrics data lost on container restart
services:
  prometheus:
    image: prom/prometheus:v3.10.0
    # No volumes defined

Correct: Named volumes for all stateful services

# GOOD -- data survives container restarts and upgrades
services:
  prometheus:
    image: prom/prometheus:v3.10.0
    volumes:
      - prometheus_data:/prometheus
volumes:
  prometheus_data:

Wrong: Exposing Prometheus publicly without auth

# BAD -- anyone can query/delete metrics
services:
  prometheus:
    ports:
      - "0.0.0.0:9090:9090"

Correct: Bind to localhost or use reverse proxy with auth

# GOOD -- only accessible locally
services:
  prometheus:
    ports:
      - "127.0.0.1:9090:9090"

Wrong: Node Exporter without host PID and rootfs mount

# BAD -- incomplete host metrics
services:
  node-exporter:
    image: prom/node-exporter:v1.9.0
    # Missing pid: host and /:/host volume

Correct: Proper Node Exporter configuration

# GOOD -- accurate host metrics with rootfs remapping
services:
  node-exporter:
    image: prom/node-exporter:v1.9.0
    command:
      - "--path.rootfs=/host"
    volumes:
      - "/:/host:ro,rslave"
    pid: host

Wrong: Hardcoded Grafana admin password

# BAD -- password committed to version control
services:
  grafana:
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=mysecretpassword

Correct: Use environment variable or .env file

# GOOD -- password loaded from .env file (excluded from git)
services:
  grafana:
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD}

Common Pitfalls

Diagnostic Commands

# Check all service health
docker compose ps

# View Prometheus targets status
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {scrapeUrl, health, lastError}'

# Test Prometheus config validity
docker compose exec prometheus promtool check config /etc/prometheus/prometheus.yml

# Validate alert rules
docker compose exec prometheus promtool check rules /etc/prometheus/rules.yml

# Check Grafana datasource connectivity
curl -s -u admin:changeme http://localhost:3000/api/datasources | jq '.[].name'

# Test Alertmanager config
docker compose exec alertmanager amtool check-config /etc/alertmanager/alertmanager.yml

# Query a specific metric directly
curl -s 'http://localhost:9090/api/v1/query?query=up' | jq '.data.result'

# Reload Prometheus config without restart (requires --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload

# Check container resource usage
docker stats --no-stream

Version History & Compatibility

| Component | Version | Status | Breaking Changes | Notes |
|---|---|---|---|---|
| Prometheus | 3.10.0 | Current | 3.0 removed deprecated flags, UTF-8 metric names | LTS: 3.5.1 |
| Prometheus | 2.54.x | LTS until 2025-07 | -- | Last 2.x LTS |
| Grafana | 12.4.0 | Current | 12.0 changed auth defaults | Unified alerting is default |
| Node Exporter | 1.9.0 | Current | None | -- |
| cAdvisor | 0.49.1 | Current | 0.47+ requires Linux 5.4+ | Google-maintained |
| Alertmanager | 0.28.1 | Current | 0.27 removed v1 API | v2 API only |
| Docker Compose | v2.x | Current | v1 syntax deprecated | Built into Docker CLI |

When to Use / When Not to Use

| Use When | Don't Use When | Use Instead |
|---|---|---|
| Self-hosted monitoring on VMs or bare metal with Docker | Running on Kubernetes | kube-prometheus-stack Helm chart |
| Need full control over retention, scraping, alerting | Want managed monitoring with zero ops | Grafana Cloud, Datadog, or AWS CloudWatch |
| Dev/staging environment monitoring | Monitoring 1000+ nodes | Thanos or Cortex for horizontal scaling |
| Docker Compose is already your deployment tool | Need log aggregation (not metrics) | Loki + Grafana or ELK stack |
| Budget-conscious -- all components free and open-source | Need APM/tracing | OpenTelemetry + Jaeger or commercial APM |

Important Caveats
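One caveat worth planning for: `--storage.tsdb.retention.time=30d` bounds how long samples are kept, not how much disk they occupy. A back-of-envelope estimate is needed bytes ≈ retention seconds × ingested samples/second × ~2 bytes/sample; the sample rate below is an illustrative assumption, not a measured value:

```shell
# Back-of-envelope TSDB disk estimate
retention_days=30
samples_per_second=1000   # assumption: e.g. ~60 targets * ~250 series each / 15s interval
bytes_per_sample=2        # Prometheus compresses samples to roughly 1-2 bytes on disk

needed_bytes=$((retention_days * 86400 * samples_per_second * bytes_per_sample))
echo "estimated: $((needed_bytes / 1024 / 1024 / 1024)) GiB"   # prints: estimated: 4 GiB
```

For a live instance, compare against the actual ingest rate reported by the `prometheus_tsdb_head_samples_appended_total` counter.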

Related Units