CI/CD pipeline design

- Bottom line: A well-designed CI/CD pipeline connects source control triggers to isolated runners, executes build/test/security stages with fast feedback, produces signed immutable artifacts, and promotes them through environments with explicit gates -- targeting < 10 min for the fast-feedback loop and < 1 day total lead time.

How to Design a CI/CD Pipeline Architecture

How do I design a CI/CD pipeline architecture?

TL;DR

Bottom line: A well-designed CI/CD pipeline connects source control triggers to isolated runners, executes build/test/security stages with fast feedback, produces signed immutable artifacts, and promotes them through environments with explicit gates -- targeting < 10 min for the fast-feedback loop and < 1 day total lead time.
Key tool/command: .github/workflows/ci.yml (GitHub Actions) or .gitlab-ci.yml (GitLab CI) -- declarative YAML defining stages, jobs, and deployment gates.
Watch out for: Building artifacts multiple times (once per environment) instead of building once and promoting -- this is the #1 source of "works in staging, fails in prod" bugs.
Works with: GitHub Actions, GitLab CI, Jenkins 2.x+, CircleCI, Azure DevOps, AWS CodePipeline, Tekton. Concepts are platform-agnostic.

Constraints

Never store secrets in pipeline YAML or version control -- use platform secret stores with short-lived credentials
Build artifacts exactly once and promote the same binary/image through all environments
Fast-feedback stages (lint, unit tests, SAST) must complete in under 10 minutes
Never deploy to production without at least one automated or manual gate
Pin all CI tool versions, runner images, and action references to SHA or exact version tags
Quarantine flaky tests immediately -- >2% flake rate erodes CI trust and leads to ignored failures
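
The 2% threshold in the last constraint can be checked mechanically. The following is a minimal sketch, not a feature of any CI platform: the function name flake_rate_exceeds is hypothetical, and it assumes you can already count flaky runs (failed, then passed on retry with no code change) and total runs from your CI history.

```shell
# Hypothetical helper: exits 0 when the flake rate is above a threshold.
# Usage: flake_rate_exceeds <flaky_runs> <total_runs> [threshold_percent]
flake_rate_exceeds() {
  local flaky=$1 total=$2 threshold=${3:-2}
  # With no recorded runs there is nothing to quarantine.
  [ "$total" -gt 0 ] || return 1
  # Integer form of flaky/total > threshold/100, avoiding floating point.
  [ $((flaky * 100)) -gt $((threshold * total)) ]
}
```

For example, flake_rate_exceeds 3 100 succeeds because 3% is above 2%, while 2 flaky runs out of 100 does not trip the check.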

Quick Reference

Component	Role	Technology Options	Scaling Strategy
Source Control Trigger	Detects code changes, initiates pipeline	GitHub webhooks, GitLab push events, Jenkins SCM polling	Event-driven (no polling); branch filters to limit triggers
Pipeline Orchestrator	Defines stages, manages job dependencies, gates	GitHub Actions, GitLab CI, Jenkins Pipeline, CircleCI, Tekton	Declarative YAML; parallel job execution; DAG-based scheduling
Build Runner / Executor	Executes pipeline jobs in isolated environments	GitHub-hosted runners, GitLab shared runners, Jenkins agents, self-hosted runners	Auto-scaling runner pools; ephemeral containers; spot/preemptible instances
Artifact Registry	Stores immutable build outputs (images, packages)	Docker Hub, GitHub Packages, GitLab Container Registry, AWS ECR, Artifactory	Content-addressable storage; geo-replicated registries; retention policies
Test Framework	Validates correctness at unit, integration, E2E levels	Jest, pytest, JUnit, Cypress, Playwright	Parallel test sharding; intelligent test selection; flaky test quarantine
Security Scanner (SAST/DAST)	Shift-left vulnerability detection	Snyk, Semgrep, Trivy, CodeQL, SonarQube	Run SAST in parallel with unit tests; DAST on staging only
Secret Manager	Provides credentials to pipeline without exposure	GitHub Secrets, GitLab CI Variables, HashiCorp Vault, AWS Secrets Manager	OIDC federation for cloud providers; short-lived tokens; no static keys
Deployment Controller	Manages rollout strategy to target environments	Kubernetes (kubectl/Helm), ArgoCD, AWS CodeDeploy, Terraform	Canary/blue-green via progressive delivery; automated rollback on metric degradation
Artifact Signer / SBOM	Supply chain integrity and provenance	Sigstore/Cosign, SLSA provenance, Syft (SBOM), in-toto attestations	Keyless signing via OIDC; SLSA Level 3 with isolated builders
Notification / Observability	Pipeline status feedback and metrics	Slack/Teams webhooks, Datadog CI Visibility, Grafana, DORA dashboards	Track 4 DORA metrics; alert on failure rate spikes; pipeline duration trends
Environment Manager	Manages staging, preview, production targets	Kubernetes namespaces, Terraform workspaces, Vercel/Netlify preview deploys	Ephemeral preview environments per PR; auto-cleanup on merge
Cache Layer	Speeds up repeated builds by reusing dependencies	GitHub Actions cache, GitLab CI cache, S3-backed caches, Turborepo	Cache by lockfile hash; layer caching for Docker; distributed remote caches

Decision Tree

Platform Selection

START: Choose your CI/CD platform
|
+-- Already using GitHub for source control?
|   +-- YES --> GitHub Actions (native integration, largest marketplace)
|   +-- NO  |
|
+-- Need built-in container registry + security scanning + GitOps?
|   +-- YES --> GitLab CI (all-in-one DevOps platform)
|   +-- NO  |
|
+-- Require maximum customization + plugin ecosystem?
|   +-- YES --> Is your team willing to manage infrastructure?
|   |   +-- YES --> Jenkins (self-hosted, 1800+ plugins)
|   |   +-- NO  --> CircleCI or GitHub Actions (managed)
|   +-- NO  |
|
+-- Enterprise with Azure/Microsoft ecosystem?
|   +-- YES --> Azure DevOps Pipelines
|   +-- NO  |
|
+-- AWS-native infrastructure with CodeCommit/CodeBuild?
|   +-- YES --> AWS CodePipeline + CodeBuild + CodeDeploy
|   +-- NO  |
|
+-- Kubernetes-native, want pipelines-as-code in K8s?
|   +-- YES --> Tekton Pipelines
|   +-- NO  --> GitHub Actions (safest default for most teams)

Scaling Decision

+-- Team size < 5, deploys weekly?
|   +-- Single workflow file, manual deploy gate
|
+-- Team 5-20, deploys daily?
|   +-- Multi-stage pipeline, automated staging deploy, manual prod approval
|
+-- Team 20+, multiple deploys/day?
|   +-- Monorepo: path-based triggers + parallel jobs
|   +-- Microservices: per-service pipelines + shared reusable workflows
|
+-- Enterprise 100+, continuous deployment?
|   +-- GitOps (ArgoCD/Flux) + progressive delivery (Argo Rollouts/Flagger)
|   +-- DORA metrics dashboard + deployment frequency tracking

Step-by-Step Guide

1. Define pipeline stages and fast-feedback loop

Structure your pipeline into discrete stages that run in order, with fast checks first. The goal is to catch 80% of issues in the first 5 minutes. [src1] [src2]

# Canonical stage ordering (platform-agnostic concept)
stages:
  - lint          # < 1 min: code style, formatting
  - security      # < 2 min: SAST, secret scanning (parallel with lint)
  - build         # < 3 min: compile, bundle, create artifact
  - unit-test     # < 5 min: fast unit tests (parallel shards)
  - integration   # < 10 min: API tests, DB tests
  - staging       # deploy to staging, run smoke tests
  - approval      # manual gate or automated canary check
  - production    # deploy to production
  - post-deploy   # smoke tests, DAST, notification

Verify: Pipeline stages are sequential; jobs within a stage can run in parallel. Lint + security should complete before build starts.

2. Configure source control triggers

Set up event-based triggers that only run relevant pipeline stages. Avoid running full pipelines on every push to every branch. [src1]

# GitHub Actions trigger configuration
on:
  pull_request:
    branches: [main, develop]
    paths-ignore:
      - '**.md'
      - 'docs/**'
  push:
    branches: [main]
  release:
    types: [published]

Verify: A pull request that changes only Markdown files or files under docs/ should NOT trigger the pipeline. A push to main should trigger the full pipeline.
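
To reason about what the paths-ignore filter above does, the same rule can be approximated locally. This is a sketch under assumptions: should_run_pipeline is a hypothetical helper, not part of GitHub Actions, and shell case patterns only approximate GitHub's glob matching for these two patterns.

```shell
# Hypothetical helper mirroring the paths-ignore rules above.
# Reads changed file paths on stdin, one per line, for example from:
#   git diff --name-only origin/main...HEAD
# Exits 0 if at least one changed file is NOT documentation, otherwise 1.
should_run_pipeline() {
  local path
  while IFS= read -r path || [ -n "$path" ]; do
    case "$path" in
      "") ;;              # skip blank lines
      *.md|docs/*) ;;     # ignored: Markdown anywhere, anything under docs/
      *) return 0 ;;      # a non-documentation file changed: run the pipeline
    esac
  done
  return 1
}
```

The pipeline is skipped only when every changed path matches an ignored pattern, which is the behaviour the Verify step describes.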

3. Implement build-once, promote-everywhere

Build your artifact exactly once, tag it with the commit SHA, and promote that exact artifact through environments. [src7]

# Build and tag with commit SHA
docker build -t myapp:${GITHUB_SHA} .
docker tag myapp:${GITHUB_SHA} registry.example.com/myapp:${GITHUB_SHA}
docker push registry.example.com/myapp:${GITHUB_SHA}

# In staging: deploy the exact SHA
kubectl set image deployment/myapp app=registry.example.com/myapp:${GITHUB_SHA}

# In production: promote the SAME image (no rebuild)
kubectl set image deployment/myapp app=registry.example.com/myapp:${GITHUB_SHA}

Verify: The image digest is identical in staging and production, for example by comparing the output of kubectl get pods -o jsonpath='{..imageID}' in each environment.
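
One way to keep mutable tags out of the promotion path is to build the image reference in a single place and reject anything that is not a full commit SHA. This is an illustrative sketch: image_ref is a hypothetical helper, and the registry and image names are the placeholder values used above.

```shell
# Hypothetical helper: build the immutable image reference for a commit.
# Usage: image_ref <registry> <name> <commit_sha>
image_ref() {
  local registry=$1 name=$2 sha=$3
  # Accept only a full 40-character hex commit SHA, so tags such as
  # "latest" or "staging" can never be promoted by accident.
  if ! printf '%s' "$sha" | grep -Eq '^[0-9a-f]{40}$'; then
    echo "image_ref: '$sha' is not a full commit SHA" >&2
    return 1
  fi
  printf '%s/%s:%s\n' "$registry" "$name" "$sha"
}
```

Both the staging and production deploy steps could then call the same helper, for instance kubectl set image deployment/myapp app="$(image_ref registry.example.com myapp "$GITHUB_SHA")".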

4. Add automated quality gates

Each environment promotion requires passing a quality gate. Combine automated checks with optional manual approval for production. [src4]

# GitHub Actions: require status checks before merge
# Settings > Branches > Branch protection rules > Require status checks:
#   - lint
#   - test
#   - security-scan
# Settings > Environments > production > Required reviewers: 1

Verify: A PR with failing tests cannot be merged. Production deployment requires explicit approval.
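
The gate logic itself is simple: promotion is allowed only when every required check has succeeded. A minimal sketch of that rule, assuming check results are available as name=conclusion pairs (gate_passes and that input format are hypothetical, not a platform API):

```shell
# Hypothetical helper: evaluate a promotion gate from check results.
# Each argument has the form <check-name>=<conclusion>, for example:
#   gate_passes lint=success test=success security-scan=failure
# Exits 0 only if at least one check was reported and all succeeded.
gate_passes() {
  local result failed=0
  if [ "$#" -eq 0 ]; then
    echo "gate: no checks reported" >&2
    return 1
  fi
  for result in "$@"; do
    case "$result" in
      *=success) ;;
      *) echo "gate: blocked by $result" >&2; failed=1 ;;
    esac
  done
  [ "$failed" -eq 0 ]
}
```

Treating an empty result set as a failure is a deliberate choice here: a gate that passes when no checks ran would silently stop protecting production.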

5. Integrate security scanning (shift-left)

Add SAST, dependency scanning, and secret detection as parallel stages that run alongside unit tests. [src6]

# Run security checks in parallel with tests
security:
  parallel:
    - sast: semgrep --config=auto --error .
    - deps: trivy fs --severity HIGH,CRITICAL --exit-code 1 .
    - secrets: gitleaks detect --source .
    - sbom: syft . -o spdx-json > sbom.json

Verify: trivy fs --severity HIGH,CRITICAL --exit-code 1 . returns exit code 0 (no HIGH/CRITICAL vulnerabilities; without --exit-code 1, Trivy exits 0 even when it finds them). gitleaks detect returns exit code 0 (no leaked secrets).
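
Where the CI platform does not provide parallel jobs, the same effect can be approximated inside one job with background processes. This is a sketch only: run_parallel is a hypothetical helper, and it assumes each scanner signals findings through a non-zero exit code.

```shell
# Hypothetical helper: run each argument as a shell command in the
# background, wait for all of them, and fail if any one of them failed.
# Usage: run_parallel "<command 1>" "<command 2>" ...
run_parallel() {
  local pids="" cmd pid status=0
  for cmd in "$@"; do
    sh -c "$cmd" &
    pids="$pids $!"
  done
  # Wait for every command so slower scanners still report their result.
  for pid in $pids; do
    if ! wait "$pid"; then
      status=1
    fi
  done
  return "$status"
}
```

A call could look like run_parallel "gitleaks detect --source ." "trivy fs --severity HIGH,CRITICAL --exit-code 1 .". Output from the commands is interleaved, so per-tool log files may be preferable in practice.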

6. Set up DORA metrics tracking

Track the four DORA metrics to measure pipeline health. Elite teams target: deployment frequency multiple times/day, lead time < 1 day, change failure rate < 15%, recovery time < 1 hour. [src4]

# Key metrics to instrument:
# 1. Deployment Frequency: count deployments per day/week
# 2. Lead Time for Changes: time from commit to production deploy
# 3. Change Failure Rate: % of deployments causing incidents
# 4. Mean Time to Recovery: time from incident to resolution

Verify: Dashboard shows deployment frequency trend. Lead time from commit to production is tracked.
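
Two of these metrics reduce to simple arithmetic once the raw events are collected. The sketch below assumes timestamps are already available as Unix epoch seconds and that failed and total deployment counts come from your own incident tracking; both function names are hypothetical.

```shell
# Lead time for one change in whole hours, from two Unix timestamps (seconds).
# Usage: lead_time_hours <commit_epoch> <deploy_epoch>
lead_time_hours() {
  local commit_epoch=$1 deploy_epoch=$2
  echo $(( (deploy_epoch - commit_epoch) / 3600 ))
}

# Change failure rate as a whole-number percentage (rounded down).
# Usage: change_failure_rate <failed_deployments> <total_deployments>
change_failure_rate() {
  local failed=$1 total=$2
  if [ "$total" -eq 0 ]; then
    echo 0   # no deployments yet, so no failures to report
    return 0
  fi
  echo $(( failed * 100 / total ))
}
```

With 3 failed deployments out of 40, change_failure_rate reports 7, which is within the 15% target quoted above.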

Code Examples

GitHub Actions: Complete CI/CD Pipeline

# .github/workflows/ci-cd.yml
# Input:  Push to main or PR to main
# Output: Tested, scanned, built artifact deployed to staging/production

name: CI/CD Pipeline
on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  lint:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '22' }
      - run: npm ci
      - run: npm run lint

  test:
    runs-on: ubuntu-24.04
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '22' }
      - run: npm ci
      - run: npm test -- --shard=${{ matrix.shard }}/4

  security:
    runs-on: ubuntu-24.04
    permissions:
      contents: read
      actions: read
      security-events: write  # lets CodeQL upload its results
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v3
        with: { languages: javascript }
      - uses: github/codeql-action/analyze@v3

  build:
    needs: [lint, test, security]
    runs-on: ubuntu-24.04
    permissions:
      contents: read
      packages: write  # lets GITHUB_TOKEN push to ghcr.io
    outputs:
      image-tag: ${{ steps.meta.outputs.tags }}
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: type=sha,prefix=
      - uses: docker/build-push-action@v6
        with:
          push: true
          tags: ${{ steps.meta.outputs.tags }}

  deploy-staging:
    if: github.ref == 'refs/heads/main'
    needs: [build]
    runs-on: ubuntu-24.04
    environment: staging
    steps:
      - run: |
          kubectl set image deployment/myapp \
            app=${{ needs.build.outputs.image-tag }}

  deploy-production:
    needs: [build, deploy-staging]  # build must be listed for needs.build.outputs to resolve
    runs-on: ubuntu-24.04
    environment:
      name: production
      url: https://myapp.example.com
    steps:
      - run: |
          kubectl set image deployment/myapp \
            app=${{ needs.build.outputs.image-tag }}

GitLab CI: Complete CI/CD Pipeline

# .gitlab-ci.yml
# Input:  Merge request or push to main
# Output: Tested, scanned, built artifact deployed through environments

stages:
  - validate
  - build
  - test
  - staging
  - production

variables:
  IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA

lint:
  stage: validate
  image: node:22-alpine
  script:
    - npm ci --cache .npm
    - npm run lint
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths: [.npm]

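# include is a top-level keyword, not a job key; the SAST template
# adds its own jobs, which run in the test stage by default
include:
  - template: Security/SAST.gitlab-ci.yml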

build:
  stage: build
  image: docker:27
  services:
    - docker:27-dind
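  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker build -t $IMAGE_TAG .
    - docker push $IMAGE_TAG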
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

unit-test:
  stage: test
  image: node:22-alpine
  parallel: 4
  script:
    - npm ci
    - npm test -- --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL
  artifacts:
    reports:
      junit: junit.xml

deploy-staging:
  stage: staging
  environment:
    name: staging
    url: https://staging.myapp.example.com
  script:
    - kubectl set image deployment/myapp app=$IMAGE_TAG
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

deploy-production:
  stage: production
  environment:
    name: production
    url: https://myapp.example.com
  script:
    - kubectl set image deployment/myapp app=$IMAGE_TAG
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
  needs:
    - deploy-staging

Anti-Patterns

Wrong: Rebuilding artifacts per environment

# BAD -- rebuilds for each environment; staging and production binaries may differ
deploy-staging:
  script:
    - docker build -t myapp:staging .  # build #1
    - docker push myapp:staging

deploy-production:
  script:
    - docker build -t myapp:production .  # build #2 -- NOT the same binary!
    - docker push myapp:production

Correct: Build once, promote the artifact

# GOOD -- single build, same image promoted through environments
build:
  script:
    - docker build -t myapp:${CI_COMMIT_SHA} .  # build once
    - docker push myapp:${CI_COMMIT_SHA}

deploy-staging:
  script:
    - kubectl set image deployment/myapp app=myapp:${CI_COMMIT_SHA}  # same image

deploy-production:
  script:
    - kubectl set image deployment/myapp app=myapp:${CI_COMMIT_SHA}  # same image

Wrong: Storing secrets in pipeline YAML

# BAD -- secrets in plaintext, committed to version control
env:
  DATABASE_URL: "postgres://admin:<password>@<db-host>:5432/prod"
  AWS_SECRET_ACCESS_KEY: "AKIA..."

Correct: Using platform secret stores with OIDC

# GOOD -- secrets injected at runtime, never in source control
jobs:
  deploy:
    permissions:
      id-token: write  # enables OIDC
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789:role/deploy
          aws-region: us-east-1
      # No static credentials -- uses short-lived OIDC token

Wrong: Monolithic pipeline that runs everything sequentially

# BAD -- 45-minute sequential pipeline; lint failure blocks everything
steps:
  - run: npm run lint
  - run: npm test
  - run: npm run e2e
  - run: docker build .
  - run: trivy image myapp
  - run: kubectl apply -f k8s/
  # Total: 45 minutes, sequential, no parallelism

Correct: Parallel stages with fast-feedback first

# GOOD -- parallel execution, fast feedback in < 5 minutes
jobs:
  lint:           # 1 min, runs immediately
    ...
  security-scan:  # 2 min, runs in parallel with lint
    ...
  unit-test:      # 4 min, runs in parallel with lint
    ...
  build:          # 3 min, waits for lint + test + security
    needs: [lint, unit-test, security-scan]
  e2e-test:       # 10 min, waits for build
    needs: [build]
  deploy:         # 2 min, waits for e2e
    needs: [e2e-test]
  # Total: ~20 min with parallelism (vs 45 min sequential)
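
The "~20 min" figure follows from the critical path: the slowest of the parallel jobs plus every job that must wait in sequence. A small sketch of that arithmetic, using the per-job minutes from the comments above (critical_path is a hypothetical helper):

```shell
# Hypothetical helper: wall-clock minutes for a pipeline whose first group
# of jobs runs in parallel and whose remaining jobs run one after another.
# Usage: critical_path "<parallel durations>" "<sequential durations>"
critical_path() {
  local longest=0 total=0 d
  # Parallel group: only the slowest job counts.
  for d in $1; do
    if [ "$d" -gt "$longest" ]; then
      longest=$d
    fi
  done
  # Sequential jobs: every duration adds up.
  for d in $2; do
    total=$((total + d))
  done
  echo $((longest + total))
}
```

With lint, security-scan, and unit-test at 1, 2, and 4 minutes in parallel, followed by build, e2e-test, and deploy at 3, 10, and 2 minutes, critical_path "1 2 4" "3 10 2" prints 19.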

Wrong: Using floating version tags for CI tools

# BAD -- @v3 could change without warning, breaking your pipeline
- uses: actions/checkout@v3      # could be v3.0.0 today, v3.9.9 tomorrow
- uses: docker/build-push-action@latest  # completely unpinned

Correct: Pinning to exact SHA or version

# GOOD -- deterministic, reproducible builds
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11  # v4.1.1
- uses: docker/build-push-action@vX.Y.Z  # exact version tag (substitute a real release)

Common Pitfalls

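No caching strategy: Every build downloads dependencies from scratch, adding 2-5 minutes per run. Fix: cache the package manager's download cache (~/.npm, ~/.m2/repository, the pip cache), keyed by lockfile hash. Caching node_modules itself does not help when the job runs npm ci, which deletes that directory before installing. [src1]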
Flaky tests ignored: Tests intermittently fail but nobody fixes them, eroding CI trust until developers routinely re-run pipelines. Fix: quarantine flaky tests immediately; track flake rate as a team metric. [src7]
No branch protection: Developers push directly to main, bypassing CI entirely. Fix: require status checks (lint, test, security) to pass before merge. [src1]
Secrets in logs: Pipeline logs expose environment variables or command output containing credentials. Fix: mark secrets as masked in CI settings; never echo $SECRET; audit log output. [src3]
No rollback plan: Production deployment fails with no automated way to revert. Fix: keep the previous artifact tag; automate rollback on health check failure (kubectl rollout undo). [src4]
Over-triggering: Every push to every branch triggers the full pipeline, wasting runner minutes. Fix: use path filters, branch filters, and [skip ci] conventions. [src2]
Shared mutable state in tests: Integration tests share a database or filesystem, causing ordering-dependent failures. Fix: use per-test database schemas or containers; clean state before each test. [src7]
No pipeline observability: Teams have no visibility into pipeline duration trends, failure rates, or bottlenecks. Fix: track the 4 DORA metrics; set up CI dashboards (Datadog CI Visibility, GitHub Actions insights). [src4]
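
For the caching pitfall above, "keyed by lockfile hash" means the cache key changes exactly when the lockfile content changes. Most platforms offer this natively; the sketch below only illustrates the idea for a custom cache such as an S3-backed one, and cache_key is a hypothetical helper.

```shell
# Hypothetical helper: derive a cache key from a lockfile's content hash.
# Usage: cache_key <prefix> <lockfile>
cache_key() {
  local prefix=$1 lockfile=$2 hash
  if [ ! -f "$lockfile" ]; then
    echo "cache_key: $lockfile not found" >&2
    return 1
  fi
  if command -v sha256sum >/dev/null 2>&1; then
    hash=$(sha256sum "$lockfile" | cut -d' ' -f1)
  else
    hash=$(shasum -a 256 "$lockfile" | cut -d' ' -f1)  # fallback where sha256sum is absent
  fi
  printf '%s-%s\n' "$prefix" "$hash"
}
```

A call such as cache_key npm package-lock.json yields the same key on every run until a dependency changes, at which point the old cache is simply no longer matched.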

Diagnostic Commands

# List workflows and the five most recent runs
gh workflow list
gh run list --limit 5

# Show logs for only the failed steps of a run
gh run view <run-id> --log-failed

# Verify Docker image exists in registry
docker manifest inspect registry.example.com/myapp:${SHA}

# Check Kubernetes deployment rollout status
kubectl rollout status deployment/myapp --timeout=300s

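# Measure DORA: deployment frequency (last 30 days; GNU date syntax)
# jq -s merges the per-page arrays that --paginate emits before counting
SINCE=$(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%SZ)
gh api repos/{owner}/{repo}/deployments --paginate | jq -s --arg since "$SINCE" '[.[][] | select(.created_at > $since)] | length'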

# Check for secrets accidentally committed
gitleaks detect --source . --verbose

Version History & Compatibility

Platform	Current Version	Key Feature	Notes
GitHub Actions	v2 (2024+)	Reusable workflows, OIDC, larger runners	Largest marketplace; 2000+ free minutes/month (public repos unlimited)
GitLab CI	17.x (2025)	CI Components catalog, SLSA provenance	Built-in container registry, SAST, DAST; all-in-one platform
Jenkins	2.479+ (2025)	Declarative Pipeline, Pipeline as Code	Requires self-hosting; 1800+ plugins; highest customization
CircleCI	Cloud (2025)	Orbs, intelligent test splitting, Docker layer caching	Managed; strong Docker support; credit-based pricing
Azure DevOps	2025	YAML pipelines, template expressions	Deep Azure/Microsoft integration; hybrid self-hosted agents
Tekton	0.60+ (2025)	Kubernetes-native, CRD-based pipelines	Cloud-native; steep learning curve; ideal for K8s-first teams

When to Use / When Not to Use

Use When	Don't Use When	Use Instead
Building any software project with >1 developer	Solo developer with manual deploys to a single server	Simple shell script or manual deploy
You need reproducible, auditable builds	Prototyping or hackathon with no production target	Direct git push to hosting (Vercel, Netlify auto-deploy)
Compliance requires build provenance (SOC2, SLSA)	The project has no tests and no build step	Add tests first, then add CI/CD
Team targets DORA elite metrics (daily deploys, <1h recovery)	Deploying static files with no build process	Static site hosts with git-triggered deploys
Microservices with independent release cycles	Tightly coupled monolith deploying everything together	Single pipeline with all-or-nothing deploy
Multiple environments (dev, staging, production)	Single environment with no promotion path	Direct deployment script

Important Caveats

Pipeline YAML syntax and features vary significantly between platforms -- a GitHub Actions workflow is not portable to GitLab CI without rewriting. The architecture concepts (stages, gates, artifact promotion) are portable; the implementation is not.
Self-hosted runners (Jenkins agents, GitHub self-hosted runners) require patching, monitoring, and security hardening -- they become attack vectors if compromised, as they have access to secrets and deployment credentials.
DORA metrics are team-level indicators, not individual developer metrics. Using deployment frequency to evaluate individual performance leads to gaming rather than genuine improvement.
"CI/CD" is often used loosely to mean just CI (automated testing). True CD (continuous deployment to production) requires significant investment in automated testing, monitoring, and rollback capabilities. Most teams practice CI + continuous delivery (manual production gate), not continuous deployment.
Cost can escalate quickly with managed CI/CD: GitHub Actions charges per-minute for private repos, CircleCI uses credits, and GitLab CI shared runners have quotas. Self-hosted runners trade money for operational overhead.