Engineering Productivity Benchmarks (DORA + Delivery Metrics)

Type: Benchmark Data · Vintage: Q4 2025 · Confidence: 0.85 · Sources: 6 · Verified: 2026-03-10

Summary

Comprehensive engineering productivity benchmarks covering the five DORA metrics (deployment frequency, lead time for changes, change failure rate, mean time to recovery, rework rate) plus cycle time, PR metrics, and throughput data. Sourced from the 2025 DORA Report (~5,000 respondents) and LinearB's analysis of 8.1M+ pull requests across 4,800 teams. The most significant finding: AI coding assistants boost individual output but organizational delivery metrics remain flat. [src1]

Data vintage: Based on 2025 DORA survey data and LinearB's 2024-2025 PR analysis from 4,800+ engineering teams across 42 countries.

Key shift: DORA expanded from 4 to 5 metrics in 2025 by adding rework rate. The framework reorganized into throughput metrics (deployment frequency, lead time, recovery time) and instability metrics (change failure rate, rework rate). The traditional elite/high/medium/low classification was replaced with archetype-based clusters. [src1][src4]

Constraints

Metrics

Velocity

Deployment Frequency

Definition: Number of production deployments per unit of time per team/service. Measures how often code reaches production. Counted at the service or application level, not per developer.

Percentiles rank team performance: the 25th Pct column is the slower quartile, the 75th Pct the faster quartile. This convention applies to all benchmark tables in this unit.

| Segment | Median | 25th Pct | 75th Pct | Top Decile |
| --- | --- | --- | --- | --- |
| Small team (2-10) | 2-3x/week | 1x/week | 1x/day | Multiple/day |
| Mid-size (11-50) | 1-2x/week | 2x/month | 3-5x/week | 1x/day |
| Large (51-200) | 1x/week | 2x/month | 2-3x/week | Daily |
| Enterprise (200+) | 2-4x/month | 1x/month | 1x/week | 2-3x/week |

Trend: Only 16.2% of organizations achieve on-demand deployment. 23.9% deploy less than once per month. Distribution is bimodal. [src1][src3]

Red flag threshold: Deploying less than once per month indicates batch-oriented delivery with high risk per deployment.
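As a concrete illustration, deployment frequency can be derived directly from raw deploy timestamps. The sketch below is hypothetical (the function name and input shape are assumptions, not part of DORA or LinearB tooling):

```python
from datetime import datetime, timedelta

def deploys_per_week(deploy_times: list[datetime]) -> float:
    """Average production deployments per week over the observed window.

    Counted per service/team, matching the definition above; a hypothetical
    helper, not an official DORA calculation.
    """
    if len(deploy_times) < 2:
        return float(len(deploy_times))
    span_days = (max(deploy_times) - min(deploy_times)).days or 1
    return len(deploy_times) / (span_days / 7)
```

A team averaging below ~0.25 deploys/week (less than once per month) falls under the red-flag threshold above.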

Lead Time for Changes

Definition: Time from code commit to code successfully running in production. Includes code review, CI/CD pipeline execution, and any manual approval gates.

| Segment | Median | 25th Pct | 75th Pct | Top Decile |
| --- | --- | --- | --- | --- |
| Small team (2-10) | 1-2 days | 2-5 days | 2-6 hours | < 1 hour |
| Mid-size (11-50) | 2-5 days | 1-2 weeks | 1-2 days | < 1 day |
| Large (51-200) | 3-7 days | 1-4 weeks | 2-3 days | 1-2 days |
| Enterprise (200+) | 1-2 weeks | 1-6 months | 3-7 days | 1-3 days |

Trend: Only 9.4% of teams achieve lead times under one hour. 31.9% fall in the one-day-to-one-week range. [src1][src3]

Red flag threshold: Lead time exceeding 1 month signals severe process bottlenecks or manual gates.
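To place a team within these bands, the quartile cut points can be computed from per-change lead times. A minimal sketch using the standard library (function and field names are illustrative):

```python
from statistics import quantiles

def lead_time_summary(lead_hours: list[float]) -> dict[str, float]:
    """Quartile cut points for commit-to-production lead time, in hours.

    Note: these are distribution quartiles of the raw times; in the benchmark
    tables, "25th Pct" denotes the slower-performing quartile of teams.
    """
    p25, p50, p75 = quantiles(lead_hours, n=4)  # exclusive method by default
    return {"p25": p25, "median": p50, "p75": p75}
```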

Stability

Change Failure Rate (CFR)

Definition: Percentage of deployments that cause a failure in production requiring remediation (rollback, hotfix, patch, or emergency fix).

| Segment | Median | 25th Pct | 75th Pct | Top Decile |
| --- | --- | --- | --- | --- |
| Small team (2-10) | 10% | 15-20% | 5% | < 2% |
| Mid-size (11-50) | 12% | 20-25% | 5-8% | < 3% |
| Large (51-200) | 15% | 25-30% | 8-10% | < 5% |
| Enterprise (200+) | 18% | 30%+ | 10-15% | < 5% |

Trend: Only 8.5% of teams achieve ideal CFR of 0-2%. AI-assisted code changes show higher initial failure rates. [src1][src5]

Red flag threshold: CFR above 25% indicates systemic quality issues.
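CFR is a simple ratio of remediated deployments to total deployments; a minimal sketch with the red-flag check applied (helper names are assumptions):

```python
def change_failure_rate(total_deploys: int, failed_deploys: int) -> float:
    """Share of deployments requiring remediation (rollback, hotfix, patch)."""
    return failed_deploys / total_deploys if total_deploys else 0.0

def cfr_red_flag(total_deploys: int, failed_deploys: int) -> bool:
    """True when CFR exceeds the 25% systemic-quality threshold above."""
    return change_failure_rate(total_deploys, failed_deploys) > 0.25
```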

Mean Time to Recovery (MTTR)

Definition: Time from detection of a production failure to full service restoration. Also called "failed deployment recovery time" in the 2025 DORA framework.

| Segment | Median | 25th Pct | 75th Pct | Top Decile |
| --- | --- | --- | --- | --- |
| Small team (2-10) | 1-4 hours | 4-12 hours | 30-60 min | < 15 min |
| Mid-size (11-50) | 2-8 hours | 8-24 hours | 1-2 hours | < 30 min |
| Large (51-200) | 4-12 hours | 12-48 hours | 2-4 hours | < 1 hour |
| Enterprise (200+) | 12-24 hours | 24-72 hours | 4-12 hours | < 2 hours |

Trend: Top-decile teams recover in under an hour in most segments (under two hours at enterprise scale). Teams with automated rollback recover 5-10x faster. [src1][src3]

Red flag threshold: MTTR exceeding 24 hours for non-enterprise teams indicates inadequate incident response.
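MTTR is the mean of per-incident detection-to-restoration durations; the sketch below assumes a hypothetical `(detected, restored)` record shape:

```python
from datetime import datetime, timedelta

def mttr_hours(incidents: list[tuple[datetime, datetime]]) -> float:
    """Mean time from failure detection to full restoration, in hours.

    Each incident is a (detected, restored) timestamp pair; this record
    shape is illustrative, not a standard schema.
    """
    total = sum(((restored - detected) for detected, restored in incidents),
                start=timedelta())
    return total.total_seconds() / 3600 / len(incidents)
```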

Rework Rate (5th DORA Metric — New in 2025)

Definition: Percentage of deployments that are unplanned fixes or patches to correct user-facing defects from prior deployments.

| Segment | Median | 25th Pct | 75th Pct | Top Decile |
| --- | --- | --- | --- | --- |
| All segments | 8-12% | 15-20% | 4-6% | < 3% |

Trend: Increased AI adoption correlates with increased rework rate — AI-generated code ships faster but requires more post-deployment corrections. [src1][src4]

Red flag threshold: Rework rate above 15% means more time fixing than shipping planned work.

Efficiency

Cycle Time (PR Open to Merged)

Definition: Total elapsed time from pull request creation to merge into main branch. Includes pickup time, review time, revision cycles, and final approval.

| Segment | Median | 25th Pct | 75th Pct | Top Decile |
| --- | --- | --- | --- | --- |
| Small team (2-10) | 3-4 days | 5-7 days | 1-2 days | < 26 hours |
| Mid-size (11-50) | 5-7 days | 7-14 days | 2-4 days | < 2 days |
| Large (51-200) | 7-10 days | 10-21 days | 4-6 days | < 3 days |
| Enterprise (200+) | 10-14 days | 14-30 days | 5-8 days | < 5 days |

Trend: Average cycle time is ~7 days, with PRs sitting in review for 4 of those 7 days. Code review is the single largest bottleneck. [src2]

Red flag threshold: Cycle time exceeding 14 days for non-enterprise teams signals review process breakdown.
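Since review dominates cycle time, a useful diagnostic is the share of the cycle spent in review (the Cycle Time Ratio rule of thumb). A sketch, with timestamp names assumed for illustration:

```python
from datetime import datetime

def review_share(opened: datetime, review_started: datetime,
                 merged: datetime) -> float:
    """Fraction of total PR cycle time spent between review pickup and merge.

    Values above 0.5 flag code review as the bottleneck; the three-timestamp
    model is a simplification of real PR event data.
    """
    total = (merged - opened).total_seconds()
    in_review = (merged - review_started).total_seconds()
    return in_review / total if total else 0.0
```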

Throughput (PRs Merged per Developer per Week)

Definition: Number of pull requests merged per developer per week. Measures individual developer output normalized across team sizes.

| Segment | Median | 25th Pct | 75th Pct | Top Decile |
| --- | --- | --- | --- | --- |
| All segments | 2-3 PRs/week | 1-2 PRs/week | 4-5 PRs/week | 6+ PRs/week |

Trend: Teams using AI coding assistants show 15-25% improvement in PR throughput, but with higher rework rates. [src1][src2]

Red flag threshold: Sustained throughput below 1 PR/developer/week indicates blockers or oversized PRs.

Quality

PR Size

Definition: Number of code changes (additions + modifications + deletions) per pull request. The single most impactful metric for engineering velocity.

| Segment | Median | 25th Pct | 75th Pct | Top Decile |
| --- | --- | --- | --- | --- |
| All segments | 200-300 lines | 400-661 lines | 100-194 lines | < 100 lines |

Trend: Top-decile teams keep PRs under 194 changed lines; teams at that size achieve merge frequencies 5x higher than teams with larger PRs. [src2]

Red flag threshold: PRs above 500 lines correlate with 3-5x longer cycle times and higher CFR.
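The cycle-time penalty of large PRs can be checked on a team's own data by splitting PRs at a size cutoff and comparing median cycle times. The record shape and the 200-line cutoff (from the PR Size Rule) are illustrative:

```python
from statistics import median

def cycle_time_by_pr_size(prs: list[dict], cutoff: int = 200) -> dict[str, float]:
    """Median cycle time (hours) for PRs below vs. at-or-above a size cutoff.

    Each PR is a hypothetical {'changed_lines': int, 'cycle_hours': float}
    record, not a schema from any cited tool.
    """
    small = [p["cycle_hours"] for p in prs if p["changed_lines"] < cutoff]
    large = [p["cycle_hours"] for p in prs if p["changed_lines"] >= cutoff]
    return {
        "small_median_h": median(small) if small else float("nan"),
        "large_median_h": median(large) if large else float("nan"),
    }
```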

Merge Time

Definition: Time from final code review approval to merge into main branch.

| Segment | Median | 25th Pct | 75th Pct | Top Decile |
| --- | --- | --- | --- | --- |
| All segments | 4-8 hours | 12-24 hours | 1-2 hours | < 2 hours |

Trend: Elite teams maintain merge times under 2 hours. Automated merge queues are the primary improvement driver. [src2]

Red flag threshold: Merge time exceeding 24 hours after approval indicates CI/CD bottlenecks.

Composite Metrics & Rules of Thumb

| Rule | Formula / Threshold | Interpretation |
| --- | --- | --- |
| DORA Throughput Score | High deployment frequency + low lead time | Both must be strong; high frequency with long lead time means small batches are queuing in a slow pipeline |
| DORA Stability Score | Low CFR + low MTTR + low rework rate | All three must be healthy; low CFR with high MTTR means failures are rare but catastrophic |
| Cycle Time Ratio | Review time / total cycle time < 50% | If review exceeds 50% of cycle time, the review process is the bottleneck |
| PR Size Rule | Median PR < 200 lines | Highest-leverage metric; drives cycle time, CFR, and review quality simultaneously |
| Deploy:Rework Ratio | Planned deploys / rework deploys > 8:1 | At most one unplanned fix per eight planned deployments (~11% of total deployments) |
| AI Productivity Paradox | Individual output up + team metrics flat | AI boosts individual velocity but does not automatically improve organizational throughput |
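Two of these rules reduce to simple arithmetic checks. The sketch below uses thresholds read off the 75th-percentile bands in the tables above; these cutoffs are assumptions for illustration, not an official DORA scoring:

```python
def deploy_rework_ratio(planned_deploys: int, rework_deploys: int) -> float:
    """Planned:rework ratio; the rule of thumb asks for > 8:1."""
    return planned_deploys / rework_deploys if rework_deploys else float("inf")

def stability_healthy(cfr: float, mttr_h: float, rework: float) -> bool:
    """DORA Stability Score sketch: all three instability signals must be low.

    Assumed cutoffs (CFR <= 10%, MTTR <= 4h, rework <= 6%) approximate the
    75th-percentile bands above and are not an official formula.
    """
    return cfr <= 0.10 and mttr_h <= 4.0 and rework <= 0.06
```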

Segment Definitions

| Segment | Definition | Typical Characteristics |
| --- | --- | --- |
| Small team (2-10 engineers) | Startup or small product team, single-service | Direct communication, minimal process overhead, trunk-based development |
| Mid-size (11-50 engineers) | Growth-stage company or business unit | Multiple squads, code ownership emerging, PR reviews required |
| Large (51-200 engineers) | Scale-up or division within enterprise | Platform teams, shared services, architecture governance |
| Enterprise (200+ engineers) | Large organization or multi-BU company | Complex CI/CD, compliance gates, change advisory boards |

Common Misinterpretations

When This Matters

Fetch when a user asks about engineering team performance benchmarks, wants to evaluate their DORA metrics against industry peers, is setting engineering KPIs or OKRs, needs to diagnose delivery bottlenecks, or is evaluating the impact of AI coding tools on team productivity.

Related Units