Comprehensive engineering productivity benchmarks covering the five DORA metrics (deployment frequency, lead time for changes, change failure rate, mean time to recovery, rework rate) plus cycle time, PR metrics, and throughput data. Sourced from the 2025 DORA Report (~5,000 respondents) and LinearB's analysis of 8.1M+ pull requests across 4,800 teams. The most significant finding: AI coding assistants boost individual output but organizational delivery metrics remain flat. [src1]
Data vintage: Based on 2025 DORA survey data and LinearB's 2024-2025 PR analysis from 4,800+ engineering teams across 42 countries.
Key shift: DORA expanded from 4 to 5 metrics in 2025 by adding rework rate. The framework reorganized into throughput metrics (deployment frequency, lead time, recovery time) and instability metrics (change failure rate, rework rate). The traditional elite/high/medium/low classification was replaced with archetype-based clusters. [src1][src4]
Definition: Number of production deployments per unit of time per team/service. Measures how often code reaches production. Counted at the service or application level, not per developer.
| Segment | Median | Bottom Quartile | Top Quartile | Top Decile |
|---|---|---|---|---|
| Small team (2-10) | 2-3x/week | 1x/week | 1x/day | Multiple/day |
| Mid-size (11-50) | 1-2x/week | 2x/month | 3-5x/week | 1x/day |
| Large (51-200) | 1x/week | 2x/month | 2-3x/week | Daily |
| Enterprise (200+) | 2-4x/month | 1x/month | 1x/week | 2-3x/week |
Trend: Only 16.2% of organizations achieve on-demand deployment. 23.9% deploy less than once per month. Distribution is bimodal. [src1][src3]
Red flag threshold: Deploying less than once per month indicates batch-oriented delivery with high risk per deployment.
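As a minimal sketch (using a made-up deployment log), deployment frequency can be computed from a service's production deploy timestamps — all dates below are hypothetical:

```python
from datetime import date

# Hypothetical deploy log: dates this service reached production.
deploys = [date(2025, 3, d) for d in (3, 4, 4, 6, 10, 11, 13, 17, 18, 20, 24, 27)]

# Normalize to deployments per week over the observed window.
window_days = (max(deploys) - min(deploys)).days + 1
per_week = len(deploys) / (window_days / 7)
print(round(per_week, 1))
```

Counting at the service level, as the definition requires, means the log holds deploy events, not per-developer commits.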
Definition: Time from code commit to code successfully running in production. Includes code review, CI/CD pipeline execution, and any manual approval gates.
| Segment | Median | Bottom Quartile | Top Quartile | Top Decile |
|---|---|---|---|---|
| Small team (2-10) | 1-2 days | 2-5 days | 2-6 hours | < 1 hour |
| Mid-size (11-50) | 2-5 days | 1-2 weeks | 1-2 days | < 1 day |
| Large (51-200) | 3-7 days | 1-4 weeks | 2-3 days | 1-2 days |
| Enterprise (200+) | 1-2 weeks | 1-6 months | 3-7 days | 1-3 days |
Trend: Only 9.4% of teams achieve lead times under one hour. 31.9% fall in the one-day-to-one-week range. [src1][src3]
Red flag threshold: Lead time exceeding 1 month signals severe process bottlenecks or manual gates.
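A minimal sketch of the lead-time calculation, assuming you can pair each change's commit timestamp with the timestamp of the deploy that shipped it (all timestamps below are hypothetical):

```python
from datetime import datetime
from statistics import median

# Hypothetical (commit_time, deployed_time) pairs for merged changes.
changes = [
    (datetime(2025, 3, 3, 9, 0),  datetime(2025, 3, 4, 15, 0)),
    (datetime(2025, 3, 5, 10, 0), datetime(2025, 3, 10, 10, 0)),
    (datetime(2025, 3, 6, 8, 0),  datetime(2025, 3, 7, 8, 0)),
]

# Hours from commit to production; the median resists outlier changes.
lead_hours = [(d - c).total_seconds() / 3600 for c, d in changes]
print(median(lead_hours))
```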
Definition: Percentage of deployments that cause a failure in production requiring remediation (rollback, hotfix, patch, or emergency fix).
| Segment | Median | Bottom Quartile | Top Quartile | Top Decile |
|---|---|---|---|---|
| Small team (2-10) | 10% | 15-20% | 5% | < 2% |
| Mid-size (11-50) | 12% | 20-25% | 5-8% | < 3% |
| Large (51-200) | 15% | 25-30% | 8-10% | < 5% |
| Enterprise (200+) | 18% | 30%+ | 10-15% | < 5% |
Trend: Only 8.5% of teams achieve ideal CFR of 0-2%. AI-assisted code changes show higher initial failure rates. [src1][src5]
Red flag threshold: CFR above 25% indicates systemic quality issues.
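Change failure rate is a simple proportion; a sketch with invented deploy records, where each record marks whether the deploy required remediation per the definition above:

```python
# Hypothetical deploy records: True = required remediation
# (rollback, hotfix, patch, or emergency fix).
deploys = [False] * 17 + [True] * 3

cfr = 100 * sum(deploys) / len(deploys)
print(f"{cfr:.0f}%")
```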
Definition: Time from detection of a production failure to full service restoration. Also called "failed deployment recovery time" in the 2025 DORA framework.
| Segment | Median | Bottom Quartile | Top Quartile | Top Decile |
|---|---|---|---|---|
| Small team (2-10) | 1-4 hours | 4-12 hours | 30-60 min | < 15 min |
| Mid-size (11-50) | 2-8 hours | 8-24 hours | 1-2 hours | < 30 min |
| Large (51-200) | 4-12 hours | 12-48 hours | 2-4 hours | < 1 hour |
| Enterprise (200+) | 12-24 hours | 24-72 hours | 4-12 hours | < 2 hours |
Trend: Elite teams achieve MTTR under 1 hour across all segments. Teams with automated rollback recover 5-10x faster. [src1][src3]
Red flag threshold: MTTR exceeding 24 hours for non-enterprise teams indicates inadequate incident response.
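A sketch of the MTTR calculation over hypothetical incidents, each recorded as minute offsets from detection to full restoration:

```python
from statistics import mean

# Hypothetical incidents: (detected_minute, restored_minute) offsets.
incidents = [(0, 45), (0, 180), (0, 75)]

# Mean minutes from detection to restoration.
mttr_min = mean(restored - detected for detected, restored in incidents)
print(mttr_min)
```

Note the clock starts at detection, not at the failing deploy, per the definition above.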
Definition: Percentage of deployments that are unplanned fixes or patches to correct user-facing defects from prior deployments.
| Segment | Median | Bottom Quartile | Top Quartile | Top Decile |
|---|---|---|---|---|
| All segments | 8-12% | 15-20% | 4-6% | < 3% |
Trend: Increased AI adoption correlates with increased rework rate — AI-generated code ships faster but requires more post-deployment corrections. [src1][src4]
Red flag threshold: Rework rate above 15% means more time fixing than shipping planned work.
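Rework rate follows the same proportion pattern; a sketch assuming each deploy is labeled planned or rework (labels below are invented):

```python
# Hypothetical month of deploys: "rework" = unplanned fix for a
# user-facing defect shipped in a prior deploy.
deploys = ["planned"] * 22 + ["rework"] * 2

rework_rate = 100 * deploys.count("rework") / len(deploys)
print(f"{rework_rate:.1f}%")
```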
Definition: Total elapsed time from pull request creation to merge into main branch. Includes pickup time, review time, revision cycles, and final approval.
| Segment | Median | Bottom Quartile | Top Quartile | Top Decile |
|---|---|---|---|---|
| Small team (2-10) | 3-4 days | 5-7 days | 1-2 days | < 26 hours |
| Mid-size (11-50) | 5-7 days | 7-14 days | 2-4 days | < 2 days |
| Large (51-200) | 7-10 days | 10-21 days | 4-6 days | < 3 days |
| Enterprise (200+) | 10-14 days | 14-30 days | 5-8 days | < 5 days |
Trend: Average cycle time is ~7 days, with PRs sitting in review for 4 of those 7 days. Code review is the single largest bottleneck. [src2]
Red flag threshold: Cycle time exceeding 14 days for non-enterprise teams signals review process breakdown.
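A sketch of cycle time and its review component for a single hypothetical PR timeline (creation, first review, merge), matching the ~7-day average with ~4 days in review cited above:

```python
from datetime import datetime

# Hypothetical PR timeline.
created  = datetime(2025, 3, 3, 9, 0)
reviewed = datetime(2025, 3, 5, 11, 0)   # first review activity
merged   = datetime(2025, 3, 10, 9, 0)

cycle_days  = (merged - created).total_seconds() / 86400
review_days = (merged - reviewed).total_seconds() / 86400
print(round(cycle_days, 1), round(review_days, 1))
```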
Definition: Number of pull requests merged per developer per week. Measures individual developer output normalized across team sizes.
| Segment | Median | Bottom Quartile | Top Quartile | Top Decile |
|---|---|---|---|---|
| All segments | 2-3 PRs/week | 1-2 PRs/week | 4-5 PRs/week | 6+ PRs/week |
Trend: Teams using AI coding assistants show 15-25% improvement in PR throughput, but with higher rework rates. [src1][src2]
Red flag threshold: Sustained throughput below 1 PR/developer/week indicates blockers or oversized PRs.
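Since throughput is normalized per developer per week, the calculation is a division over one week's merged PRs; a sketch with invented authors:

```python
# Hypothetical week of merged PRs, one entry per merge, keyed by author.
merged_prs = ["ana", "ana", "ana", "ben", "ben", "chi", "chi", "chi", "chi"]
developers = 3

per_dev_per_week = len(merged_prs) / developers
print(per_dev_per_week)
```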
Definition: Number of code changes (additions + modifications + deletions) per pull request. The single most impactful metric for engineering velocity.
| Segment | Median | Bottom Quartile | Top Quartile | Top Decile |
|---|---|---|---|---|
| All segments | 200-300 lines | 400-661 lines | 100-194 lines | < 100 lines |
Trend: Elite teams keep PRs under 194 changed lines and merge up to 5x more frequently than teams with larger PRs. [src2]
Red flag threshold: PRs above 500 lines correlate with 3-5x longer cycle times and higher CFR.
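A sketch checking a team's PR sizes against the 194-line elite threshold, using invented diff stats (lines added + modified + deleted per PR):

```python
from statistics import median

# Hypothetical per-PR diff sizes: additions + modifications + deletions.
pr_sizes = [45, 120, 210, 380, 95, 160, 520, 88]

print(median(pr_sizes))                      # median PR size
print(sum(s <= 194 for s in pr_sizes))       # PRs at or under the elite threshold
```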
Definition: Time from final code review approval to merge into main branch.
| Segment | Median | Bottom Quartile | Top Quartile | Top Decile |
|---|---|---|---|---|
| All segments | 4-8 hours | 12-24 hours | 1-2 hours | < 2 hours |
Trend: Elite teams maintain merge times under 2 hours. Automated merge queues are the primary improvement driver. [src2]
Red flag threshold: Merge time exceeding 24 hours after approval indicates CI/CD bottlenecks.
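Merge time is the narrowest window of the PR metrics; a sketch over hypothetical approval and merge timestamps for one PR:

```python
from datetime import datetime

# Hypothetical timestamps: final review approval and merge to main.
approved = datetime(2025, 3, 10, 14, 0)
merged   = datetime(2025, 3, 10, 20, 30)

merge_hours = (merged - approved).total_seconds() / 3600
print(merge_hours)
```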
| Rule | Formula / Threshold | Interpretation |
|---|---|---|
| DORA Throughput Score | High deployment frequency + Low lead time | Both must be strong — deploying often while each change still takes weeks to reach production signals queued work or a slow pipeline, not healthy small batches |
| DORA Stability Score | Low CFR + Low MTTR + Low rework rate | All three must be healthy — low CFR with high MTTR means failures are rare but catastrophic |
| Cycle Time Ratio | Review time / Total cycle time < 50% | If review exceeds 50% of cycle time, review process is the bottleneck |
| PR Size Rule | Median PR < 200 lines | Highest-leverage metric — drives cycle time, CFR, and review quality simultaneously |
| Deploy:Rework Ratio | Planned deploys / Rework deploys > 8:1 | Fewer than roughly one in nine deployments (~11%) should be unplanned fixes |
| AI Productivity Paradox | Individual output up + Team metrics flat | AI boosts individual velocity but does not automatically improve organizational throughput |
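The thresholds above compose into a simple health check; a sketch over a hypothetical team snapshot (all field names and values are assumptions for illustration):

```python
# Hypothetical team snapshot, checked against three rules from the table:
# cycle-time ratio, PR size, and deploy:rework ratio.
team = {
    "review_time_days": 4.0,
    "cycle_time_days": 7.0,
    "median_pr_lines": 230,
    "planned_deploys": 40,
    "rework_deploys": 6,
}

flags = []
if team["review_time_days"] / team["cycle_time_days"] >= 0.5:
    flags.append("review bottleneck")       # review > 50% of cycle time
if team["median_pr_lines"] >= 200:
    flags.append("oversized PRs")           # median PR >= 200 lines
if team["planned_deploys"] / team["rework_deploys"] <= 8:
    flags.append("too much rework")         # ratio at or below 8:1
print(flags)
```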
| Segment | Definition | Typical Characteristics |
|---|---|---|
| Small team (2-10 engineers) | Startup or small product team, single-service | Direct communication, minimal process overhead, trunk-based development |
| Mid-size (11-50 engineers) | Growth-stage company or business unit | Multiple squads, code ownership emerging, PR reviews required |
| Large (51-200 engineers) | Scale-up or division within enterprise | Platform teams, shared services, architecture governance |
| Enterprise (200+ engineers) | Large organization or multi-BU company | Complex CI/CD, compliance gates, change advisory boards |
| Metric | 2023 | 2024 | 2025 | Direction |
|---|---|---|---|---|
| Deployment frequency (% daily+) | 30% | 32% | 38% | Up 8pp over 2 years |
| Lead time (% under 1 day) | 35% | 38% | 41% | Up 6pp, steady |
| Change failure rate (median) | 12% | 14% | 15% | Up 3pp — AI contributing |
| MTTR (% under 1 hour) | 20% | 22% | 25% | Up 5pp — automation gains |
| Cycle time (average) | 8 days | 7.5 days | 7 days | Down 12.5% over 2 years |
| PR size (elite threshold) | 250 lines | 220 lines | 194 lines | Down 22% — smaller PRs |
Fetch when a user asks about engineering team performance benchmarks, wants to evaluate their DORA metrics against industry peers, is setting engineering KPIs or OKRs, needs to diagnose delivery bottlenecks, or is evaluating the impact of AI coding tools on team productivity.