TL;DR:

  • DORA’s four metrics (deployment frequency, lead time for changes, change failure rate, and time to restore service) remain the most widely validated measures of software delivery performance and predictors of organisational outcomes
  • Measuring them is straightforward; using them well requires understanding what they proxy and where they break down
  • The State of DevOps reports have since added a fifth metric (reliability, in 2021), and high performers treat DORA as a diagnostic tool, not a target

The DORA (DevOps Research and Assessment) metrics have been part of engineering culture since the 2018 Accelerate book, but adoption quality varies enormously. Many teams collect the numbers without acting on them; others optimise the metrics directly and make performance worse. Here’s how to do it right.

The Four Core Metrics

1. Deployment Frequency

What it measures: How often code is deployed to production
Elite benchmark: On-demand (multiple times per day)
High benchmark: Between once per day and once per week

Deployment frequency is a proxy for batch size and risk tolerance. Teams that deploy frequently have invested in automated testing, feature flags, and rollback capability — the deployment itself is low-risk because each change is small. Teams that deploy weekly or monthly typically have large, risky releases because accumulated changes make each deployment expensive to reverse.

“Deployment” should mean “code running in production serving users,” not “a PR merged” or “a build passing CI.”
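
As a minimal sketch of that definition, the snippet below counts only events from the production environment. The event shape (`env`, `finished_at`) and the sample data are assumptions for illustration; your CD tool will expose its own field names.

```python
from collections import Counter
from datetime import date, datetime

# Hypothetical deployment events; in practice these come from your CD tool.
deploys = [
    {"env": "production", "finished_at": datetime(2024, 5, 6, 10, 15)},
    {"env": "staging", "finished_at": datetime(2024, 5, 6, 11, 0)},
    {"env": "production", "finished_at": datetime(2024, 5, 6, 16, 40)},
    {"env": "production", "finished_at": datetime(2024, 5, 8, 9, 5)},
]


def deploys_per_day(events: list[dict], env: str = "production") -> dict[date, int]:
    """Count deployments per calendar day, keeping only the given environment."""
    return dict(Counter(e["finished_at"].date() for e in events if e["env"] == env))


def mean_deploys_per_day(events: list[dict], start: date, end: date) -> float:
    """Average production deployments per day over an inclusive date range."""
    days = (end - start).days + 1
    counts = deploys_per_day(events)
    in_range = sum(n for d, n in counts.items() if start <= d <= end)
    return in_range / days


print(deploys_per_day(deploys))
print(mean_deploys_per_day(deploys, date(2024, 5, 6), date(2024, 5, 10)))
```

The staging event is ignored, so merges and CI builds that never reach production cannot inflate the number.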

2. Lead Time for Changes

What it measures: Time from code commit to that code running in production
Elite benchmark: Less than one hour
High benchmark: Between one day and one week

Lead time measures the friction in your delivery pipeline. Long lead times usually trace to: manual approval gates, slow CI pipelines, large PR review queues, or environment provisioning delays. Each is a specific, fixable problem.

Lead time is often confused with cycle time (time from work started to delivery). Lead time starts at commit; cycle time starts when a developer picks up a ticket. Both matter but measure different things.
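
The difference is easiest to see with both computed from the same records. This is a sketch under assumptions: the three timestamps per change (`started`, `committed`, `deployed`) are hypothetical fields that you would need to join from your ticketing, Git, and deployment data.

```python
from datetime import datetime
from statistics import median

# Hypothetical change records; field names are illustrative only.
changes = [
    {"started": datetime(2024, 5, 1, 9), "committed": datetime(2024, 5, 2, 15), "deployed": datetime(2024, 5, 2, 17)},
    {"started": datetime(2024, 5, 3, 9), "committed": datetime(2024, 5, 3, 11), "deployed": datetime(2024, 5, 6, 11)},
    {"started": datetime(2024, 5, 7, 10), "committed": datetime(2024, 5, 7, 12), "deployed": datetime(2024, 5, 7, 13)},
]


def hours_between(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def median_lead_time_hours(records: list[dict]) -> float:
    """Lead time for changes: commit to running in production."""
    return median(hours_between(r["committed"], r["deployed"]) for r in records)


def median_cycle_time_hours(records: list[dict]) -> float:
    """Cycle time: work started to running in production."""
    return median(hours_between(r["started"], r["deployed"]) for r in records)


print(f"Median lead time:  {median_lead_time_hours(changes):.1f} h")
print(f"Median cycle time: {median_cycle_time_hours(changes):.1f} h")
```

The median is used here rather than the mean because one slow change (the second record waits over a weekend) would otherwise dominate a small sample.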

3. Change Failure Rate

What it measures: Percentage of deployments that cause a production incident requiring remediation
Elite benchmark: 0–5%
High benchmark: 10–15%

Change failure rate measures deployment quality. A low change failure rate doesn’t mean nothing ever breaks; it means your testing and review process catches most issues before production. Teams with high change failure rates often have low automated test coverage, staging environments that don’t resemble production, or thin peer review.
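
The arithmetic is simple once each deployment carries a failure flag. In this sketch the `caused_incident` field is an assumption: in practice someone (or some rule linking incidents to deployments) has to set it, and that classification is where most of the measurement effort goes.

```python
def change_failure_rate(deployments: list[dict]) -> float:
    """Percentage of deployments flagged as causing an incident that needed remediation."""
    if not deployments:
        return 0.0
    failed = sum(1 for d in deployments if d["caused_incident"])
    return 100 * failed / len(deployments)


# Hypothetical month of data: 20 production deployments, 2 of which needed remediation.
month = [{"id": i, "caused_incident": i in (4, 13)} for i in range(20)]

print(f"Change failure rate: {change_failure_rate(month):.1f}%")
```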

4. Time to Restore Service (MTTR)

What it measures: Time from production incident detection to service restoration
Elite benchmark: Less than one hour
High benchmark: Less than one day

MTTR measures your incident response capability, not your prevention capability. High performers restore quickly because they have good observability (they know what broke), practiced runbooks (they know what to do), and deployment pipelines fast enough to push fixes in minutes.
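
A rough sketch of the calculation, assuming each incident record carries hypothetical `detected` and `restored` timestamps exported from your on-call tool:

```python
from datetime import datetime
from statistics import mean, median

# Hypothetical incident records; field names are illustrative only.
incidents = [
    {"detected": datetime(2024, 5, 2, 10, 0), "restored": datetime(2024, 5, 2, 10, 30)},
    {"detected": datetime(2024, 5, 9, 14, 0), "restored": datetime(2024, 5, 9, 14, 45)},
    {"detected": datetime(2024, 5, 20, 9, 0), "restored": datetime(2024, 5, 20, 13, 0)},
]


def restore_minutes(records: list[dict]) -> list[float]:
    """Minutes from detection to restoration for each incident."""
    return [(r["restored"] - r["detected"]).total_seconds() / 60 for r in records]


durations = restore_minutes(incidents)
print(f"Mean time to restore:   {mean(durations):.0f} min")
print(f"Median time to restore: {median(durations):.0f} min")
```

Reporting the median alongside the mean is a design choice, not part of the DORA definition: a single long outage pulls the mean up sharply, and seeing both makes that visible.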

The Fifth Metric: Reliability

The 2021 State of DevOps report introduced reliability as a fifth metric: specifically, whether services meet their SLOs (Service Level Objectives). It broadened the availability measure first added in 2018. Unlike the first four metrics, reliability is an operational outcome measure rather than a delivery measure.

Elite teams track reliability via SLO achievement (e.g., 99.9% of requests complete in under 200ms), and they treat error-budget burn as the primary trigger for engineering investment decisions, ahead of any individual DORA number.
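
To make the 99.9% / 200ms example concrete, here is a minimal sketch that checks SLO achievement and how much of the error budget (the share of requests allowed to miss the objective) has been consumed. The function name, the return shape, and the sample latencies are assumptions for illustration.

```python
def slo_report(latencies_ms: list[float], threshold_ms: float = 200.0, target: float = 0.999) -> dict:
    """Summarise a latency SLO: the fraction of requests under the threshold versus the target."""
    if not latencies_ms:
        raise ValueError("no requests in the window")
    total = len(latencies_ms)
    bad = sum(1 for ms in latencies_ms if ms >= threshold_ms)
    achieved = (total - bad) / total
    allowed_bad = (1 - target) * total  # error budget, expressed in requests
    return {
        "achieved": achieved,
        "met": achieved >= target,
        "budget_used": bad / allowed_bad if allowed_bad > 0 else float("inf"),
    }


# Hypothetical window: 10,000 requests, 5 of them slow.
window = [120.0] * 9995 + [450.0] * 5
report = slo_report(window)
print(f"Achieved {report['achieved']:.4%}, met={report['met']}, "
      f"error budget used: {report['budget_used']:.0%}")
```

In this window the SLO is met with roughly half the error budget spent, which is the kind of signal that can trigger a conversation before the objective is actually missed.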

Collecting the Data

Option 1: LinearB / Swarmia / Jellyfish

These purpose-built engineering analytics platforms connect to your Git provider, CI system, and incident management tool (PagerDuty, OpsGenie) and calculate all four DORA metrics automatically. Setup takes a few hours; the cost is roughly $15–25 per developer per month.

Best for teams that want metrics without building tooling.

Option 2: GitHub / GitLab Native

GitHub’s DORA metrics dashboard (available in Enterprise/Teams) calculates deployment frequency and lead time from Actions workflows, with change failure rate and MTTR requiring additional configuration. GitLab has equivalent built-in reporting at the Ultimate tier (formerly Gold).

Best for teams already standardised on one of these platforms.

Option 3: Build Your Own

For full control, you need three data sources:

  • Git events (commit timestamps, PR open/merge times)
  • Deployment events (CI/CD pipeline completions in production environment)
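  • Incident events (from PagerDuty, OpsGenie, or your on-call tool)

# Example: approximate lead time from GitHub pull request data
from datetime import datetime, timedelta, timezone

import requests


def parse_ts(value: str) -> datetime:
    # GitHub returns ISO 8601 UTC timestamps such as "2024-05-01T12:00:00Z"
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def get_lead_times(repo: str, token: str, days: int = 30) -> float:
    """Return the mean PR-open-to-merge time in hours over the last `days` days."""
    headers = {"Authorization": f"Bearer {token}"}
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)

    # Fetch the most recently updated closed PRs (first page only; paginate for busy repos)
    response = requests.get(
        f"https://api.github.com/repos/{repo}/pulls",
        params={"state": "closed", "sort": "updated", "direction": "desc", "per_page": 100},
        headers=headers,
        timeout=30,
    )
    response.raise_for_status()

    lead_times = []
    for pr in response.json():
        # Closed-but-unmerged PRs have merged_at set to null
        if not pr.get("merged_at"):
            continue

        merged = parse_ts(pr["merged_at"])
        if merged < cutoff:
            continue

        # Approximation: PR creation time stands in for the first commit.
        # Better: take the first commit timestamp from the commits API, and
        # use the production deployment time instead of the merge time.
        opened = parse_ts(pr["created_at"])
        lead_times.append((merged - opened).total_seconds() / 3600)

    return sum(lead_times) / len(lead_times) if lead_times else 0.0


avg_hours = get_lead_times("myorg/myrepo", "ghp_...")
print(f"Average lead time: {avg_hours:.1f} hours")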

Best for teams with specific measurement requirements or wanting to integrate metrics into existing dashboards.

Common Pitfalls

Gaming the metrics

Deployment frequency is trivially gameable: deploy empty commits, or split changes artificially. Change failure rate can be deflated by not classifying incidents as deployment-related. If you tie individual performance reviews to DORA numbers, you’ll get number optimisation, not performance improvement.

Fix: Use DORA at the team and organisation level, not for individual evaluation. Review the underlying data periodically to check for anomalies.
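
One simple anomaly check, sketched under assumptions: if your deployment records include a hypothetical `changed_lines` count (for example, from the diff between consecutive deployed commits), you can list deployments that shipped no code change and review them by hand.

```python
def flag_suspicious_deploys(deploys: list[dict], min_changed_lines: int = 1) -> list[str]:
    """Return IDs of deployments whose diff is smaller than the threshold."""
    return [d["id"] for d in deploys if d["changed_lines"] < min_changed_lines]


# Hypothetical deployment records; field names are illustrative only.
recent = [
    {"id": "d-101", "changed_lines": 42},
    {"id": "d-102", "changed_lines": 0},
    {"id": "d-103", "changed_lines": 7},
    {"id": "d-104", "changed_lines": 0},
]

print(flag_suspicious_deploys(recent))
```

A flagged deployment is a prompt for a question, not proof of gaming: configuration-only redeploys and rollbacks can legitimately show an empty diff.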

Measuring the wrong deployment event

If your CI pipeline deploys to staging automatically but production requires a manual gate, and you count staging deploys as “deployments,” your frequency looks great but measures nothing useful.

Fix: Define “deployment” as “code serving production user traffic” and instrument accordingly.

Ignoring context

A team deploying infrastructure code once per quarter may be doing excellent work — quarterly cycles are appropriate for their domain. A team running a consumer SaaS product should deploy multiple times daily.

Fix: Set benchmarks in context of system type and risk profile, not absolute DORA tiers.

Treating DORA as the goal

Teams that hit elite benchmarks on all four metrics haven’t necessarily solved their core problems — they’ve optimised a set of proxies. The goal is software that reliably delivers user value. DORA metrics help identify bottlenecks; they don’t define success.

How to Act on DORA Data

  1. Find your constraint. If lead time is high, look at your pipeline stages — is it PR review queue depth, CI runtime, or deployment approval gates? Fix the bottleneck, not the metric.

  2. Track trends, not snapshots. A single week’s data is noise. A three-month trend showing lead time increasing by 20% is a signal worth investigating.

  3. Pair with team health data. The DORA research consistently shows that psychological safety and clear communication are as predictive of high performance as technical practices. Metrics without culture change produce marginal gains.

  4. Set improvement goals, not tier targets. “Improve deployment frequency from weekly to daily over Q3” is a useful goal because it names a concrete change. “Reach elite tier by EOY” is a target that is more likely to be hit by gaming than by genuine improvement.
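
Steps 1 and 2 above can be sketched in a few lines. Everything here is an assumption for illustration: the per-stage field names, the sample durations, and the choice to compare the first and last four weeks of a twelve-week series.

```python
from statistics import mean, median

# Hypothetical per-change stage durations in hours.
stage_data = [
    {"review_wait_h": 20, "ci_h": 0.5, "approval_wait_h": 4, "deploy_h": 0.2},
    {"review_wait_h": 30, "ci_h": 0.6, "approval_wait_h": 2, "deploy_h": 0.2},
    {"review_wait_h": 10, "ci_h": 0.4, "approval_wait_h": 6, "deploy_h": 0.3},
]


def slowest_stage(records: list[dict]) -> tuple[str, float]:
    """Return the pipeline stage with the highest median duration."""
    medians = {stage: median(r[stage] for r in records) for stage in records[0]}
    stage = max(medians, key=medians.get)
    return stage, medians[stage]


def trend_change_pct(weekly_values: list[float], window: int = 4) -> float:
    """Percent change between the mean of the first and last `window` weeks."""
    first = mean(weekly_values[:window])
    last = mean(weekly_values[-window:])
    return 100 * (last - first) / first


# Hypothetical weekly median lead times (hours) over twelve weeks.
weekly_lead_time = [20, 22, 19, 21, 22, 23, 24, 22, 24, 25, 24, 25]

print(slowest_stage(stage_data))
print(f"Lead time change over the period: {trend_change_pct(weekly_lead_time):+.1f}%")
```

Averaging over a window at each end smooths out single-week noise, so the comparison reflects the trend rather than one unusual week.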

What the Research Actually Says

The State of DevOps research (now more than a decade of annual survey data) has consistently found large gaps between elite and low performers. In the 2019 report, elite performers had:

  • 208× more frequent deployments than low performers
  • 106× faster lead time from commit to deploy
  • 7× lower change failure rates
  • 2,604× faster time to recover from incidents
  • Higher levels of developer satisfaction and lower burnout rates

The relationship is not a simple one-way cause: good engineering practices enable high DORA performance, and the culture that produces those practices also supports engineer wellbeing. The metrics are an output of good engineering culture, not a substitute for it.