TL;DR:
- Stop measuring lines of code, story points, and ticket close rates — all are trivially gameable without producing better software
- The DORA metrics (deployment frequency, lead time, change failure rate, MTTR) are research-validated and correlate with actual business outcomes
- Measure at the team level, not the individual level — individual metrics destroy trust and get gamed within days
Most developer productivity metrics in widespread use measure the wrong things. Commits per week. Tickets closed. Story points shipped. These are easy to collect, easy to dashboard, and almost entirely useless for understanding whether your engineering organisation is actually getting better.
Why Common Metrics Fail
The reason is simple: they’re all gameable without producing better software. A team that writes more code isn’t necessarily more productive — it might be a team that hasn’t discovered abstractions or existing libraries. A team closing more tickets might just be breaking large work into smaller ones to inflate the count.
This is Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure. It applies to engineering measurement with unusual force because engineers are smart enough to optimise for whatever number they’re being watched on, often within days of learning it’s a target.
Metrics to Stop Using
Lines of code has been discredited since the 1980s. It rewards verbosity over clarity and copy-paste over abstraction. The most productive thing an engineer can do on many days is delete code — and none of that shows up as positive in a line count.
Story points per sprint were designed to estimate capacity and communicate uncertainty — never to measure productivity. Once they become a performance target, estimates creep upward and velocity stabilises at whatever level the team aims for.
Tickets closed per week creates the same dynamic. If closing tickets is the target, the easy response is to create more, smaller tickets. A docs typo fix and a refactor of the authentication system both count as one.
Hours logged measures presence, not contribution. Engineers who backfill time logs at end-of-week are recording estimates of what they think they did, not measurements.
The DORA Metrics: Research-Backed Baseline
The DORA (DevOps Research and Assessment) metrics come from a multi-year research programme running since 2014, drawing on survey responses from tens of thousands of professionals worldwide, including many on UK engineering teams. They’re the closest thing the industry has to validated developer productivity measurement: the research reports that these four metrics statistically predict not just delivery speed but business outcomes like profitability and market share.
Deployment Frequency — frequent deployments mean smaller batches, lower risk per deployment, and faster feedback:
- Elite: multiple deployments per day
- High: once per day to once per week
- Low: less than once per month
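As a rough sketch of how the bands above could be applied, the Python below converts a deployment count over a window into a daily rate and maps it onto a tier. The function names, the 30-day month, and the "between High and Low" label for rates the list does not name are assumptions made for illustration, not part of the DORA definitions.

```python
def deployments_per_day(deploy_count: int, window_days: int) -> float:
    """Average number of production deployments per day over a window."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    return deploy_count / window_days


def frequency_tier(per_day: float) -> str:
    """Map a daily deployment rate onto the bands listed above."""
    if per_day > 1:
        return "Elite"  # multiple deployments per day
    if per_day >= 1 / 7:
        return "High"  # once per day to once per week
    if per_day < 1 / 30:
        return "Low"  # less than once per month (30-day month assumed)
    # Assumption: the list above leaves this range unnamed.
    return "between High and Low"


rate = deployments_per_day(deploy_count=42, window_days=30)
print(f"{rate:.1f} deploys/day -> {frequency_tier(rate)}")
```

Counting deployments over a fixed window, rather than per sprint or per release, keeps the number comparable from month to month.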
Lead Time for Changes — time from a code commit to reaching production. Long lead times mean slow feedback and larger change batches that are harder to debug:
- Elite: less than one hour
- High: one day to one week
- Low: more than one month
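One way to compute this, sketched under the assumption that you can export a commit timestamp and a production deploy timestamp for each change (the record shape and the function name `median_lead_time` are hypothetical):

```python
from datetime import datetime, timedelta
from statistics import median


def median_lead_time(changes: list[tuple[datetime, datetime]]) -> timedelta:
    """Median of (production deploy time - commit time) across changes."""
    if not changes:
        raise ValueError("need at least one change")
    seconds = [
        (deployed - committed).total_seconds()
        for committed, deployed in changes
    ]
    return timedelta(seconds=median(seconds))


# Hypothetical sample: (commit time, production deploy time)
changes = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 10, 0)),   # 1 hour
    (datetime(2024, 3, 4, 11, 0), datetime(2024, 3, 4, 14, 0)),  # 3 hours
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 4, 9, 0)),    # 72 hours
]
print(median_lead_time(changes))
```

The median is used here so that a single change that sat over a weekend does not dominate the figure; whether to report the median or another statistic is a choice for the team.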
Change Failure Rate — the percentage of deployments requiring remediation. Without this metric, deployment frequency is meaningless: a team deploying 20 times a day with a 50% failure rate is making things worse faster:
- Elite: 0–5% of deployments
- High: 5–10%
- Low: 46–60%
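The arithmetic behind the "worse faster" point can be sketched as follows. The function names are illustrative, and "failed" here simply means a deployment that required remediation, however your team records that.

```python
def change_failure_rate(total_deploys: int, failed_deploys: int) -> float:
    """Fraction of deployments that required remediation."""
    if total_deploys <= 0:
        raise ValueError("total_deploys must be positive")
    if not 0 <= failed_deploys <= total_deploys:
        raise ValueError("failed_deploys must be between 0 and total_deploys")
    return failed_deploys / total_deploys


def failed_deploys_per_day(deploys_per_day: float, failure_rate: float) -> float:
    """Expected number of deployments per day that need remediation."""
    return deploys_per_day * failure_rate


# The example from the text: 20 deployments a day at a 50% failure rate.
print(failed_deploys_per_day(20, 0.50))  # 10.0
```

Ten remediations a day is the cost hidden behind an impressive-looking deployment frequency, which is why the two numbers are best read together.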
Time to Restore Service (MTTR) — how long it takes to recover from a production incident:
- Elite: less than one hour
- High: less than one day
- Low: more than one week
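MTTR is usually read as a mean, and a mean can be pulled a long way by one bad outage. The sketch below, using made-up incident durations and a hypothetical `restore_time_summary` helper, reports the mean alongside the median so the two can be compared.

```python
from datetime import timedelta
from statistics import mean, median


def restore_time_summary(restore_minutes: list[float]) -> dict[str, timedelta]:
    """Mean and median time to restore service, from per-incident minutes."""
    if not restore_minutes:
        raise ValueError("need at least one incident")
    return {
        "mean": timedelta(minutes=mean(restore_minutes)),
        "median": timedelta(minutes=median(restore_minutes)),
    }


# Hypothetical incidents: minutes from start of impact to restored service.
summary = restore_time_summary([20, 25, 30, 35, 490])
print("mean:", summary["mean"])
print("median:", summary["median"])
```

With these sample values the mean is two hours while the median is thirty minutes, so it is worth stating which one a dashboard shows.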
Team-Level Leading Indicators
The DORA metrics are delivery-focused. These leading indicators give you early warning before delivery degrades:
PR Cycle Time — target under 24 hours for most PRs. Long cycle times are almost always explained by three things: the PR is too large, the review culture is slow, or CI is blocking reviews. All three are fixable.
CI Pipeline Duration — target under 10 minutes. Over 10 minutes, engineers stop waiting and context-switch. Track the p95, not the average — averages hide the worst cases that are most disruptive.
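To illustrate why the p95 matters, here is a minimal sketch using the nearest-rank percentile method on invented pipeline durations. The helper name and the sample data are assumptions; a CI provider's own percentile calculation may interpolate differently.

```python
import math


def percentile_nearest_rank(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("need at least one value")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical pipeline durations in minutes: most runs are fast, two are not.
durations = [6] * 18 + [30] * 2

average = sum(durations) / len(durations)
p95 = percentile_nearest_rank(durations, 95)
print(f"average: {average:.1f} min, p95: {p95} min")
```

In this sample the average sits under the 10-minute target while the p95 is 30 minutes, which is exactly the kind of slow run the average hides.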
Incident Frequency and MTTR — target fewer than 2 production incidents per week; MTTR under 1 hour.
Unplanned Work Ratio — target under 20% of sprint capacity absorbed by reactive work. High ratios indicate reliability problems or accumulating technical debt.
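The ratio itself is simple division. A small sketch, assuming capacity is tracked in engineer-days (any consistent unit would do) and using a hypothetical helper name:

```python
UNPLANNED_TARGET = 0.20  # the 20% threshold from the text


def unplanned_work_ratio(unplanned: float, total_capacity: float) -> float:
    """Share of sprint capacity absorbed by reactive work."""
    if total_capacity <= 0:
        raise ValueError("total_capacity must be positive")
    if not 0 <= unplanned <= total_capacity:
        raise ValueError("unplanned must be between 0 and total_capacity")
    return unplanned / total_capacity


# Hypothetical sprint: 50 engineer-days of capacity, 12 spent on reactive work.
ratio = unplanned_work_ratio(unplanned=12, total_capacity=50)
over_target = ratio >= UNPLANNED_TARGET
print(f"unplanned work: {ratio:.0%}, over target: {over_target}")
```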
Presenting Metrics to Leadership
Translate each DORA metric to a business outcome non-technical stakeholders can act on:
- Deployment frequency → “We deployed 42 times last month — smaller batches and faster delivery”
- Lead time → “Median time from commit to production is 3 days; targeting 1 day by Q4”
- Change failure rate → “5% of Q1 deployments required a rollback; targeting under 3%”
- MTTR → “Median time to restore service is 47 minutes; goal is under 30 minutes”
This framing works well in UK engineering organisations where leadership is used to OKRs and quarterly reporting — it maps naturally onto that structure.
Measurement Without Surveillance
The fastest way to poison a metrics programme is to attach individual-level data to performance reviews. Once engineers know their personal commit count is tracked, they optimise for the number rather than the outcome.
A few principles that actually work:
- Aggregate at the team level: “the team’s PR cycle time is 18 hours” is a systems observation; “Alex’s PR cycle time is 36 hours” is a performance judgement.
- Make dashboards visible to engineers, not just managers.
- Never tie metrics to performance reviews.
- Start with one metric: pick PR cycle time or deployment frequency, instrument it properly, let the team react for a quarter, then add more.
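Team-level aggregation can be enforced in the reporting code itself. The sketch below assumes PR records arrive as dictionaries with `team`, `author`, and `cycle_hours` keys (a hypothetical shape, as are the sample names) and returns only per-team medians, so individual figures never reach the dashboard.

```python
from collections import defaultdict
from statistics import median


def team_cycle_time(prs: list[dict]) -> dict[str, float]:
    """Median PR cycle time in hours per team; author names are never emitted."""
    by_team: dict[str, list[float]] = defaultdict(list)
    for pr in prs:
        # The author field is deliberately ignored here.
        by_team[pr["team"]].append(pr["cycle_hours"])
    return {team: median(hours) for team, hours in by_team.items()}


# Hypothetical records exported from a code-hosting tool.
prs = [
    {"team": "payments", "author": "alex", "cycle_hours": 36},
    {"team": "payments", "author": "sam", "cycle_hours": 12},
    {"team": "payments", "author": "jo", "cycle_hours": 18},
    {"team": "search", "author": "kim", "cycle_hours": 8},
    {"team": "search", "author": "lee", "cycle_hours": 20},
]
print(team_cycle_time(prs))
```

Dropping the author at the aggregation step, rather than hiding it in the dashboard layer, makes it harder for individual numbers to leak into reviews later.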
The Bottom Line
Stop measuring inputs and activity. DORA metrics are research-validated, correlate with business outcomes, and can be instrumented by most teams with a few hours of setup. PR cycle time and CI pipeline duration give you early warning before delivery degrades. The hard part isn’t choosing metrics; it’s creating the organisational conditions where metrics serve the team rather than being used against it.