#010: Your DORA Metrics Are Lying to You: The AI ROI Crisis Tearing Through Engineering Teams (And What to Measure Instead)
Hi friends, this is Edo with the 10th issue of the Full-Stack AI Engineer Newsletter.
TLDR: Half of all developers now use AI coding tools daily, but most engineering teams have zero visibility into whether these tools actually improve outcomes. Traditional DORA metrics weren’t designed for AI-augmented workflows and are increasingly misleading. A new generation of measurement frameworks—led by the emerging “DX Core 4”—is stepping in to fill the gap, and engineering leaders need to pay attention now.
The Uncomfortable Truth About Your AI Investment
Your team adopted Copilot. Maybe Cursor. Maybe something newer. Deployment frequency went up. Lead time for changes dropped. Your DORA dashboard looks great.
But here’s the question nobody wants to ask: Is your team actually more productive, or are you just shipping more code faster with less understanding of what it does?
According to Panto AI’s 2026 statistics, 51% of professional developers now use AI coding tools daily. That’s not an experiment anymore—it’s the default. Yet as Waydev reports, most engineering teams “have no idea if they’re actually improving productivity.” They’re spending real money on AI tooling licenses, watching activity metrics climb, and hoping the investment pays off.
Hope is not a measurement strategy. And in March 2026, the reckoning arrived.
Why DORA Metrics Are Cracking Under Pressure
DORA metrics—deployment frequency, lead time for changes, change failure rate, and mean time to recovery—have been the gold standard for engineering performance since Google’s State of DevOps reports popularized them. They worked well for a world where humans wrote all the code.
That world no longer exists.
GetDX’s March 2026 analysis makes the case plainly: DORA metrics alone are insufficient for AI-assisted teams. The reasons are straightforward once you see them:
Deployment frequency inflates artificially. When AI generates boilerplate, scaffolding, and even feature code, teams ship more often. But frequency without quality is just noise.
Lead time shrinks in misleading ways. AI can produce a pull request in seconds. But the review overhead for AI-generated code is often higher than for human-written code, because reviewers need to verify logic they didn’t write or reason about.
Change failure rate hides new failure modes. AI-generated code can introduce subtle bugs that pass CI but fail in production in unexpected ways. Traditional change failure rate doesn’t capture this nuance.
None of these metrics measure developer experience. A team can have stellar DORA numbers while developers feel exhausted from reviewing AI-generated PRs they don’t trust.
The core problem is this: DORA measures delivery throughput and stability. It was never designed to measure whether AI tools are making your engineers more effective, happier, or producing better software. Treating it as an AI ROI metric is like measuring a restaurant’s success by how fast food leaves the kitchen, ignoring whether anyone enjoys eating it.
The Industry Is Scrambling for Answers
This isn’t just an academic debate. It’s becoming a board-level concern.
Multiple vendors—Axify, Milestone AI, Waydev—are racing to build GenAI tool ROI tracking directly into their engineering intelligence platforms. When three or more companies independently build the same feature at the same time, it tells you something: their customers are demanding it.
Harness released a new framework for evaluating AI coding tools across three dimensions: velocity, quality, and ROI. This signals a maturation in the conversation. We’re moving past “AI is amazing” and into “prove it.”
And honestly? That’s healthy. Every technology goes through this cycle. The hype phase is over. The accountability phase has begun.
Enter the DX Core 4: A Post-DORA Framework
The most promising successor to DORA for AI-augmented teams is the emerging DX Core 4 framework, which incorporates developer experience signals alongside traditional delivery metrics.
The idea is simple but powerful: you can’t understand engineering productivity by looking only at outputs. You need to understand the experience of the people producing those outputs. Here’s what the DX Core 4 approach adds to the conversation:
Speed — Yes, still measure delivery velocity. But contextualize it. A PR that took 10 minutes to generate and 3 hours to review isn’t a speed win.
Effectiveness — Are developers spending time on high-value work, or are they babysitting AI output? Track the ratio of creative problem-solving to AI-generated code review and correction.
Quality — Go beyond change failure rate. Measure defect density in AI-generated vs. human-written code. Track production incidents traced back to AI-assisted commits. Milestone AI’s benchmarking data suggests high-performing teams are already segmenting quality metrics this way. (A minimal sketch below shows one way to compute that split.)
Impact — Connect engineering output to business outcomes. Did the AI-assisted feature actually move a product metric? Did faster delivery translate to faster customer value?
The DX Core 4 doesn’t throw DORA away. It wraps DORA in context. It asks: “Your numbers look good, but what’s actually happening?”
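To make the Quality split concrete, here’s a minimal sketch. It assumes you export commits with a rough AI-involvement tag (step 1 in the next section) and a count of production defects traced back to each commit; the CSV schema and column names are illustrative, not any vendor’s format:

```python
# defect_density.py: segment defect density by AI involvement.
# Assumes a hypothetical commits.csv with columns:
#   sha, ai_involvement ("ai-generated" | "ai-assisted" | "human-only"),
#   lines_changed, defects_traced (production defects linked back to the commit)
import pandas as pd

commits = pd.read_csv("commits.csv")

by_tag = commits.groupby("ai_involvement").agg(
    commits=("sha", "count"),
    lines_changed=("lines_changed", "sum"),
    defects=("defects_traced", "sum"),
)
# Defects per 1,000 changed lines, so segments of different sizes are comparable.
by_tag["defects_per_kloc"] = by_tag["defects"] / (by_tag["lines_changed"] / 1_000)

print(by_tag.sort_values("defects_per_kloc", ascending=False))
```

Normalizing by changed lines matters: AI-heavy segments usually touch more code, so raw defect counts alone will flatter or condemn them unfairly.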
What You Should Do Right Now
If you’re an engineering leader, tech lead, or even a senior IC who cares about how your team works, here’s where to start (minimal Python sketches for steps 1 through 4 follow the list):
1. Segment your metrics by AI involvement. Start tagging PRs, commits, or tasks that were AI-assisted. You can’t measure what you don’t track. Even a rough tag—“AI-generated,” “AI-assisted,” “human-only”—gives you a baseline.
2. Measure review overhead explicitly. Track time-to-review and review rounds for AI-generated code separately. If your reviewers are spending 2x longer on AI PRs, that’s a hidden cost eating your supposed productivity gains.
3. Survey your developers. Regularly. Developer experience isn’t a soft metric. It’s a leading indicator. If your engineers feel like they’re spending more time fixing AI mistakes than building features, your DORA dashboard is lying to you.
4. Connect to business outcomes. The ultimate ROI question isn’t “did we ship faster?” It’s “did we deliver more value?” Work with product to tie engineering throughput to actual customer or revenue impact.
5. Stop optimizing for a single framework. DORA, DX Core 4, Harness’s velocity-quality-ROI model—these are lenses, not scorecards. Use multiple perspectives. The teams that thrive will be the ones that resist reducing productivity to a single number.
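To make step 1 concrete, a label per pull request is enough to start. Here’s a minimal sketch against the GitHub REST API. The endpoints are real; the label names, the repo placeholders, and the convention of declaring AI involvement through a checkbox in the PR description are assumptions for illustration:

```python
# tag_pr.py: apply an AI-involvement label to a pull request (step 1).
# The GitHub endpoints are real; the label taxonomy and the checkbox
# convention in the PR description are illustrative assumptions.
import os
import requests

OWNER, REPO = "your-org", "your-repo"  # placeholders
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}
# Hypothetical PR-template checkboxes mapped to labels.
CHECKBOXES = {"[x] ai-generated": "ai-generated", "[x] ai-assisted": "ai-assisted"}

def tag_pull_request(number: int) -> None:
    pr = requests.get(
        f"https://api.github.com/repos/{OWNER}/{REPO}/pulls/{number}",
        headers=HEADERS,
    ).json()
    body = (pr.get("body") or "").lower()
    # Default to human-only unless the author ticked a box.
    label = next((v for k, v in CHECKBOXES.items() if k in body), "human-only")
    requests.post(
        f"https://api.github.com/repos/{OWNER}/{REPO}/issues/{number}/labels",
        headers=HEADERS,
        json={"labels": [label]},
    )

tag_pull_request(123)  # hypothetical PR number
```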
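For step 2, once those labels exist, review overhead falls out of data you already have. A sketch using the real pulls and reviews endpoints (first page only; pagination omitted for brevity), with the label names carried over from the sketch above:

```python
# review_overhead.py: time-to-first-review, segmented by AI label (step 2).
import os
from datetime import datetime
from statistics import median

import requests

OWNER, REPO = "your-org", "your-repo"  # placeholders
API = f"https://api.github.com/repos/{OWNER}/{REPO}"
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}
TAGS = ("ai-generated", "ai-assisted", "human-only")  # assumed label taxonomy

def parse(ts: str) -> datetime:
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")

hours_by_tag: dict[str, list[float]] = {}
prs = requests.get(
    f"{API}/pulls", headers=HEADERS, params={"state": "closed", "per_page": 100}
).json()
for pr in prs:
    reviews = requests.get(
        f"{API}/pulls/{pr['number']}/reviews", headers=HEADERS
    ).json()
    if not reviews:
        continue  # merged without review: its own warning sign
    wait = (
        parse(reviews[0]["submitted_at"]) - parse(pr["created_at"])
    ).total_seconds() / 3600
    for lbl in pr["labels"]:
        if lbl["name"] in TAGS:
            hours_by_tag.setdefault(lbl["name"], []).append(wait)

for tag, hours in sorted(hours_by_tag.items()):
    print(f"{tag}: median {median(hours):.1f}h to first review (n={len(hours)})")
```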
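For step 3, even a spreadsheet-grade pulse survey becomes a leading indicator once you trend it. A tiny sketch, assuming a hypothetical CSV of 1-to-5 responses to a question like “I trust the AI-generated code I review”:

```python
# pulse_trend.py: trend one pulse-survey question over time (step 3).
# Assumes a hypothetical surveys.csv with columns: week, question, score (1-5).
import pandas as pd

responses = pd.read_csv("surveys.csv")
trust = responses[responses["question"] == "ai_review_trust"]  # hypothetical key

weekly = trust.groupby("week")["score"].mean().sort_index()
recent, prior = weekly.tail(4).mean(), weekly.tail(8).head(4).mean()
# A sustained drop here tends to show up long before DORA numbers move.
print(weekly.tail(8))
print(f"last 4 weeks vs. the 4 before: {recent - prior:+.2f}")
```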
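And for step 4, the bookkeeping can be small even if the organizational work isn’t. A sketch assuming your product team hands you a per-feature metric delta; every column name here is hypothetical:

```python
# value_by_involvement.py: tie shipped features to product impact (step 4).
# Assumes a hypothetical features.csv: feature, ai_involvement, metric_delta
# (metric_delta = the product metric change your product team attributes
# to the feature; getting that number agreed on is the actual work).
import pandas as pd

features = pd.read_csv("features.csv")
impact = features.groupby("ai_involvement")["metric_delta"].agg(["count", "mean"])
print(impact)  # shipping more is easy to see; moving the metric is the question
```

None of these sketches is a platform, and that’s the point. A week of rough tagging and counting will tell you more about your AI ROI than a quarter of staring at deployment frequency.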
The Bigger Picture
As some observers have noted, we may be in the early innings of an AI correction—not in the technology itself, but in our expectations and measurement of it. AI coding tools are genuinely useful. But “useful” and “worth what we’re paying” are different claims, and the second one requires evidence.
The teams that will win this transition aren’t the ones adopting AI the fastest. They’re the ones measuring AI’s impact the most honestly.
Key Takeaways
DORA metrics inflate under AI assistance — deployment frequency and lead time improve on paper while hiding increased review overhead and new categories of defects. Supplement DORA with quality and experience metrics.
Segment your engineering metrics by AI involvement — tag AI-generated and AI-assisted work separately so you can compare quality, review time, and defect rates against human-only baselines.
Adopt a multi-dimensional framework like DX Core 4 — measure speed, effectiveness, quality, and business impact together rather than relying on any single set of delivery metrics.
Treat developer experience as a leading indicator — run regular surveys to catch hidden costs like AI review fatigue before they show up in attrition or quality problems.
Connect AI tool usage to business outcomes — the real ROI question isn’t “did we ship more code?” but “did we deliver more customer value?” Work with product teams to close that loop.