A research group called METR, which evaluates advanced AI systems, published a finding in 2025 that should have changed how every enterprise measures AI adoption. It has not.
They studied experienced software developers working with and without AI tools. The developers using AI tools believed they were completing tasks 20 percent faster. The actual data showed they were completing them 19 percent slower.
The gap between what they thought was happening and what was actually happening is the gap most enterprise AI programmes are living in right now. Tool adoption rates are up. Usage metrics look positive. And in far too many organisations, output has not moved in proportion to the investment.
This is not a technology problem. The tools work. The problem is the difference between access and fluency, and most organisations are measuring access while the gap they actually need to close is fluency.
CircleCI's 2026 State of Software Delivery report analysed 28.7 million CI/CD workflow runs across thousands of engineering teams. The finding deserves more attention than it has received.
The top 5 percent of teams nearly doubled their throughput, a 97 percent increase in output over the period. The median team's output rose just 4 percent. The bottom quartile recorded no measurable improvement.
Same period. Same AI tools available to every team. A 97 percent gain at the top and effectively zero at the bottom.
That divergence is the finding: AI tools produced a 97 percent output gain in teams that had genuinely integrated them into how they work, and almost nothing in teams that had merely made them available without building real fluency. Availability is not integration. Tool access is not workflow transformation.
Meta's data confirms this from a different direction. Output per engineer rose 30 percent on average across 2025, driven by AI coding tools. But among the engineers who used these tools most intensively, the ones Zuckerberg described as power users, output increased by 80 percent year on year. The gap between the average adopter and the deep integrator was not a small efficiency difference. It was the difference between meaningful improvement and a step change in what one person could produce in a quarter.
GitHub tracked developer activity across pull requests and code commits in 2025 and found a 20-plus percent increase year over year. The developer workforce did not grow by anything close to that number. More was being built by roughly the same number of people, specifically by the people who had moved from occasional use to genuine integration.
When Klarna replaced 700 customer service agents with AI in 2024 and publicly declared that the system was doing the work of 700 people, the metrics they tracked were volume metrics: conversations handled, resolution speed, cost per interaction. What they did not track, or did not track in time, was quality. Customer satisfaction fell sharply. The escalation paths that human agents had navigated through institutional knowledge and contextual judgment collapsed. Klarna is now rehiring.
The lesson is not that AI cannot handle customer service. It is that volume and efficiency metrics are not the same as output quality. And most enterprise AI adoption programmes are running on exactly those metrics, tracking how many employees use the tools, how often, and how many tasks have been automated, without asking whether the output is actually better, faster, or more valuable than what came before.
Forrester's research found that only 16 percent of workers had high AI readiness, what Forrester calls AIQ, in 2025, and that number is projected to reach just 25 percent in 2026. Only 23 percent of organisations offered any form of prompt engineering training last year. Employees are largely teaching themselves through individual experimentation. Some break through to genuine integration. Most remain occasional users who believe they are working faster than they are.
This is the METR finding at scale. Employees with access to AI tools who have not genuinely changed how they work are often slower than they would be without the tools, because they are interrupting their natural workflow to involve AI in steps where it is not actually helping, while bypassing AI in the steps where it would help most.
The distinction between AI access and AI fluency shows up most clearly when you watch how people decompose problems.
A developer with tool access uses Copilot to autocomplete code. A developer with genuine fluency restructures the problem before touching the keyboard, breaking it into components that AI handles well and components that require human judgment, sequencing the work to let AI carry the repetitive and pattern-matching load while they focus on architecture, edge cases, and the decisions that require real context. The second developer is not using AI more. They are thinking differently because of it.
The teams in CircleCI's top 5 percent, the ones with 97 percent throughput growth, are not using different tools than the bottom quartile. They have developed a fundamentally different relationship with those tools. The AI is not a faster search engine or a smarter autocomplete. It is a collaborator with defined strengths and defined limitations, and they have learned exactly where each applies.
This takes time to develop. It takes deliberate practice. And it requires a kind of organisational permission that most enterprises are not giving: permission to restructure how you work, not just permission to use a new tool while working the old way.
Forrester's data contains a finding that almost no organisation is acting on. Gen Z workers have the highest AI readiness at 22 percent, compared to just 6 percent for Baby Boomers. They are the cohort most capable of developing deep AI fluency because they have fewer established workflows to disrupt. Yet enterprises are disproportionately eliminating entry-level positions, cutting the pipeline through which high-AIQ younger workers enter organisations, while retaining the people least likely to develop the depth of integration the data shows is required.
The compounding effect of this is significant. An organisation that systematically develops AI fluency in its people, fluency rather than mere access, sees output gains that compound over time. Each quarter, the most integrated individuals learn to deploy AI across more of their work, take on wider scope, and produce at a rate that was previously achievable only by larger teams. The organisations that do not invest in fluency see the opposite: AI tools adopted at the surface, usage metrics that look positive, and output that has not moved proportionally.
The gap between these two trajectories does not close by itself. It widens. The organisations three or four quarters ahead on genuine fluency are increasingly able to do things that organisations still at the tool-access stage cannot match, not because they have more people or better technology, but because their people have learned to work in a mode that multiplies what each of them can produce.
The organisations pulling ahead are doing a few specific things differently. They are measuring output per person, not tool adoption rate. The conversation inside those organisations is not how many employees are using AI; that metric is easy to game and tells you nothing about whether fluency is developing. The conversation is: what is this team producing now compared to six months ago, and where is the gap between their current output and what a genuinely AI-integrated team of the same size should produce? That second number is the competitive gap that needs closing.
They are protecting the people who have already developed genuine fluency from the coordination overhead that slows them down. The slowdown METR documented, experienced developers working more slowly with AI tools than without them, shows up almost exclusively in environments where AI is expected to fit into existing processes, not in environments where processes have been restructured around what AI can do. The organisations that let their highest-fluency people restructure how they work are the ones seeing 80 percent output gains rather than 4 percent.
They are investing in fluency as a deliberate organisational skill, not a personal preference. Not a training session that covers what the tools are, but a sustained programme that changes how teams decompose problems, how they allocate work between humans and AI, and how they measure the output of that allocation. The organisations doing this are not spending more on headcount. They are spending differently on the people they already have, and getting compounding returns.
If your best analyst, your best engineer, or your best operations lead spent the next quarter working with AI as deeply and intentionally as the power users Meta described, what would they produce that they cannot produce today?
And then: does your organisation currently have the structure to let them?
Most enterprises do not. They have AI tools deployed and usage rates logged and dashboards showing adoption. What they do not have is a clear picture of the distance between where their people are now and where genuine AI fluency would take them. Or a deliberate plan to close it.
The productivity gap is real. The organisations that measure it honestly, tracking output rather than adoption, and invest in closing it deliberately will separate from the ones that are still watching the wrong metric when the results come in.
Tailored AI works with enterprise leaders and founding teams to identify where the gap between AI access and AI fluency is costing them output, and how to close it through deliberate team design and workflow redesign.
Start the Conversation