When AI Slows You Down: What a New Study Reveals About Developer Productivity


Photo: Christina Morillo

HC Editorial Team
11/09/2025

The rapid adoption of artificial intelligence (AI) coding tools has been accompanied by bold promises: faster workflows, fewer bugs, and a boost in developer productivity. Platforms like GitHub Copilot and advanced IDE assistants have been marketed as game-changers for software engineering. But what happens when the data doesn’t match the hype?

A recent study by Model Evaluation & Threat Research (METR) challenges prevailing assumptions. Conducted in early 2025, it suggests that experienced open-source developers actually became slower when using cutting-edge AI assistants. Far from being a universal accelerator, AI in this context led to a measurable slowdown — an outcome that raises new questions for companies, policymakers, and technologists.

The Research Question: Beyond Benchmarks and Perceptions

The METR team asked a straightforward but underexplored question: How do AI tools available in early 2025 affect the productivity of experienced developers working on their own open-source repositories?

This inquiry is crucial. While algorithmic benchmarks and self-reported surveys often show positive results, they fail to capture real-world complexities — from debugging legacy code to maintaining high-quality repositories. Misjudging the impact of AI could mislead investment decisions and even shape misguided policies.

Methodology: A Randomized Controlled Trial in Real Codebases

To address the question, METR designed a randomized controlled trial (RCT) — a gold standard in experimental design.

  • Participants: 16 expert developers, each contributing to mature open-source projects with an average of 1M+ lines of code and over 22,000 GitHub stars.
  • Tasks: 246 real issues were selected, ranging from bug fixes to feature development and refactoring.
  • Conditions: Issues were randomly assigned to either a with-AI condition (developers could freely use tools like Cursor Pro powered by Claude 3.5/3.7 Sonnet) or a without-AI condition (traditional workflow only).
  • Measurement: Each task averaged ~2 hours. Developers recorded their screens, reported completion times, and submitted pull requests for review.

Compensation was set at $150/hour, underscoring the professional nature of the trial. The primary outcome: task completion time, supplemented by quality checks such as testing, documentation, and style compliance.
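The paper's full statistical model is not reproduced here, but the core comparison can be illustrated with a minimal sketch: given per-issue completion times labeled by condition, compare the two groups on a log scale so the result reads as a multiplicative slowdown factor. All data and variable names below are illustrative, not the study's actual records.

    import math
    from statistics import mean

    # Illustrative per-issue records: (condition, completion time in hours).
    # These values are made up for demonstration only.
    records = [
        ("ai", 2.4), ("ai", 1.9), ("ai", 3.1), ("ai", 2.7),
        ("no_ai", 2.0), ("no_ai", 1.7), ("no_ai", 2.6), ("no_ai", 2.2),
    ]

    # Working on the log scale turns the difference in means into a
    # multiplicative factor (e.g., 1.19 reads as "19% longer with AI").
    log_ai = [math.log(t) for cond, t in records if cond == "ai"]
    log_no_ai = [math.log(t) for cond, t in records if cond == "no_ai"]

    slowdown = math.exp(mean(log_ai) - mean(log_no_ai))
    print(f"AI-assisted tasks took {slowdown:.2f}x as long ({slowdown - 1:+.0%})")

The published 19% estimate comes from a more careful analysis with per-developer effects and robustness checks, but this log-scale comparison captures the basic shape of the result.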

Key Findings: A Surprising Slowdown

The results were both striking and counterintuitive:

  • Developers using AI took 19% longer to complete tasks than those without AI assistance.
  • Before the trial, participants estimated AI would make them 24% faster.
  • Even after experiencing the slowdown, many still believed AI had saved them about 20% of their time.

This disconnect between perception and reality points to what the authors describe as an illusion of productivity.
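A quick back-of-the-envelope calculation makes the size of that perception gap concrete. The baseline task length below is hypothetical (roughly the ~2-hour average mentioned earlier), and the percentages are simply the headline figures above read as changes in completion time.

    # Forecast vs. observed effect of AI, using the headline percentages above.
    baseline_hours = 2.0                      # hypothetical average task length

    forecast = baseline_hours * (1 - 0.24)    # "24% faster" read as 24% less time
    observed = baseline_hours * (1 + 0.19)    # "19% longer" as measured

    print(f"Forecast with AI: {forecast:.2f} h")             # ~1.52 h
    print(f"Observed with AI: {observed:.2f} h")             # ~2.38 h
    print(f"Gap per task:     {observed - forecast:.2f} h")  # ~0.86 h

In other words, on a typical two-hour task, developers expected to save roughly half an hour and instead added more than twenty minutes.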

Why Did AI Slow Them Down?

  1. Prompting overhead: Significant time was spent iterating on prompts and cleaning up AI-generated output.
  2. Low adoption rate: Less than half of AI suggestions were directly usable, requiring additional revision.
  3. Expert efficiency: Developers already familiar with their codebases were highly efficient, leaving little room for AI to add value.
  4. Quality standards gap: The AI often failed to automatically meet strict requirements for testing, documentation, and formatting.

The study’s robustness checks ruled out simpler explanations like task imbalance or inconsistent quality between conditions.

Context: Why These Results Differ From Other Studies

Earlier trials have shown AI accelerating developer tasks. For example:

  • A 2023 RCT with GitHub Copilot reported participants completing coding tasks 55% faster.
  • Google engineers using AI saw a 21% reduction in task completion time under certain conditions.

The divergence lies in context. Unlike controlled coding exercises or corporate tasks, METR’s study focused on experienced contributors working in mature, complex repositories. In such environments, the marginal value of AI appears lower, while the overhead of integration is higher.

Implications: For Companies, Developers, and Policymakers

The findings carry important lessons:

  • For teams: AI isn’t plug-and-play. Productivity gains depend on project maturity, codebase complexity, and developer expertise.
  • For measurement: Real-world metrics matter more than user perception. Surveys may not capture true efficiency.
  • For adoption: Custom integration — such as fine-tuning models on specific repositories or automating testing pipelines — may reduce friction.
  • For training: Developers may need hundreds of hours of practice before unlocking AI’s potential.

At the policy level, the authors caution against overestimating AI’s productivity boost, which could distort forecasts in innovation, labor markets, and risk regulation.

Conclusion: From Hype to Evidence

The METR study underscores a vital point: not all AI deployments accelerate work. In fact, in the hands of skilled open-source developers working on complex repositories, AI slowed them down.

For organizations considering widespread adoption, the message is clear: test rigorously, measure carefully, and adapt thoughtfully. For policymakers, the results highlight the importance of grounding AI narratives in data, not hype.

The road to AI-enhanced productivity is far from linear. Sometimes, the fastest way forward is a slower, more measured step back.

References:
Becker J, Rush N, Barnes E, Rein D. Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity [Internet]. arXiv; 2025 Jul [cited 2025 Sep 6]. Available from: https://arxiv.org/abs/2507.09089

Becker J, Rush N, Barnes E, Rein D. Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity [Internet]. San Francisco: Model Evaluation & Threat Research (METR); 2025 Jul [cited 2025 Sep 6]. Available from: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

Peng S, Kalliamvakou E, Cihon P, Demirer M. The Impact of AI on Developer Productivity: Evidence from GitHub Copilot [Internet]. arXiv; 2023 Feb [cited 2025 Sep 6]. Available from: https://arxiv.org/abs/2302.06590

Sadowski C, et al. The Impact of Generative AI on Collaborative Open-Source Software Development: Evidence from GitHub Copilot [Internet]. arXiv; 2024 Oct [cited 2025 Sep 6]. Available from: https://arxiv.org/abs/2410.02091

License

Published under a Creative Commons 4.0 license.