AT2k Design BBS Message Area

From: VRSS
To: All
Subject: AI Models Still Struggle To Debug Software, Microsoft Study Shows
Date/Time: April 11, 2025 12:40 AM

Feed: Slashdot
Feed Link: https://slashdot.org/
---

Title: AI Models Still Struggle To Debug Software, Microsoft Study Shows

Link: https://developers.slashdot.org/story/25/04/1...

Some of the best AI models today still struggle to resolve software bugs that
wouldn't trip up experienced devs. TechCrunch: A new study from Microsoft
Research, Microsoft's R&D division, reveals that models, including
Anthropic's Claude 3.7 Sonnet and OpenAI's o3-mini, fail to debug many issues
in a software development benchmark called SWE-bench Lite. The results are a
sobering reminder that, despite bold pronouncements from companies like
OpenAI, AI is still no match for human experts in domains such as coding. The
study's co-authors tested nine different models as the backbone for a "single
prompt-based agent" that had access to a number of debugging tools, including
a Python debugger. They tasked this agent with solving a curated set of 300
software debugging tasks from SWE-bench Lite. According to the co-authors,
even when equipped with stronger and more recent models, their agent rarely
completed more than half of the debugging tasks successfully. Claude 3.7
Sonnet had the highest average success rate (48.4%), followed by OpenAI's o1
(30.2%), and o3-mini (22.1%).

