Tuesday, 25 February 2025

No, AI Is Not Going to Replace Programmers As AI Is ‘Unable To Solve the Majority’ of Problems

Fears that AI is going to replace programmers wholesale appear to be unfounded, as a new study by OpenAI finds even the most advanced models are “unable to solve the majority of tasks.”

AI is revolutionizing countless industries, while simultaneously causing many to fear for their jobs. Programming, in particular, is one field that is being heavily influenced by AI, with none other than Google’s Sergey Brin saying he relies on AI to write code.

“I think that AI touches so many different elements of day-to-day life, and sure, search is one of them,” Brin said in an interview in late 2024. “But it kind of covers everything. For example, programming itself, the way that I think about it is very different now.

“Writing code from scratch feels really hard, compared to just asking the AI to do it,” Brin added. “I’ve written a little bit of code myself, just for kicks, just for fun. And then sometimes I’ve had the AI write the code for me, which was fun.”

OpenAI’s Cautionary New Study

Despite many companies, executives, and programmers looking to AI to handle much of the coding that is currently done by humans, OpenAI’s study should give everyone reason to pause. The company evaluated “frontier models,” the term for the industry’s leading-edge, most advanced models. The company also used a benchmark that pulled from “over 1,400 freelance software engineering tasks from Upwork, valued at $1 million USD total in real-world payouts.” The tasks ranged from small bug fixes to feature implementations worth $32,000.

We evaluate model performance and find that frontier models are still unable to solve the majority of tasks. To facilitate future research, we open-source a unified Docker image and a public evaluation split, SWE-Lancer Diamond (https://github.com/openai/SWELancer-Benchmark). By mapping model performance to monetary value, we hope SWE-Lancer enables greater research into the economic impact of AI model development.
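The benchmark’s core idea, mapping pass/fail results on real freelance tasks to the dollar value those tasks paid out, can be sketched in a few lines. This is an illustrative simplification, not OpenAI’s actual evaluation code; the task entries and payout figures below are hypothetical examples.

```python
# Hypothetical sketch of SWE-Lancer's scoring idea: sum the payouts of
# tasks a model solved to get the dollar value it "earned".
# Task data below is illustrative, not from the real benchmark.

def earned_value(tasks):
    """Return (dollars earned on solved tasks, total dollars available)."""
    earned = sum(t["payout"] for t in tasks if t["solved"])
    total = sum(t["payout"] for t in tasks)
    return earned, total

tasks = [
    {"id": "bugfix-001", "payout": 250, "solved": True},
    {"id": "feature-002", "payout": 32000, "solved": False},  # large feature task
    {"id": "bugfix-003", "payout": 500, "solved": True},
]

earned, total = earned_value(tasks)
print(f"Earned ${earned:,} of ${total:,} ({earned / total:.1%})")
```

Weighting by payout rather than task count is what lets the study report that models “unable to solve the majority of tasks” also capture only a fraction of the benchmark’s $1 million in real-world value.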

OpenAI Warns Against the Risks of Improvement

While many people’s reaction may be to try to improve AI so that it can program more effectively, OpenAI points out that such a path is not without risks. Those risks include the very kind of impacts to the job market that many fear. Even more than that, AI models that excel at programming could pose an autonomy risk, such as self-improvement, data exfiltration, and more.

AI models with strong real-world software engineering capabilities could enhance productivity, expand access to high-quality engineering capabilities, and reduce barriers to technological progress. However, they could also shift labor demand—especially in the short term for entry-level and freelance software engineers—and have broader long-term implications for the software industry. Improving AI software engineering is not without risk. Advanced systems could carry model autonomy risk in self-improvement and potential exfiltration, while automatically generated code may contain security flaws, disrupt existing features, or stray from industry best practices, a consideration that is important if the world increasingly relies on model-generated code. SWE-Lancer provides a concrete framework to start tying model capabilities to real-world software automation potential, therefore helping better measure its economic and social implications. By quantifying AI progress in software engineering, we aim to help inform the world about the potential economic impacts of AI model development, while underscoring the need for careful and responsible deployment. To further support responsible AI progress, we open-source a public eval set and report results on frontier AI models to help level-set the implications of AI progress. Future work should explore the societal and economic implications of AI-driven development, ensuring these systems are integrated safely and effectively.

Conclusion

OpenAI’s study underscores the danger of jumping too quickly to replace human employees with AI. While AI is a valuable tool, it does its best work when used in conjunction with human employees, not in place of them.

When AI is used to augment the work of human programmers, those workers can oversee AI’s work, ensuring it is accurate. Just as important, human programmers can also serve in the all-important role of architect, designing the applications and systems, and then using AI to code specific aspects of those projects.

Ultimately, OpenAI’s research provides a dose of reality about just what AI can do and what it cannot—or should not—do.



from WebProNews https://ift.tt/7GyseVX
