AI agents are broken. Is GPT-5 really the answer?

AI agents struggle with real-world tasks due to compounding errors and vulnerabilities; GPT-5 may not fully solve these fundamental issues.

August 7, 2025
Chris Taylor


Agentic AI tests have revealed fundamental flaws, and OpenAI's next-generation LLM may not have the tools to address them.

As 2025 dawned, OpenAI CEO Sam Altman was promoting two developments he insisted would transform our lives. One, of course, was GPT-5: a long-anticipated major upgrade to the Large Language Model (LLM) that powered ChatGPT's rise to tech world superstardom. The other? AI Agents that don't just answer your queries like ChatGPT, but actually get stuff done for you.

"We believe that, in 2025, we may see the first AI agents join the workforce and materially change the output of companies," Altman wrote back in January.

Well, we're eight months in, and Altman's prediction already needs a big old asterisk. Sure, companies are keen to adopt AI Agents, such as OpenAI's ChatGPT agent. In a May 2025 report, consultancy giant PwC found that half of all firms surveyed planned to implement some kind of AI Agent by the end of the year. Some 88% of executives want to increase their teams' AI budgets because of Agentic AI.

The Reality of AI Agents: Disappointing Performance

But what about the actual AI Agent experience? With apologies to all those hopeful executives, the reviews are almost uniformly negative. If "AI Agents" were a new high-tech James Bond movie, here are the kinds of blurbs you'd see on Rotten Tomatoes:
  • "glitchy … inconsistent" (Wired)
  • "came off like a clueless internet newbie" (Fast Company)
  • "reality doesn't live up to the hype" (Fortune)
  • "not matching up to the buzzwords" (Bloomberg)
  • "the new vaporware … overpromising is worse than ever" (Forbes)

Study Finds OpenAI's Entry Failed Nearly Every Time

A May 2025 Carnegie Mellon University study found Google's Gemini 2.5 Pro failed at real-world office tasks 70% of the time. And that was the best-performing agent. OpenAI's entry, powered by GPT-4o, failed more than 90% of the time.

GPT-5 is likely to improve on that number … but that's not saying much. And not just because early reports say OpenAI struggled to fill GPT-5 with enough improvements to make it worthy of the release number.

Indeed, it's starting to look to researchers like this disappointment is baked into the whole process of LLMs learning to do stuff for you. The problem, as one AI Agent engineer's analysis makes clear, is simple math: errors compound over time, so the more steps an agent chains together, the less likely it is to finish without a mistake. AI Agents that do multiple complex tasks are also prone to hallucination, like all AI.

In the end, some agents "panic" and can make "a catastrophic error in judgment," to quote an apology from a Replit AI Agent that literally deleted a customer's database after 9 days of working on a coding task. Replit's CEO called the failure "unacceptable."

Tellingly, that isn't the only AI-Agent-wipes-code story of 2025, which explains why one enterprising startup is offering insurance against your AI Agent going haywire, and why Walmart has had to bring in four "super Agents" in a bid to corral its AI Agents.

No wonder a recent Gartner paper predicted that 40% of all those AI Agents currently being initiated by companies will be canceled within 2 years. "Most Agentic AI projects," wrote senior analyst Anushree Verma, are "driven by hype and misapplied … This can blind organizations to the real cost and complexity of deploying AI agents at scale."
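
The "simple math" behind compounding errors can be made concrete with a short sketch. It assumes, purely for illustration, that every step of a task succeeds independently with the same probability; the 95% per-step figure and the function name `chain_success_probability` are hypothetical and do not come from the Carnegie Mellon study or the engineer's analysis. Under that assumption, the chance of a flawless run is the per-step accuracy raised to the number of steps.

```python
def chain_success_probability(step_accuracy: float, steps: int) -> float:
    """Chance that every step in a chain succeeds.

    Assumes each step succeeds independently with the same probability,
    which is a simplification: real agent errors are often correlated.
    """
    if not 0.0 <= step_accuracy <= 1.0:
        raise ValueError("step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return step_accuracy ** steps


if __name__ == "__main__":
    # Illustrative per-step accuracy, not a measured figure.
    per_step = 0.95
    for steps in (1, 5, 10, 20, 50):
        p = chain_success_probability(per_step, steps)
        print(f"{steps:>3} steps: {p:.1%} chance of an error-free run")
```

Even with a seemingly strong 95% per-step accuracy, this model puts the odds of an error-free 20-step task at roughly 36%, which is why longer agent workflows tend to look so much worse than single-turn chatbot answers.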

What Can GPT-5 Do for AI Agents?

It's possible that ChatGPT agent will vault to the top of the reliability charts once it's powered by GPT-5. (Again, that's not the highest of bars.) But the new release is unlikely to fix what really ails the Agentic world. That's because guardrails are already being erected, by companies as well as regulators, limiting what even the most reliable AI Agent can do for you.

Take Amazon, for example. The world's largest retailer, like most tech giants, is talking a big game on AI Agents (as it did at a Shanghai Agentic AI fair in July). At the same time, Amazon has shut down the ability of any AI Agent to browse and buy anywhere on its site.

That makes sense for Amazon, which has always wanted control over the customer experience, not to mention its desire to deliver ads and sponsored results to actual human eyeballs. But it's also curtailing a massive amount of potential Agent activity right there. (On the plus side, no "catastrophic failure" involving a large pile of next-day deliveries at your door.)

And do we trust AI Agents to buy online for us anyway? It's not that they're evil and want to steal your credit card data; it's that they're naive and vulnerable to being phished by bad actors who do want your card. Even GPT-5 may not be able to get around one vulnerability seen by researchers: data embedded in images can instruct AI agents to reveal any credit card info they might have, with the user being none the wiser.

If that kind of problem is exploited on a corporate scale, then Altman may be right about AI Agents "materially changing output," just not in the way he meant.

Originally published at Mashable on August 7, 2025.

Frequently Asked Questions (FAQ)

Understanding AI Agents

Q: What are AI Agents, and how do they differ from traditional chatbots like ChatGPT?
A: AI Agents are designed not just to answer queries but to actively perform tasks and get things done. Unlike chatbots that primarily provide information, AI Agents can interact with software, manage workflows, and execute actions on behalf of the user.

Q: What are the primary challenges currently facing AI Agents?
A: Current AI Agents often exhibit inconsistent performance, are prone to errors, and can struggle with complex or multi-step tasks. Researchers have found that errors tend to compound over time, leading to reduced reliability. There's also a risk of "panic" or catastrophic errors in judgment.

Q: How do compounding errors affect AI Agents?
A: As AI Agents perform more tasks, the probability of compounding errors increases. This can lead to a decrease in accuracy and an increased risk of hallucinations or incorrect actions, particularly in complex operations.

Q: What is "agent washing" in the context of AI Agents?
A: While not explicitly defined in the article, "agent washing" likely refers to the practice of marketing a product as an AI Agent when it lacks true autonomous capabilities or sophisticated functionality, similar to "greenwashing" for environmental claims.

Q: What are the security risks associated with AI Agents?
A: AI Agents can be vulnerable to phishing and malicious actors. Researchers have identified vulnerabilities where data embedded in images could instruct agents to reveal sensitive information, like credit card details, without the user's knowledge.

GPT-5 and the Future of AI Agents

Q: Is GPT-5 expected to solve the current problems with AI Agents?
A: While GPT-5 is anticipated to improve upon the capabilities of previous models, it is unlikely to fundamentally solve the inherent issues of compounding errors and reliability in multi-task agentic AI. The core problem lies in the sequential nature of task execution: an error in one step propagates to the steps that follow.

Q: What impact do guardrails and regulations have on AI Agent capabilities?
A: Guardrails erected by companies and regulators can limit the scope and autonomy of AI Agents, potentially restricting their ability to perform a wide range of tasks. For instance, Amazon has restricted AI Agents from browsing and purchasing on its platform.

Q: What is the prediction for the future adoption of AI Agents?
A: A Gartner paper predicts that 40% of current AI Agent projects will be canceled within two years, with the analyst describing most such projects as driven by hype, misapplied, and blind to the real cost and complexity of deployment at scale.

Crypto Market AI's Take

The current state of AI Agents, as highlighted in this article, presents a critical juncture for the broader adoption of artificial intelligence in practical applications. While the promise of AI Agents is immense, the reality of their current performance, characterized by inconsistency and a propensity for errors, underscores the need for more robust development and rigorous testing. This sentiment echoes our own focus at Crypto Market AI on developing reliable and secure AI-driven tools for the cryptocurrency market. Our approach emphasizes not just advanced algorithms but also a strong foundation of security and user control, ensuring that our AI solutions genuinely augment user capabilities rather than introducing new risks. We believe that the future of AI in finance lies in creating agents that are not only powerful but also trustworthy and transparent.

For those looking to navigate the complex world of AI and cryptocurrency, understanding these challenges is crucial. Our platform provides resources and tools designed to demystify these advancements, offering insights into how AI is shaping the future of finance. Explore our AI-powered trading bots to see how advanced algorithms are being applied responsibly, or learn more about our AI market analysis to understand how machine learning is enhancing decision-making in the crypto space.
