August 4, 2025
Gary Marcus
Despite hype, AI agents remain unreliable and error-prone, falling short of expectations in 2025’s tech landscape.
Where chatbots merely answer your queries, AI agents are supposed to act on your behalf, actively managing your life and business. They might shop for you, book travel, organize your calendar, summarize news, track finances, maintain databases, or even manage software systems. Ultimately, agents should be able to perform any cognitive task you might ask a human to do.
By the end of 2024, major players like Google, OpenAI, and Anthropic announced their first AI agents were imminent. Anthropic’s Dario Amodei predicted that 2025 would be the year AI could perform at the level of a PhD student or early professional. OpenAI’s CEO Sam Altman expressed optimism that AI agents would “join the workforce” and materially impact company output. Google introduced Project Astra, aiming to build a universal AI assistant.
Tech columnist Kevin Roose of the New York Times was cautiously optimistic, but the question remains: do these agents actually work?
My own prediction at the start of 2025 was that AI agents would be hyped but remain unreliable except in narrow use cases. With five months left in the year, this assessment largely holds true.
All major companies have introduced agents, but none are reliably effective beyond limited scenarios. Google’s Astra remains in beta with restricted access. OpenAI released its ChatGPT agent, which operates by proactively selecting from a toolbox of skills to complete tasks using its own virtual computer. It can interact with calendars, browse text and visual websites, download and manipulate files, and adapt its approach for speed and accuracy.
However, ChatGPT agent is still early-stage and prone to mistakes, which introduces risks, especially when working directly with user data. These errors have significantly limited the practical utility of current AI agents, falling short of the promised breakthroughs.
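The "toolbox of skills" pattern described above can be sketched roughly as follows. This is a minimal illustration of the general agent-loop idea, not OpenAI's actual implementation; all tool names and the dispatch logic are hypothetical:

```python
# Hypothetical sketch of a tool-using agent: the agent walks through a
# plan of (tool_name, argument) steps, dispatches each step to the
# matching tool, and collects the results. Real agents choose tools
# dynamically with a language model; here the plan is fixed for clarity.

def browse(url: str) -> str:
    # Stand-in for a web-browsing skill.
    return f"contents of {url}"

def read_calendar(day: str) -> str:
    # Stand-in for a calendar-reading skill.
    return f"events on {day}"

TOOLS = {"browse": browse, "read_calendar": read_calendar}

def run_agent(plan):
    """Execute each planned step with its tool; record errors instead of crashing."""
    results = []
    for tool_name, arg in plan:
        tool = TOOLS.get(tool_name)
        if tool is None:
            results.append(f"error: unknown tool {tool_name}")
            continue
        results.append(tool(arg))
    return results

print(run_agent([("read_calendar", "Monday"), ("browse", "example.com")]))
```

Note that every step in the loop is a point of failure: a wrong tool choice or a bad argument propagates into every later step, which is one reason agent errors are hard to contain.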
By March 2025, it was clear that hype and reality were diverging. For example, the AI agent Manus, once heavily hyped, failed to meet expectations. Reports have since accumulated highlighting failures and technical debt, especially AI coding agents producing hard-to-debug, repetitive code. As MIT Professor Armando Solar-Lezama put it, AI is like a new credit card that lets us accumulate technical debt faster than ever before.
Penrose.com created a benchmark for basic accounting tasks using real data and found that AI errors tend to compound over time, undermining reliability.
Hallucinations—AI confidently generating false or misleading information—remain a persistent and serious problem.
Industry insiders acknowledge a “demo to reality” gap that will likely persist for years. Recent analyses show AI agents failing 70% of tasks in some benchmarks, a failure rate that threatens the industry’s credibility.
Even enthusiastic users struggle to find practical use cases for ChatGPT agent’s limited capabilities.
Security is another major concern. AI agents’ superficial understanding makes them vulnerable to cyberattacks. Research from Carnegie Mellon University revealed that even the most secure systems were successfully attacked over 1% of the time, a dangerously high rate for critical applications.
These flaws are unsurprising because current AI systems rely on mimicry rather than deep understanding. They can imitate human language and task completion but lack true comprehension of their actions, leading to hallucinations and errors. Multi-step tasks multiply the chances for mistakes, sometimes resulting in catastrophic failures.
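The compounding effect of multi-step tasks is easy to see with a little arithmetic. If each step succeeds independently with probability p, a task of n steps succeeds with probability p^n. The 95% per-step figure below is illustrative, not a measured rate:

```python
# If each step of a task succeeds independently with probability p_step,
# the whole chain succeeds only if every step does: p_step ** n_steps.
def chain_success(p_step: float, n_steps: int) -> float:
    return p_step ** n_steps

print(round(chain_success(0.95, 1), 3))   # one step: 0.95
print(round(chain_success(0.95, 20), 3))  # twenty steps: roughly 0.358
```

Even a per-step reliability that sounds impressive in a demo collapses over a realistic multi-step workflow: at 95% per step, a twenty-step task fails almost two times out of three.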
I do not expect AI agents to disappear; eventually, they could become invaluable time-savers. However, I doubt that large language models (LLMs) alone will provide the reliable foundation needed.
Recent reports from The Information and new academic papers confirm that pure scaling of LLMs yields diminishing returns and is unlikely to achieve artificial general intelligence (AGI) breakthroughs.
Without integrating neurosymbolic AI and rich world models—approaches I have advocated for years—reliable AI agents remain out of reach. Robust, trustworthy AI requires more than just bigger models; it needs fundamentally different architectures.
The current focus on LLMs as a shortcut has been a massive intellectual and economic mistake. Despite nearly a trillion dollars invested, these systems cannot reliably manage calendars, finances, or perform at the promised professional levels.
Yet, investment continues to pour into generative AI, while alternative approaches like neurosymbolic AI remain severely underfunded, receiving perhaps less than 1% of total AI investment.
Perhaps after enough failures, reality will set in, and the industry will reconsider its approach.
Over the years, I have anticipated many trends in AI. For those interested in thoughtful analysis, I invite you to subscribe and support this work.
Gary Marcus founded a machine learning company acquired by Uber and is the author of six books on natural and artificial intelligence.
Source: Originally published at garymarcus.substack.com on August 3, 2025.
Frequently Asked Questions (FAQ)
AI Agent Reliability and Functionality
Q: What are AI agents, and how do they differ from chatbots?
A: AI agents are designed to perform tasks and take actions on your behalf, actively managing aspects of your life or business. This goes beyond chatbots, which primarily answer queries. AI agents can handle tasks like shopping, booking travel, managing calendars, summarizing information, tracking finances, and even managing software systems.

Q: Have AI agents been successful in performing tasks beyond narrow use cases?
A: Currently, most AI agents are still in their early stages and are not reliably effective beyond limited scenarios. Despite significant investment and announcements from major tech companies, practical utility has been significantly limited by errors and a gap between demo capabilities and real-world performance.

Q: What are the main challenges hindering the reliability of current AI agents?
A: Key challenges include:
- Errors and Hallucinations: AI agents are prone to mistakes and to confidently generating false or misleading information.
- "Demo to Reality" Gap: There is a persistent gap between advertised capabilities and actual performance.
- Technical Debt: Issues like AI coding agents producing hard-to-debug code contribute to unreliability.
- Lack of Deep Understanding: Current systems often mimic rather than truly comprehend, leading to errors in multi-step tasks.
- Security Vulnerabilities: Superficial understanding makes them susceptible to cyberattacks.

Q: What is the "demo to reality" gap in AI agents?
A: This refers to the discrepancy between the impressive demonstrations of AI agent capabilities presented by companies and their actual, often less reliable, performance in real-world applications. This gap is expected to persist for years.

Q: Why are AI agents prone to errors and hallucinations?
A: These issues stem from current AI systems' reliance on mimicry rather than deep understanding. They can imitate human behavior but lack genuine comprehension, leading to factual inaccuracies and errors, especially in complex or multi-step tasks.

Q: What is the typical failure rate for AI agents in benchmarks?
A: Recent analyses indicate that AI agents can fail 70% of tasks in some benchmarks, a rate that significantly impacts their credibility and practical utility.

Q: What are the security concerns associated with AI agents?
A: AI agents have a superficial understanding, making them vulnerable to cyberattacks. Even secure systems have shown a non-negligible attack success rate, which is a significant concern for critical applications.