August 6, 2025
Sharon Goldman
Top AI researchers at UC Berkeley agree AI agents hold promise but face major challenges before real-world reliability.
From OpenAI to Nvidia: Why AI Agents Still Have a Long Road Ahead
Only in the Bay Area does spending a Saturday geeking out about AI agents, alongside 2,000 students, researchers, and tech insiders crammed into UC Berkeley, feel like a totally normal weekend plan. At the day-long Agentic AI Summit, the atmosphere was less like an academic conference and more like a buzzy Silicon Valley brunch spot. The speaker lineup was impressive, featuring top AI researchers and scientists such as Jakub Pachocki, chief scientist at OpenAI; Ed Chi, VP of research at Google DeepMind; Bill Dally, chief scientist at Nvidia; Ion Stoica, cofounder of Databricks and Anyscale and a UC Berkeley professor; and Dawn Song, a pioneering UC Berkeley professor focused on AI security.
The buzz was also driven by the topic: AI agents, defined as AI-powered systems that can autonomously complete tasks using other software tools. Imagine a chatbot that not only suggests a vacation itinerary but also books the flights and hotel reservations. As my colleague Jeremy Kahn noted recently, this kind of automation has long been a C-suite dream. Over the past decade, companies embraced robotic process automation (RPA) to automate repetitive tasks like cutting and pasting between databases. However, traditional RPA systems are inflexible and handle only narrow tasks. Agentic AI aims to be more flexible and powerful, adapting to diverse business needs. In a January 2025 blog post, OpenAI CEO Sam Altman said, "We believe that, in 2025, we may see the first AI agents 'join the workforce' and materially change the output of companies."
Despite the hype, the overall message at the Agentic AI Summit was cautious and grounded: AI agents are today's buzziest trend, but the technology still has a long way to go. Agents aren't always reliable and often fail to remember prior context. Google DeepMind's Ed Chi emphasized the gap between what agents can do in curated demos and what they can do in real-world production environments.
Jakub Pachocki highlighted concerns about safety, security, and trustworthiness, especially when agents operate autonomously or in sensitive applications. "I still don't think agents have really lived up to their promise," said Sherwin Wu, head of engineering for the OpenAI API. "Certain more generic cases have worked, but my day-to-day work doesn't really feel that different with agents."
Today's AI agents may not yet live up to the massive hype (consider Salesforce CEO Marc Benioff's recent claim that a shift to digital labor means he will be the "last CEO of Salesforce who only managed humans"), but the summit speakers remained optimistic. Ion Stoica expressed enthusiasm about infrastructure improvements that make it easier to build agentic systems. Bill Dally from Nvidia suggested that hardware advances will enable more powerful and efficient agent behavior. Several speakers pointed to "narrow wins" in specific domains like coding. Though AI agents still face growing pains, the packed UC Berkeley ballroom showed that the industry's focus remains on the prize: AI agents that can reliably operate in the real world. The payoff, they believe, will be worth the wait.
AI in the News
- U.S. agency approves OpenAI, Google, Anthropic for federal AI vendor list. The General Services Administration added OpenAI's ChatGPT, Google's Gemini, and Anthropic's Claude to an approved AI vendor list to accelerate government use. Providers commit to responsible use and federal standards.
- The AI spending boom's economic impact. Big Tech's $350 billion AI investment in 2025 is fueling data center construction and demand for chips and servers, potentially boosting U.S. GDP growth by up to 0.7%. Economists warn of risks if the AI boom slows.
- AI sales tool Clay raises $100 million at a $3.1 billion valuation. Clay helps sales reps find leads and convert them to customers. The funding round was led by CapitalG, Alphabet's investment arm.
- Google DeepMind's Genie 3 creates real-time interactive simulations. Genie 3 can generate rich, interactive virtual worlds from text prompts, allowing navigation of consistent AI-generated environments that respond to commands. Access is currently limited to select researchers.
- Could "depth of thought" be key to AI reasoning? Singapore's Sapient Intelligence released the Hierarchical Reasoning Model (HRM), a tiny AI inspired by the brain's layered thinking. Despite being 100x smaller than ChatGPT and trained on just 1,000 examples, HRM solves complex logic tasks by reasoning internally, hinting at a shift where depth of thought may matter more than scale.
Source: From OpenAI to Nvidia, researchers agree: AI agents have a long way to go