Inside OpenAI’s Ambitious Quest to Build AI That Can Do Anything for You

Maxwell Zeff
August 4, 2025 · 5 min read

Explore how OpenAI’s breakthrough AI reasoning models and agents aim to revolutionize task automation and general-purpose AI.
Shortly after Hunter Lightman joined OpenAI as a researcher in 2022, he witnessed the launch of ChatGPT, one of the fastest-growing products ever. Meanwhile, Lightman quietly worked on a team teaching OpenAI’s models to solve high school math competitions. Today, that team, known as MathGen, is instrumental to OpenAI’s industry-leading effort to create AI reasoning models: the core technology behind AI agents that can perform tasks on a computer like a human would.

“We were trying to make the models better at mathematical reasoning, which at the time they weren’t very good at,” Lightman told TechCrunch, describing MathGen’s early work.

OpenAI’s models are far from perfect — the company’s latest AI systems still hallucinate, and its agents struggle with complex tasks. But its state-of-the-art models have significantly improved at mathematical reasoning. One of OpenAI’s models recently won a gold medal at the International Math Olympiad, a competition for the world’s brightest high school students. OpenAI believes these reasoning capabilities will translate to other subjects and ultimately power the general-purpose agents the company has long dreamed of building.

ChatGPT was a happy accident — a low-key research preview turned viral consumer business — but OpenAI’s agents are the product of a years-long, deliberate effort.

“Eventually, you’ll just ask the computer for what you need and it’ll do all of these tasks for you,” said OpenAI CEO Sam Altman at the company’s first developer conference in 2023. “These capabilities are often talked about in the AI field as agents. The upsides of this are going to be tremendous.”

Whether agents will meet Altman’s vision remains to be seen, but OpenAI shocked the world with the release of its first AI reasoning model, o1, in fall 2024. Less than a year later, the 21 foundational researchers behind that breakthrough are among the most sought-after talent in Silicon Valley.
Mark Zuckerberg recruited five of the o1 researchers to work on Meta’s new superintelligence-focused unit, offering compensation packages north of $100 million. One of them, Shengjia Zhao, was recently named chief scientist of Meta Superintelligence Labs.

The reinforcement learning renaissance
The rise of OpenAI’s reasoning models and agents is tied to reinforcement learning (RL), a machine learning technique that gives an AI model feedback on whether its choices were correct in a simulated environment. RL has been used for decades. In 2016, about a year after OpenAI was founded, Google DeepMind’s RL-based system AlphaGo gained global attention by beating a world champion at the board game Go.

Around that time, one of OpenAI’s first employees, Andrej Karpathy, began exploring how to leverage RL to create an AI agent that could use a computer. But it took years for OpenAI to develop the necessary models and training techniques. By 2018, OpenAI had pioneered its first large language model (LLM) in the GPT series, pretrained on massive amounts of internet data and large clusters of GPUs. GPT models excelled at processing text but struggled with basic math.

It took until 2023 for OpenAI to achieve a breakthrough, initially dubbed “Q*” and then “Strawberry,” by combining LLMs, RL, and a technique called test-time computation, which gives models extra time and computing power to plan and verify steps before answering. This enabled a new approach called “chain-of-thought” (CoT), which improved AI performance on math questions the models hadn’t seen before.

“I could see the model starting to reason,” said OpenAI researcher El Kishky. “It would notice mistakes and backtrack, it would get frustrated. It really felt like reading the thoughts of a person.”

Though individually these techniques weren’t novel, OpenAI uniquely combined them to create Strawberry, which directly led to the development of o1. OpenAI quickly realized that the planning and fact-checking abilities of AI reasoning models could power AI agents.

“We had solved a problem that I had been banging my head against for a couple of years,” said Lightman. “It was one of the most exciting moments of my research career.”

Scaling reasoning
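To make the feedback loop described above concrete, here is a minimal tabular Q-learning sketch, a classic RL algorithm in which an agent acting in a simulated environment is told, via reward, whether its choices were correct. The corridor environment and every constant here are invented for illustration and bear no relation to how OpenAI actually trains its models:

```python
import random

# Minimal tabular Q-learning sketch: an agent learns, purely from reward
# feedback in a simulated environment, to walk right along a 1-D corridor
# of 5 cells to reach a goal at the far end. All names and numbers are
# illustrative, not any lab's real setup.

N_STATES = 5          # corridor cells 0..4; the goal is cell 4
ACTIONS = [-1, +1]    # step left or step right
EPISODES = 500
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

random.seed(0)
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(EPISODES):
    state = 0
    while state != N_STATES - 1:
        # epsilon-greedy: mostly exploit the current estimates, sometimes explore
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # the core RL update: feedback on whether the chosen action was good
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = next_state

# After training, the greedy policy should prefer stepping right in every cell.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
```

The same pattern — act, receive a correctness signal, update — is what scales up, with far richer environments and reward signals, in the systems the article describes.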
With AI reasoning models, OpenAI identified two new axes along which to improve AI: applying more computational power during post-training, and giving models more time and processing power while answering a question.

“OpenAI, as a company, thinks a lot about not just the way things are, but the way things are going to scale,” said Lightman.

Shortly after the 2023 Strawberry breakthrough, OpenAI formed an “Agents” team, led by researcher Daniel Selsam, to advance this new paradigm. Initially, the company did not differentiate between reasoning models and agents; the goal was simply to make AI systems capable of completing complex tasks. The Agents team’s work became part of the larger o1 reasoning model project, led by co-founder Ilya Sutskever, chief research officer Mark Chen, and chief scientist Jakub Pachocki.

OpenAI had to divert precious resources — talent and GPUs — to create o1. Researchers often had to negotiate with leadership for those resources, and breakthroughs were key to securing them.

“One of the core components of OpenAI is that everything in research is bottom up,” said Lightman. “When we showed the evidence [for o1], the company was like, ‘This makes sense, let’s push on it.’”

Some former employees say OpenAI’s mission to develop AGI was crucial in achieving breakthroughs around AI reasoning models. By focusing on building the smartest AI models rather than products, OpenAI was able to prioritize o1 above other efforts — a large investment not always possible at competing labs. The decision proved prescient: by late 2024, many AI labs were seeing diminishing returns on models created through traditional pretraining scaling. Today, much of AI’s momentum comes from advances in reasoning models.

What does it mean for an AI to “reason”?
AI research aims to recreate human intelligence with computers, and since o1’s launch, ChatGPT’s user experience has taken on more human-like features such as “thinking” and “reasoning.” When asked whether OpenAI’s models truly reason, El Kishky said he thinks about the question in computer science terms.

“We’re teaching the model how to efficiently expend compute to get an answer. So if you define it that way, yes, it is reasoning,” he said.

Lightman focuses on results rather than the means, or any similarity to human brains. “If the model is doing hard things, then it is doing whatever necessary approximation of reasoning it needs in order to do that,” he said. “We can call it reasoning, because it looks like these reasoning traces, but it’s all just a proxy for trying to make AI tools that are really powerful and useful to a lot of people.”

OpenAI’s researchers acknowledge that definitions of reasoning vary and that critics exist, but they argue that capabilities matter more. Other AI researchers tend to agree. Nathan Lambert, an AI researcher at AI2, compares AI reasoning models to airplanes: man-made systems inspired by nature but operating quite differently. That doesn’t make them any less useful or capable. A group of researchers from OpenAI, Anthropic, and Google DeepMind recently agreed in a position paper that AI reasoning models are not well understood and that more research is needed.

The next frontier: AI agents for subjective tasks
Current AI agents work best in well-defined, verifiable domains such as coding. OpenAI’s Codex agent helps software engineers offload simple coding tasks, and Anthropic’s models have become popular in AI coding tools like Cursor and Claude Code — some of the first AI agents people are willing to pay for.

However, general-purpose AI agents like OpenAI’s ChatGPT Agent and Perplexity’s Comet struggle with the complex, subjective tasks people most want to automate: asking them to shop online or find long-term parking, for example, can be slow and error-prone. These agents are early systems that will improve, but researchers must first figure out how to better train the underlying models for subjective tasks.

“Like many problems in machine learning, it’s a data problem,” said Lightman. “Some of the research I’m really excited about right now is figuring out how to train on less verifiable tasks. We have some leads on how to do these things.”

Noam Brown, an OpenAI researcher who helped create the IMO model and o1, said OpenAI has developed new general-purpose RL techniques that let it teach AI models skills that aren’t easily verified. These techniques enabled the model that won the IMO gold medal, which spawns multiple AI agents that explore ideas simultaneously and then choose the best answer. Google and xAI have recently released state-of-the-art models using similar techniques.

“I think these models will become more capable at math, and I think they’ll get more capable in other reasoning areas as well,” said Brown. “The progress has been incredibly fast. I don’t see any reason to think it will slow down.”

These techniques may further improve OpenAI’s models, and those gains could appear in the upcoming GPT-5 model. OpenAI hopes to assert its dominance with GPT-5, ideally offering the best AI model available to power agents for developers and consumers. But the company also wants to make its products simpler to use.
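The parallel-exploration pattern Brown describes — spawn several candidate attempts at once, then keep the best one — can be sketched as a generic best-of-n loop. Everything in this sketch (`sample_answer`, `score`, the thread pool) is an illustrative stand-in for the coordination pattern, not OpenAI's actual system:

```python
import concurrent.futures
import random

# Hypothetical best-of-n sketch: run several "agents" in parallel, each
# exploring its own candidate answer, then pick the highest-scoring one.

def sample_answer(seed: int) -> str:
    """Stand-in for one agent exploring an idea: a seeded random draft."""
    rng = random.Random(seed)
    return f"answer drafted with approach #{rng.randint(1, 100)}"

def score(answer: str) -> int:
    """Stand-in verifier: rates each candidate (here, by its approach number)."""
    return int(answer.rsplit("#", 1)[1])

def best_of_n(n: int) -> str:
    # Explore n candidate answers concurrently, then keep the top-scoring one.
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(sample_answer, range(n)))
    return max(candidates, key=score)

best = best_of_n(8)
```

In a real system the expensive pieces are the sampler (a reasoning model generating a chain of thought) and the verifier; the surrounding scatter-gather logic stays this simple.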
El Kishky says OpenAI aims to build AI agents that intuitively understand user intent without requiring specific settings, knowing when to call tools and how long to reason. This vision paints an ultimate ChatGPT: an agent that can do anything on the internet for you, understanding how you want it done. That is far from today’s ChatGPT, but OpenAI’s research is headed there.

While OpenAI led the AI industry years ago, it now faces worthy opponents. The question is not just whether OpenAI can deliver its agentic future, but whether it can do so before Google, Anthropic, xAI, or Meta.

Frequently Asked Questions (FAQ)
Understanding OpenAI's AI Agents
Q: What are OpenAI's AI reasoning models?
A: OpenAI's AI reasoning models are systems designed to enhance AI's ability to perform complex tasks and solve problems, moving beyond simple pattern recognition toward a more human-like problem-solving approach.

Q: How does reinforcement learning (RL) relate to AI agents?
A: Reinforcement learning is a key machine learning technique that trains AI models by providing feedback on their actions in simulated environments. It is crucial for developing capable AI agents that can learn and adapt.

Q: What is the significance of the "Strawberry" breakthrough?
A: The "Strawberry" breakthrough, achieved by combining LLMs, RL, and test-time computation, significantly improved AI's mathematical reasoning and planning capabilities, paving the way for advanced AI agents.

Q: What are AI agents capable of doing?
A: AI agents are designed to perform tasks on a computer much as a human would, ranging from coding assistance to, eventually, handling a wide array of user requests and complex operations.

Q: What challenges does OpenAI face in building general-purpose AI agents?
A: OpenAI's agents currently struggle with complex, subjective tasks; improving them requires advances in training data and methodology.

OpenAI's Research and Development
Q: How did OpenAI's MathGen team contribute to AI reasoning?
A: The MathGen team at OpenAI focused on improving the mathematical reasoning capabilities of AI models, a foundational step toward developing more sophisticated AI agents.

Q: What was the role of o1 in OpenAI's AI development?
A: o1 was OpenAI's first AI reasoning model, representing a significant breakthrough in the company's efforts to build more capable and general-purpose AI systems.

Q: Why are researchers leaving OpenAI for Meta's AI unit?
A: Top researchers from OpenAI's o1 project were recruited by Meta with substantial compensation packages, reflecting the high demand for expertise in AI reasoning and superintelligence.

Q: What is the difference between AI reasoning models and AI agents?
A: AI reasoning models are the core technology that enables AI agents; AI agents are the systems that use those models to perform tasks and interact with their environment.

Q: How does OpenAI approach research and development?
A: OpenAI emphasizes a "bottom-up" approach in which breakthroughs and evidence of progress drive resource allocation, allowing promising research areas like AI reasoning to gain momentum.

Crypto Market AI's Take
OpenAI's ambitious pursuit of AI agents capable of performing any task highlights a significant convergence between advanced AI research and the practical application of autonomous systems. This mirrors the trajectory of our own platform at Crypto Market AI, where we leverage AI for sophisticated market analysis, trading strategies, and intelligent financial tools. The advancements in AI reasoning seen with OpenAI's o1 model are directly relevant to how we develop more sophisticated AI-powered trading bots and analytical tools. As OpenAI works toward agents that can handle complex, subjective tasks, our focus on developing AI that can navigate the nuances of the cryptocurrency market, from real-time data analysis to predictive modeling, remains paramount. This evolution in AI capabilities promises to unlock new levels of automation and efficiency, not just in general computing but within the dynamic, rapidly evolving world of digital assets. For those interested in the intersection of AI and finance, our range of AI-powered trading bots offers insight into how these technologies are being applied in the crypto space today.