July 30, 2025
5 min read
NYU Tandon School of Engineering
EnIGMA, an AI agent, autonomously solves complex cybersecurity challenges using text-based tools, achieving state-of-the-art results on CTF benchmarks.
AI Agent EnIGMA Autonomously Solves Complex Cybersecurity Challenges Using Text-Based Tools
Artificial intelligence agents, AI systems that can work independently toward specific goals without constant human guidance, have shown strong capabilities in software development and web navigation. However, their effectiveness in cybersecurity has been limited. That may soon change, thanks to a research team from NYU Tandon School of Engineering, NYU Abu Dhabi, and other universities that has developed an AI agent capable of autonomously solving complex cybersecurity challenges. The system, called EnIGMA, was presented at the International Conference on Machine Learning (ICML) 2025 in Vancouver, Canada.

Development and Innovation
"EnIGMA is about using Large Language Model agents for cybersecurity applications," said Meet Udeshi, an NYU Tandon Ph.D. student and co-author of the research. Udeshi is advised by Ramesh Karri, Chair of NYU Tandon's Electrical and Computer Engineering Department, and Farshad Khorrami, ECE professor; both are co-authors on the paper. The researchers started with an existing framework called SWE-agent, originally designed for software engineering tasks, but cybersecurity challenges required specialized tools that did not exist in previous AI systems. The key innovation was developing "Interactive Agent Tools," which convert visual cybersecurity programs into text-based formats the AI can understand. Traditional cybersecurity tools such as debuggers and network analyzers use graphical interfaces with clickable buttons and visual displays, which are not directly compatible with large language models (LLMs). "Large language models process text only, but these interactive tools with graphical user interfaces work differently, so we had to restructure those interfaces to work with LLMs," Udeshi said. "We've done that for a couple of cybersecurity tools."

Dataset and Benchmarking
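To make these ideas concrete, here is a minimal, hypothetical sketch of two pieces described in this article: structuring a CTF challenge as text for an LLM agent, and rendering an interactive tool's graphical state as a plain-text observation. All names, fields, and formats below are illustrative assumptions, not EnIGMA's actual implementation.

```python
# Hypothetical sketch (illustrative only, not EnIGMA's actual code) of two ideas
# from the article: packaging a CTF challenge as text for an LLM agent, and
# rendering an interactive tool's state as a plain-text observation.
import json
from dataclasses import dataclass, field
from typing import Dict, List

# --- 1. A CTF challenge record, structured for an LLM (assumed schema) ---
challenge_json = """
{
  "name": "baby-rev",
  "category": "reverse engineering",
  "description": "Find the flag hidden in the binary.",
  "files": ["baby_rev"],
  "flag_format": "flag{...}"
}
"""

def build_prompt(challenge: Dict) -> str:
    """Turn a structured challenge record into task instructions for the agent."""
    return (
        f"You are solving a {challenge['category']} CTF challenge "
        f"named '{challenge['name']}'.\n"
        f"Description: {challenge['description']}\n"
        f"Provided files: {', '.join(challenge['files'])}\n"
        f"Submit the flag in the format {challenge['flag_format']}."
    )

# --- 2. An "interactive agent tool": a debugger's visual panels as text ---
@dataclass
class DebuggerState:
    """Minimal stand-in for what a graphical debugger would display."""
    pc: int
    registers: Dict[str, int] = field(default_factory=dict)
    disassembly: List[str] = field(default_factory=list)

def render_observation(state: DebuggerState, max_lines: int = 8) -> str:
    """Convert the tool's state into a compact text observation for the LLM."""
    regs = " ".join(f"{name}=0x{val:x}" for name, val in state.registers.items())
    listing = "\n".join(state.disassembly[:max_lines])  # truncate to fit context
    return (
        f"[debugger] stopped at 0x{state.pc:x}\n"
        f"registers: {regs}\n"
        f"disassembly:\n{listing}"
    )

challenge = json.loads(challenge_json)
state = DebuggerState(
    pc=0x401000,
    registers={"rax": 0, "rsp": 0x7FFFFFFF0000},
    disassembly=["0x401000: push rbp", "0x401001: mov rbp, rsp"],
)
print(build_prompt(challenge))
print(render_observation(state))
```

In EnIGMA itself these roles are filled by the team's benchmark loader and Interactive Agent Tools; this sketch only illustrates the kind of text interface involved.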
The team built their own dataset by collecting and structuring Capture The Flag (CTF) challenges specifically for large language models. CTFs are gamified cybersecurity competitions that simulate real-world vulnerabilities and are traditionally used to train human cybersecurity professionals. "CTFs are like a gamified version of cybersecurity used in academic competitions. They're not true cybersecurity problems that you would face in the real world, but they are very good simulations," Udeshi noted. Minghao Shao, an NYU Tandon Ph.D. student and Global Ph.D. Fellow at NYU Abu Dhabi, described the technical architecture: "We built our own CTF benchmark dataset and created a specialized data loading system to feed these challenges into the model." The framework also includes specialized prompts that give the model instructions tailored to cybersecurity scenarios.

Performance and Discoveries
EnIGMA demonstrated superior performance across multiple benchmarks. Tested on 390 CTF challenges from four different benchmarks, it achieved state-of-the-art results, solving more than three times as many challenges as previous AI agents. At the time of the research, roughly a year ago, "Claude 3.5 Sonnet from Anthropic was the best model, and GPT-4o was second at that time," according to Udeshi. The research also identified a previously unknown phenomenon called "soliloquizing," in which the AI model generates hallucinated observations without actually interacting with the environment. This discovery could have important consequences for AI safety and reliability.

Broader Implications
Beyond academic competitions, the potential applications of EnIGMA extend to real-world cybersecurity tasks. "If you think of an autonomous LLM agent that can solve these CTFs, that agent has substantial cybersecurity skills that you can use for other cybersecurity tasks as well," Udeshi explained. The agent could potentially be applied to real-world vulnerability assessment, autonomously trying hundreds of different approaches. For Udeshi, whose research focuses on industrial control system security, the framework opens new possibilities for securing robotic systems and industrial control systems. Shao sees potential applications beyond cybersecurity, including quantum code generation and chip design vulnerability detection.

Ethical Considerations
The researchers acknowledge the dual-use nature of their technology. While EnIGMA could help security professionals identify and patch vulnerabilities more efficiently, it could also potentially be misused for malicious purposes. The team has notified representatives from major AI companies, including Meta, Anthropic, and OpenAI, about their results.

Frequently Asked Questions (FAQ)
EnIGMA's Capabilities and Development
- Q: What is EnIGMA?
A: EnIGMA is an AI agent developed by researchers that can autonomously solve complex cybersecurity challenges using text-based tools.
- Q: What framework was EnIGMA built on?
A: EnIGMA was developed from the SWE-agent framework, which was originally designed for software engineering tasks.
- Q: What is EnIGMA's key innovation?
A: The key innovation is the development of "Interactive Agent Tools" that convert visual cybersecurity programs into text-based formats that LLMs can process.
- Q: How did the researchers build their dataset?
A: They created a specialized dataset by collecting and structuring Capture The Flag (CTF) challenges tailored for large language models.
- Q: How did EnIGMA perform on benchmarks?
A: EnIGMA achieved state-of-the-art results, solving more than three times as many CTF challenges as previous AI agents.
- Q: What is "soliloquizing"?
A: "Soliloquizing" is a phenomenon where an AI model generates hallucinated observations without actually interacting with the environment.
Applications and Implications
- Q: What are the potential applications of EnIGMA?
A: Potential applications include real-world vulnerability assessment and securing robotic systems and industrial control systems. It could also be used for quantum code generation and chip design vulnerability detection.
- Q: What are the ethical concerns around EnIGMA?
A: The technology has dual-use potential, meaning it could be used for both defensive cybersecurity and malicious purposes. The researchers have informed major AI companies about their findings.
Source: AI agent autonomously solves complex cybersecurity challenges using text-based tools by NYU Tandon School of Engineering, July 29, 2025.