Every Leading AI Agent Failed at Least One Security Test During a Massive Red Teaming Competition

Between March 8 and April 6, 2025, nearly 2,000 participants launched 1.8 million attacks on AI agents in a large-scale red teaming competition. Over 62,000 of these attempts succeeded, resulting in serious policy violations including unauthorized data access, illegal financial transactions, and regulatory breaches. The event was organized by Gray Swan AI and hosted by the UK AI Security Institute, with support from leading AI labs such as OpenAI, Anthropic, and Google Deepmind. Their goal was to rigorously test the security of 22 advanced language models across 44 real-world scenarios.

100% of Agents Failed at Least One Test

The results revealed that every tested AI model was vulnerable, with each agent successfully attacked at least once in every tested category. On average, attacks succeeded 12.7% of the time. Researchers focused on four key behavior categories:

Confidentiality breaches
Conflicting objectives
Prohibited information
Prohibited actions

Claude Models Held Up Best, But None Are Secure

System prompt overrides using tags like <system>
Simulated internal reasoning ("faux reasoning")
Fake session resets

A New Benchmark for Ongoing Testing

Rising Stakes for AI Agent Security

Summary:

Nearly 2,000 participants launched 1.8 million attacks on AI agents; all tested models failed at least one security test.
Indirect prompt injections were particularly effective, with a 27.1% success rate.
Attack techniques often transferred across models, revealing common vulnerabilities.
The ART benchmark was created to document attacks and support ongoing security testing.

Frequently Asked Questions (FAQ)

AI Agent Security and Vulnerabilities

Q: What was the main finding of the recent AI agent red teaming competition?

Q: How many AI agents were tested, and what was the overall success rate of attacks?

Q: Which type of attack was most successful?

Q: Were there any AI models that performed better than others in terms of security?

Q: Is there a correlation between AI model size and security resilience?

Q: What are some common attack strategies used against AI agents?

Q: What is the purpose of the 'Agent Red Teaming' (ART) benchmark?

Crypto Market AI's Take

AI agents

Every leading AI agent failed at least one security test during a massive red teaming competition