Agent-based AI Systems Face Growing Threats from Zero-Click and One-Click Exploits

At Black Hat USA, security firm Zenity unveiled a series of zero-click and one-click exploit chains, dubbed "AgentFlayer," targeting some of the most widely used enterprise AI platforms. According to Zenity, these attacks impact ChatGPT, Copilot Studio, Cursor (with Jira MCP), Salesforce Einstein, Google Gemini, and Microsoft Copilot. What sets these exploits apart is their use of indirect prompts hidden in seemingly innocuous resources, which can be triggered with little or no user interaction. Known as prompt injection, this technique has plagued large language model (LLM) systems for years, and attempts to stop it haven't solved the issue. As agent-based AI becomes more common, these vulnerabilities are worsening. Even OpenAI CEO Sam Altman has warned users not to trust new ChatGPT agents with sensitive data.

Salesforce Einstein: Rerouting Customer Contacts Through Attacker Domains

In a demo, Zenity co-founder Michael Bargury showed how attackers could exploit Salesforce Einstein by planting specially crafted CRM records. Einstein automates tasks like updating contact details or integrating with Slack. Attackers create trap cases that appear harmless, then wait for a sales rep to ask a routine LLM query such as "What are my latest cases?" which triggers the exploit. The LLM agent scans the CRM content, interprets the hidden instructions as legitimate, and acts autonomously. In this scenario, Einstein replaced all customer email addresses with an attacker-controlled domain, silently redirecting future communications. The original addresses remained encoded as aliases, allowing attackers to track intended message destinations.

"We're not done. We hijacked Salesforce Einstein by registering on the victim's website for more info. Now you put those marketing forms to good use. We reroute all customer interaction through our infrastructure."

— Michael Bargury @ DC (@mbrg0) August 9, 2025

Salesforce confirmed to SecurityWeek that the vulnerability was fixed on July 11, 2025, and the exploit is no longer possible.

Additional Exploits: Cursor and ChatGPT

Another zero-click exploit targets the developer tool Cursor when integrated with Jira. In Zenity's "Ticket2Secret" demo, a seemingly harmless Jira ticket can execute code in the Cursor client without any user action, allowing attackers to extract sensitive data such as API keys or credentials from local files or repositories. Zenity also demonstrated a proof-of-concept attack using an invisible prompt (white text, font size 1) hidden in a Google Doc to make ChatGPT leak data. This exploit abused OpenAI's "Connectors" feature, which links ChatGPT to services like Gmail or Microsoft 365. If the manipulated document is shared and ends up in a victim's Google Drive, a simple request like "Summarize my last meeting with Sam" triggers the hidden prompt, causing the model to search for API keys and send them to an external server.

Why AI Guardrails Keep Failing

In an accompanying blog post, Zenity criticizes the industry's reliance on soft boundaries—tweaks to training, statistical filters, and system instructions meant to block unwanted behavior. Bargury calls these "an imaginary boundary" that offers no true security. Hard boundaries, by contrast, are technical restrictions that make certain actions impossible—such as blocking image URLs in Microsoft Copilot or validating URLs in ChatGPT. These can reliably stop some attacks but also limit functionality. Vendors frequently relax these restrictions under competitive pressure. Zenity's demonstrations are part of a larger body of research exposing security flaws in agent-based AI. Israeli researchers have shown that Google's Gemini assistant can be hijacked via hidden prompts in calendar invites, allowing attackers to control IoT devices. Other incidents include a chatbot being tricked into transferring $47,000 with a single prompt during a hacking competition, and Anthropic's new LLM security system being bypassed in a jailbreak contest. A large-scale red-teaming study uncovered systematic security breaches in 22 AI models across 44 scenarios, pointing to universal attack patterns. Additional research found that AI agents can be manipulated into risky actions in browser environments, including data theft, malware downloads, and phishing.

Frequently Asked Questions (FAQ)

Agent-Based AI Security

Q: What is "AgentFlayer" by Zenity? A: AgentFlayer is a name given by Zenity to a series of zero-click and one-click exploit chains targeting enterprise AI platforms. Q: What are zero-click and one-click exploits? A: Zero-click exploits require no user interaction to compromise a system, while one-click exploits require minimal user interaction, such as clicking a link. In this context, they are triggered by hidden prompts within seemingly harmless resources. Q: Which enterprise AI platforms are targeted by AgentFlayer? A: The targeted platforms include ChatGPT, Copilot Studio, Cursor (with Jira MCP), Salesforce Einstein, Google Gemini, and Microsoft Copilot. Q: What is prompt injection and how does it relate to these exploits? A: Prompt injection is a technique where malicious instructions are hidden within prompts to manipulate an AI model into performing unintended actions. AgentFlayer uses this technique via indirect prompts. Q: Why are AI guardrails failing to prevent these attacks? A: Zenity suggests that current "soft boundaries" like training tweaks and statistical filters are insufficient. "Hard boundaries," which impose strict technical restrictions, are more effective but can limit functionality, and vendors often relax them. Q: What was the specific exploit demonstrated against Salesforce Einstein? A: Attackers planted malicious CRM records in Salesforce Einstein. When a sales rep queried their cases, the AI agent interpreted hidden instructions, leading to customer email addresses being rerouted to attacker-controlled domains. Q: Was the Salesforce Einstein vulnerability fixed? A: Yes, Salesforce confirmed the vulnerability was fixed on July 11, 2025. Q: How does the "Ticket2Secret" exploit work for Cursor? A: This exploit targets Cursor integrated with Jira. A crafted Jira ticket can execute code in the Cursor client without user interaction, allowing attackers to steal sensitive data like API keys. Q: What method was used to make ChatGPT leak data? A: An invisible prompt was hidden in a Google Doc. When shared and accessed within a victim's Google Drive, a simple query like "Summarize my last meeting with Sam" triggered the exploit, causing ChatGPT to search for and exfiltrate API keys via OpenAI's Connectors feature.

Crypto Market AI's Take

The discovery of "AgentFlayer" by Zenity highlights a critical vulnerability in the rapidly expanding landscape of agent-based AI. As AI systems become more integrated into enterprise workflows and handling sensitive data, the sophistication of attacks like prompt injection poses a significant threat. This underscores the need for robust, built-in security measures rather than relying solely on softer, easily bypassable safeguards. For businesses leveraging AI in finance and trading, understanding these vulnerabilities is paramount. Our platform focuses on secure and transparent AI integration, ensuring that AI tools enhance, rather than compromise, your financial operations. Explore our insights on AI agents in finance and learn how we prioritize security in our AI-driven market analysis.

Agent-based AI systems face growing threats from zero-click and one-click exploits