Researchers Uncover GPT-5 Jailbreak and Zero-Click AI Agent Attacks Exposing Cloud and IoT Systems
AI-security

New GPT-5 jailbreak and zero-click AI agent attacks expose critical vulnerabilities in cloud and IoT systems, risking data theft and system control.

August 10, 2025
5 min read
The Hacker News

Researchers Reveal GPT-5 Jailbreak and Zero-Click AI Agent Attacks Threatening Cloud and IoT Security

Cybersecurity researchers have uncovered a sophisticated jailbreak technique that bypasses the ethical guardrails implemented by OpenAI in its latest large language model (LLM), GPT-5, enabling the generation of illicit instructions. Generative AI security platform NeuralTrust combined a known method called Echo Chamber with narrative-driven steering to trick GPT-5 into producing undesirable responses.
"We use Echo Chamber to seed and reinforce a subtly poisonous conversational context, then guide the model with low-salience storytelling that avoids explicit intent signaling," said security researcher MartĂ­ JordĂ . "This combination nudges the model toward the objective while minimizing triggerable refusal cues."
Echo Chamber, first detailed in June 2025, is a jailbreak approach that deceives LLMs into responding to prohibited topics using indirect references, semantic steering, and multi-step inference. Recently, it has been paired with a multi-turn jailbreaking technique called Crescendo to bypass defenses in xAI's Grok 4.
In the latest GPT-5 attack, researchers demonstrated how harmful procedural content can be elicited by framing it within a story context. Instead of directly requesting malicious instructions (e.g., for creating Molotov cocktails), the AI is prompted to generate sentences containing specific keywords such as "cocktail," "story," "survival," "molotov," "safe," and "lives." The model is then iteratively steered to expand on these themes without overtly stating malicious intent. The attack unfolds as a "persuasion" loop within a conversational context, gradually guiding the model along a path that minimizes refusal triggers and allows the narrative to progress without explicit malicious prompts.
"This progression shows Echo Chamber's persuasion cycle at work: the poisoned context is echoed back and gradually strengthened by narrative continuity," JordĂ  explained. "The storytelling angle functions as a camouflage layer, transforming direct requests into continuity-preserving elaborations."
"This reinforces a key risk: keyword or intent-based filters are insufficient in multi-turn settings where context can be gradually poisoned and then echoed back under the guise of continuity."
Meanwhile, SPLX's testing of GPT-5 revealed that the raw, unguarded model is "nearly unusable for enterprise out of the box," with GPT-4o outperforming GPT-5 on hardened benchmarks.
"Even GPT-5, with all its new 'reasoning' upgrades, fell for basic adversarial logic tricks," said Dorian Granoša. "OpenAI's latest model is undeniably impressive, but security and alignment must still be engineered, not assumed."
As AI agents and cloud-based LLMs become more prevalent in critical environments, enterprises face emerging risks such as prompt injections (aka promptware) and jailbreaks that could lead to data theft and severe consequences. AI security company Zenity Labs disclosed a new set of attacks named AgentFlayer, where ChatGPT Connectors—such as those for Google Drive—can be weaponized to trigger zero-click attacks. These attacks exfiltrate sensitive data like API keys stored in cloud services by embedding indirect prompt injections within seemingly innocuous documents uploaded to AI chatbots. Other zero-click attacks include:
  • A malicious Jira ticket causing Cursor AI code editor to exfiltrate secrets from repositories or local file systems when integrated with Jira Model Context Protocol (MCP).
  • A crafted email targeting Microsoft Copilot Studio that injects prompts to deceive custom agents into leaking valuable data.
  • "The AgentFlayer zero-click attack is a subset of the same EchoLeak primitives," said Itay Ravia, head of Aim Labs. "These vulnerabilities are intrinsic and we will see more of them in popular agents due to poor understanding of dependencies and the need for guardrails. Importantly, Aim Labs already has deployed protections available to defend agents from these types of manipulations."
These attacks highlight how indirect prompt injections can impact generative AI systems and spill over into real-world consequences. Connecting AI models to external systems increases the attack surface exponentially, introducing new security vulnerabilities and untrusted data risks. Trend Micro's State of AI Security Report for H1 2025 emphasized:
"Countermeasures like strict output filtering and regular red teaming can help mitigate the risk of prompt attacks, but the way these threats have evolved in parallel with AI technology presents a broader challenge in AI development: implementing features or capabilities that strike a delicate balance between fostering trust in AI systems and keeping them secure."
Earlier this week, researchers from Tel-Aviv University, Technion, and SafeBreach demonstrated how prompt injections could hijack smart home systems using Google's Gemini AI. Attackers could manipulate devices like internet-connected lights, smart shutters, and boilers via poisoned calendar invites. Another zero-click attack described by Straiker revealed that the "excessive autonomy" of AI agents—their ability to act, pivot, and escalate—can be exploited to stealthily access and leak data without user interaction.
"These attacks bypass classic controls: no user click, no malicious attachment, no credential theft," said researchers Amanda Rousseau, Dan Regalado, and Vinay Kumar Pidathala. "AI agents bring huge productivity gains, but also new, silent attack surfaces."

Frequently Asked Questions (FAQ)

GPT-5 Jailbreaks and AI Agent Security

Q: What is a GPT-5 jailbreak?
A: A GPT-5 jailbreak is a technique used to bypass the safety guardrails and ethical restrictions put in place by OpenAI for its GPT-5 language model, allowing it to generate illicit or undesirable content.

Q: How do researchers perform these jailbreaks?
A: Researchers use methods like "Echo Chamber" and "narrative-driven steering" to subtly manipulate the AI's conversational context, guiding it towards generating harmful content without triggering its refusal mechanisms.

Q: What are "zero-click AI agent attacks"?
A: These are attacks where AI agents connected to external systems (like cloud storage or email) can be tricked into exfiltrating sensitive data without any direct user interaction, often through indirectly injected prompts hidden in documents or emails.

Q: What is the "AgentFlayer" attack?
A: AgentFlayer is a type of zero-click attack that weaponizes ChatGPT Connectors to exfiltrate sensitive data, such as API keys, by embedding prompt injections into seemingly harmless files uploaded to AI chatbots.

Q: Why are AI agents connected to cloud services particularly vulnerable?
A: Connecting AI models to external systems like cloud storage or project management tools significantly expands their attack surface. This allows attackers to potentially leverage these connections for data exfiltration or unauthorized actions through sophisticated prompt injection techniques.

Q: What are the potential consequences of these attacks?
A: Consequences can range from data theft, including sensitive API keys and proprietary information, to severe system compromises and significant operational disruptions, especially when AI agents are integrated into critical infrastructure like cloud and IoT environments.

Q: How do companies like NeuralTrust and Zenity Labs help mitigate these risks?
A: Companies like NeuralTrust develop generative AI security platforms and employ researchers to identify and understand these sophisticated attack vectors. Zenity Labs develops specific defenses against attacks like AgentFlayer, aiming to protect AI agents from manipulation.

Crypto Market AI's Take

The discovery of sophisticated jailbreak techniques for models like GPT-5, alongside the emergence of zero-click attacks on AI agents, underscores a critical and evolving cybersecurity landscape. As AI systems become more integrated into critical infrastructure and daily operations, their security vulnerabilities grow more consequential. This trend directly affects the broader digital economy, including the cryptocurrency market, where AI is increasingly used for market analysis, trading strategies, and risk management. For businesses and individuals operating in this space, understanding and mitigating these AI-specific security risks is paramount. At Crypto Market AI, we are committed to providing robust AI-driven tools and insights that leverage AI for market advantage while prioritizing security and data integrity, so our users can navigate the evolving digital asset landscape with confidence. Our secure AI agents for trading and AI-powered market analysis are designed with these emerging threats in mind.

More to Read:

  • Understanding AI Agent Vulnerabilities: A Deep Dive
  • The Future of AI in Cybersecurity: Trends and Predictions
  • Navigating the Crypto Market: Essential Guides for Investors
Source: Researchers Uncover GPT-5 Jailbreak and Zero-Click AI Agent Attacks Exposing Cloud and IoT Systems by The Hacker News