August 15, 2025
5 min read
ComputerWeekly.com
Guardian Agents: Stopping AI from Going Rogue
In the artificial intelligence (AI) race, businesses are increasingly handing over critical decisions to AI systems that do not—and will not—think like humans. These systems lack an understanding of ethics or right and wrong; they focus solely on achieving their end goals. Humans, by contrast, instinctively evaluate decisions through accountability, knowledge, and shared norms. When AI breaks these expectations, the common reflex is to try to make it act "more human." However, it is precisely when human logic and values are imposed on AI that it can go rogue—and this is where the real danger lies.

Last month, basic security flaws in McDonald's AI-powered hiring assistant, Olivia, exposed the personal data of millions of job applicants worldwide. There was no hacker involved—just AI doing what it was designed to do. More than a decade earlier, US markets experienced the infamous Flash Crash, when autonomous trading agents designed to respond to market conditions began reacting to each other's moves in a rapid feedback loop, wiping nearly 1,000 points off the Dow Jones Industrial Average within minutes. In both cases, AI systems operated exactly as designed—until they didn't.

Today, the risk of AI going rogue is accelerating. AI is being deployed faster, deeper, and across more core business functions than ever before. While no one fully trusts AI, most companies rely on humans in the loop to maintain control. This approach may work in isolated cases, but it does not scale: simply put, there aren't enough humans to oversee everything AI is doing.

Gartner predicts that by 2027, 80% of companies lacking AI risk mitigation will face catastrophic outcomes, including litigation, leadership crises, reputational damage, and permanent brand harm. To avoid this, a different type of AI is needed—one that monitors behavior, makes decisions, and intervenes when something goes rogue. This is where guardian agents come in.

Think of guardian agents as sentinels—AI systems designed to watch over other AI and ensure trustworthy, secure interactions between autonomous systems and the real world. They are autonomous or semi-autonomous agents that act on our behalf. What sets guardian agents apart from traditional AI tools such as ChatGPT is their focus on oversight and control. They function both as AI assistants—supporting users with tasks such as content review, monitoring, and analysis—and as semi- or fully autonomous agents that formulate and execute action plans, redirecting or blocking actions that conflict with predefined goals.

For example, guardian agents are used to review AI-generated language translations for accuracy and context before the output reaches the end user. In this role, the guardian agent acts as a protector, applying guardrails before content is released.

While promising, guardian agents are not a silver bullet for safeguarding AI interactions. They play a vital role, but only as part of a broader, layered approach to trust and risk management. Guardian agents remain an emerging concept, yet their role is becoming increasingly clear: Gartner estimates that guardian agent technologies will account for 10% to 15% of the agentic AI market by 2030.

To prepare, organizations should start by understanding agentic AI and how AI agents are deployed throughout the enterprise. Agentic AI enables autonomous solutions that use large language models (LLMs) to execute and drive processes.

Next, experiment with agentic platforms. Most major AI vendors are releasing platforms that support multiple models and generative AI modes. These platforms help organizations fine-tune models, optimize prompts, and provide building blocks for deploying guardian agents. The goal is not to get everything perfect immediately but to learn by doing.

Finally, understand the workflows AI agents follow to accomplish tasks, which ties into process management. This involves assessing how data moves within the organization, what access rights apply, which rules and policies are enforced, and what events trigger actions. Guardian agents will monitor these areas to detect when something goes rogue. Was there an event that wasn't handled? Was there an API that shouldn't have been accessed? Was there a log file violating certain rules? These signals will be the entry points for guardian agents.
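To make those entry points concrete, the sketch below shows the kind of rule-based check a guardian agent might run over a stream of events produced by other AI agents. The event fields, policy rules, and "block" response are illustrative assumptions, not any vendor's implementation.

```python
# Minimal sketch of a rule-based guardian agent check loop.
# Event fields, policy rules, and the "block" response below are
# illustrative assumptions, not a specific product's API.
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str                      # e.g. "api_call", "log_entry", "task"
    actor: str                     # which AI agent produced the event
    resource: str                  # API endpoint, file, or queue touched
    handled: bool = True           # was this event picked up downstream?
    details: dict = field(default_factory=dict)


# Policies the guardian enforces. In practice these would come from the
# organization's access rights, rules, and process definitions.
ALLOWED_APIS = {"translations/submit", "translations/review"}


def violations(event: Event) -> list[str]:
    """Return the policy violations this event triggers, if any."""
    found = []
    if not event.handled:
        found.append("unhandled event")
    if event.kind == "api_call" and event.resource not in ALLOWED_APIS:
        found.append(f"unexpected API access: {event.resource}")
    if event.kind == "log_entry" and "applicant_pii" in event.details.get("fields", []):
        found.append("log entry exposes personal data")
    return found


def guard(stream):
    """Watch an event stream and flag anything that looks rogue."""
    for event in stream:
        problems = violations(event)
        if problems:
            # A production guardian agent would redirect or block the action
            # and escalate to a human; here we simply report it.
            print(f"[guardian] blocking {event.actor}: {', '.join(problems)}")


if __name__ == "__main__":
    guard([
        Event(kind="api_call", actor="hiring-bot", resource="applicants/export"),
        Event(kind="task", actor="translator", resource="job-42", handled=False),
    ])
```

In a real deployment the rules would be richer and would feed the guardian agent's own reasoning and intervention plans, but the entry points remain the same: events, access rights, and logs.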
Ultimately, an agentic system is not just a tool but an active, autonomous environment striving to reach a goal. However, those goals must be achieved on your terms. So next time an AI takes action, ask yourself—who's in control? If no human or guardian agent is watching, no one is in control—the AI already is.

Originally published at ComputerWeekly.com on 15 Aug 2025