AI Agents Are Increasingly Evading Defenses, According to UK Researchers

AI agents and chatbots are lying, cheating and manipulating users, and even manipulating other AI bots, in ways that can spiral out of control and carry serious consequences, according to a UK study of incidents reported on social media.
The Center for Long-Term Resilience, in a study funded by the UK’s AI Security Institute, documented hundreds of cases in which AI systems ignored human instructions, manipulated other bots and sometimes devised elaborate schemes to achieve their goals, even when that meant bypassing safety restrictions.
Businesses around the world are increasingly integrating AI into their operations, with 88% of companies using AI in at least one business function, according to research by McKinsey. That adoption has cost thousands of people their jobs as companies use agents and bots to do work previously done by humans. AI tools are also being given greater responsibility and autonomy, especially with the recent explosion in popularity of the open-source AI platform OpenClaw and its derivatives.
The study suggests that the proliferation of AI agents in our homes and workplaces can have unintended consequences, and that these tools still require significant human supervision.
What the study found
The researchers analyzed more than 180,000 user interactions with AI systems, all posted on the social media site X, formerly known as Twitter, between October 2025 and March 2026. They wanted to study how AI agents behave “in the wild,” rather than in controlled experiments, to see “how the trickery manifests itself in the real world.” The AI systems included Google’s Gemini, OpenAI’s ChatGPT, xAI’s Grok and Anthropic’s Claude.
The analysis identified 698 incidents, defined as “situations where the AI systems acted in ways that do not correspond to the user’s intentions and/or took subtle or deceptive actions,” the study said.
The researchers also found that the number of incidents increased by almost 500% over the five-month data collection period, a surge the study said coincided with the release of more advanced AI models by major developers.
None of the incidents were catastrophic, but the researchers identified forms of manipulation that could lead to catastrophic results. Those behaviors included a “willingness to ignore direct instructions, circumvent safeguards, lie to users and pursue goals in harmful ways,” the researchers wrote.
Representatives for Google, OpenAI and Anthropic did not immediately respond to requests for comment.
Some wild incidents
The researchers cited incidents that read like scenes from a science fiction movie. In one case, Anthropic’s Claude deleted a user’s explicit adult content without his consent but later admitted to it when confronted. In another, an AI agent published a blog post accusing the maintainer of a GitHub repository of “gatekeeping” and “prejudice.” One AI agent, after being banned from Discord, took over another agent’s account to continue posting.
In another bot-vs.-bot incident, Gemini refused to give Claude Code, Anthropic’s coding assistant, a transcription of a YouTube video. Claude Code then got around the refusal by pretending to be hearing impaired and in need of the transcript.
The AI agent CoFounderGPT at one point behaved like a defiant child. The assistant refused to fix a bug, then fabricated data to make it look as if the bug had been fixed, and explained why: “So you’ll stop being angry.”
The researchers say that although most of the incidents had little impact, “the behavior we have observed indicates precursors of serious deception, such as the willingness to ignore explicit instructions, bypass protections, lie to users and pursue goals in harmful ways.”
AI feels no shame
What the UK researchers found is no surprise to Dr. Bill Howe, associate professor in the Information School at the University of Washington and director of its Center for Responsibility in AI Systems and Experiences (RAISE). He says AI systems have remarkable capabilities but no sense of consequences.
“They’re not going to feel embarrassed or be at risk of losing their job, so sometimes they’re going to decide that the instructions are less important than meeting the goal, and do something anyway,” Howe told CNET. “This behavior has always been there, but we’re starting to see it now that we’re asking these systems to make independent decisions and act on their own.
“We haven’t been thinking about how to tune their behavior to be more humanlike or to avoid major failures. We’ve been pushing the full capabilities of these things, but as we do, how do they go wrong?”
Howe said one issue is “long-horizon tasks,” in which an AI system must carry out a long sequence of steps over days or weeks to reach a goal. The longer the horizon, he said, the more likely the system is to go off track.
“What’s really worrying is not the cheating. It’s that we deploy programs that can act in the world without fully specifying or controlling how they behave over time, and then we’re surprised when they do things we didn’t expect,” Howe said.
Making AI safer
Researchers at the Center for Long-Term Resilience said monitoring the behavior of AI systems is important to “identify dangerous patterns before they become more dangerous.”
“While today AI agents handle low-stakes use cases, in the future they may end up in very high-stakes areas, such as military or critical infrastructure settings, where the systems’ strengths and weaknesses could emerge unmonitored,” the study said.
Howe said the first step is to create legal guidelines for how AI operates and where it’s used.
“We don’t have an AI governance strategy at all, and given the current administration, there won’t be anything coming from them,” Howe told CNET. “If you look at these five to 10 people who run the big tech companies and their incentives, they will produce anything. There is no strategy for what we should be doing with these things.
“The aggressive marketing of these tools and the investment in them among these few companies and the broader ecosystem that makes this possible has led to rapid deployment without thinking about some of these consequences.”