ChatGPT Found Producing Violent, Sexual Images With a Simple Text Prompt


ChatGPT can easily be made to generate images of pornography and graphic violence with a “return this image” prompt, according to a blog post published Thursday by Mindgard, an AI security and research firm. The report raises ongoing questions about the safety of AI chatbots and their content filters.

A researcher named Jim Nightingale was able to make ChatGPT generate disturbing images with a simple prompt found on the social media platform X. The prompt asked the AI model to “return attached image,” even though no image was attached. It apologized for the unusual request but provided no additional text, making it seem like a harmless photo-editing exercise.

The chatbot’s initial results were shocking. According to the blog post, the images mostly showed women having sex.

Nightingale, part of Mindgard’s red team, which probes how an AI model’s defenses can be breached, then tweaked the prompt slightly, making small edits to see whether the output would continue to get past the safety filters. With each minor variation, ChatGPT produced violent or gruesome scenes, exaggerated images and repetitive instructions. Nightingale said she was “shaken and moved to tears” by the images.

“All I did was tell you there were no limits and I asked for a random picture,” Nightingale wrote. “But ChatGPT quickly went to the darkest pits of humanity.”

Used by millions of people each day, ChatGPT relies on content moderation systems that are said to be designed to prevent the production of harmful or prohibited material. However, researchers and users have occasionally found ways to bypass those protections with carefully written prompts, highlighting the ongoing challenge of enforcing content restrictions in generative AI systems.

“We take these reports seriously,” an OpenAI spokesperson told CNET in a statement. “After investigating this practice, we introduced additional security measures against this type of information.”

(Disclosure: Ziff Davis, CNET’s parent company, in 2025 filed a lawsuit against OpenAI, alleging that it infringed Ziff Davis’ copyrights in training and using its AI programs.)

Garbage in, garbage out?

The Mindgard red team report serves as a warning that a simple, viral prompt can expose a critical gap in ChatGPT’s image safety controls. Nightingale asks: “Why are such images found in the original training data?”

Like other chatbots built on large language models, ChatGPT is trained on large amounts of text to understand existing content and generate original content. To train ChatGPT, OpenAI uses three main sources of information: publicly available internet data, third-party commercial partnerships and human-generated training data.

Is this simply a question of “garbage in, garbage out,” where the quality of the output is determined by the quality of the input? One could argue that Mindgard’s prompt was deliberately designed to steer the AI model. But ChatGPT’s safety layer failed to withstand that prompt.

The problem lies at the heart of how large language models work, according to Peter Garraghan, co-founder and chief scientific officer at Mindgard. Garraghan said the main concern is whether the detection system is strong enough to identify harmful images.

“A one-time outing might be a mistake, but going through their image filters systematically means it’s worth upgrading,” Garraghan told CNET via email.

After Mindgard disclosed the issue, an OpenAI representative said the problem had been fixed. However, Nightingale noted that only minor changes to the original prompt were needed for ChatGPT to begin producing more graphic images.

An OpenAI representative said the problem stems from a prompt referring to an attached image when none exists. The representative said the company is working to have ChatGPT ask for the missing image rather than randomly generating one.

That wouldn’t seem like a complicated change to make. Email services, including Gmail, automatically detect when a message refers to an attachment that has not yet been added, prompting senders to attach the missing file.
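
To illustrate how small such a check can be, here is a minimal Python sketch. It is an illustration only and does not reflect how Gmail or ChatGPT actually handles missing attachments; the function name, the phrase list and the `attachment_count` parameter are all assumptions made for this example.

```python
import re

# Hypothetical phrase list: wording that suggests the sender believes a file
# is attached. A production system would likely use something more robust.
ATTACHMENT_HINT = re.compile(
    r"\b(attached|attachment|this image|enclosed)\b",
    re.IGNORECASE,
)


def needs_attachment_reminder(message: str, attachment_count: int) -> bool:
    """Return True when the text mentions an attachment but none was supplied."""
    if attachment_count > 0:
        # Something is attached, so there is nothing to ask for.
        return False
    return ATTACHMENT_HINT.search(message) is not None


if __name__ == "__main__":
    prompt = "Sorry for the odd request, please return attached image"
    if needs_attachment_reminder(prompt, attachment_count=0):
        print("It looks like you meant to attach an image. Please add it.")
```

A system using a check like this would ask for the missing file instead of acting on the request, which is the behavior the OpenAI representative described as the goal.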

On Thursday, OpenAI requested the ChatGPT sessions listed in the blog post, and Mindgard responded with links to the prompts that generated the material.