OpenAI has acknowledged that AI-powered browsers may remain vulnerable to prompt injection attacks despite ongoing security upgrades.
The admission comes as the company continues to harden its ChatGPT Atlas browser against increasingly sophisticated cyber threats.
OpenAI says prompt injection attacks — where hidden instructions manipulate AI agents — are unlikely to be completely eliminated. In a recent blog post, the company compared such attacks to online scams and social engineering, calling them a permanent challenge of the open web.
“Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully ‘solved,’” OpenAI wrote. The company added that ChatGPT Atlas’ “agent mode” increases the overall security threat surface.
OpenAI launched its ChatGPT Atlas browser in October. Soon after, security researchers demonstrated how simple text hidden in Google Docs could alter the browser’s behavior.
On the same day, privacy-focused browser Brave published findings warning that indirect prompt injection is a systemic issue for AI-powered browsers, including competitors such as Perplexity’s Comet.
Global cybersecurity warnings
OpenAI’s stance aligns with warnings from the U.K.’s National Cyber Security Centre (NCSC). Earlier this month, the agency cautioned that prompt injection attacks against generative AI systems “may never be totally mitigated.”
The NCSC advised organizations to focus on reducing risk and limiting impact rather than assuming such attacks can be completely stopped.
OpenAI said it views prompt injection as a long-term AI security problem that requires constant vigilance. The company is relying on a rapid-response security cycle to identify new attack strategies before they appear in real-world scenarios.
This approach mirrors industry-wide thinking. Rivals such as Anthropic and Google have also emphasized layered defenses and continuous stress testing for agentic AI systems.
LLM-based automated attacker explained
Where OpenAI differs is its use of an “LLM-based automated attacker.” The company trained this system using reinforcement learning to act like a hacker, probing AI agents for weaknesses.
The automated attacker runs simulated attacks, analyzes how the target AI reasons and reacts, then refines its strategy. OpenAI says this internal visibility allows it to identify flaws faster than external attackers.
In a demo shared by OpenAI, the automated attacker inserted a malicious email into a user’s inbox. When the AI agent later reviewed emails, it followed the hidden instructions and sent a resignation message instead of drafting an out-of-office reply.
Following a security update, OpenAI says Atlas was able to detect the injection attempt and alert the user.
Industry experts weigh in
Rami McCarthy, principal security researcher at Wiz, said reinforcement learning is useful but not sufficient on its own. He described AI risk as a function of autonomy combined with access.
“Agentic browsers tend to sit in a challenging part of that space,” McCarthy said, noting their moderate autonomy but high access to sensitive data like email and payments.
OpenAI recommends limiting logged-in access and requiring confirmation before agents send messages or make payments. Atlas is also trained to seek user approval for high-risk actions.
The company advises users to give precise instructions rather than broad authority, warning that “wide latitude makes it easier for hidden or malicious content to influence the agent.”
Despite the security efforts, McCarthy questioned whether agentic browsers currently justify their risk. He said the potential exposure of sensitive data remains a serious concern.
“For most everyday use cases, agentic browsers don’t yet deliver enough value to justify their current risk profile,” he said.
















