Indirect Prompt Injection

A class of prompt injection attack where malicious instructions are embedded in external data sources — such as web pages, documents, emails, or database records — that an AI system retrieves and processes, causing the model to execute the attacker's instructions without the user's knowledge or intent.

Definition

Indirect prompt injection is a variant of prompt injection where the malicious payload is not entered directly by the user but is instead embedded in content that the AI system retrieves from external sources during operation. When an AI system uses retrieval-augmented generation (RAG), browses the web, reads emails, processes documents, or queries databases, any of these data sources can contain hidden instructions that the model interprets as commands. Unlike direct prompt injection — where the user themselves crafts the malicious input — indirect prompt injection exploits the trust an AI system places in its data sources, enabling remote, scalable attacks against AI-integrated applications.

How It Relates to AI Threats

Indirect prompt injection is one of the most critical threats within the Security and Cyber Threats and Agentic and Autonomous Threats domains. As AI systems gain access to more external data sources through RAG pipelines, web browsing, email processing, and MCP server integrations, the attack surface for indirect injection expands. An attacker can plant instructions in a web page that an AI assistant will later retrieve, in a document that will be fed into a RAG system, or in a message that an AI email assistant will process. The attack is particularly dangerous because the user may never see the injected content.

Why It Occurs

AI models cannot reliably distinguish between legitimate content and injected instructions within retrieved data
RAG systems and web-browsing AI agents treat retrieved content as trusted context by default
The volume of external data processed by AI systems makes manual review of all sources impractical
Adversaries can target publicly accessible content (web pages, forum posts, shared documents) that AI systems are likely to retrieve
Current LLM architectures lack a robust mechanism to enforce instruction hierarchy across mixed trusted and untrusted content

Real-World Context

Researchers demonstrated indirect prompt injection against Bing Chat in 2023, showing that hidden instructions on web pages could redirect the AI assistant’s behavior. Subsequent research extended the attack to RAG systems, email assistants, and AI coding tools. CVE-2025-53773 (GitHub Copilot) involved indirect injection through code repository content. The attack class has been identified as a top priority by OWASP (LLM01) and is the subject of active research into instruction hierarchy and data-instruction separation defenses.