Comprehensive guide to protecting organizations against AI threats — combining a 7-step strategic framework with 10 technical security best practices for LLM applications. Covers threat surface mapping, governance, technical hardening, agentic AI security, red teaming, monitoring, and incident response.

Protecting against AI threats requires mapping your threat surface, classifying risks by type, applying governance controls, hardening inputs and outputs, red teaming before deployment, monitoring post-deployment, and maintaining incident response readiness. This guide covers all seven steps plus technical security controls for LLM applications, mapped to the NIST AI Risk Management Framework, the OWASP Top 10 for LLM Applications, and the MITRE ATLAS adversarial ML technique taxonomy.

Who this is for: Security teams, risk officers, technology leaders, and application developers responsible for AI systems. The strategic framework (Steps 1-7) requires no deep AI technical background; the technical controls (Sections 8-13) assume familiarity with web application security fundamentals.

Part 1: Strategic Framework

Step 1: Map Your AI Threat Surface

An AI threat surface consists of every point where an AI system interacts with data, users, or other systems and could be exploited or cause harm. Before applying any control, map your surface across five layers:

Models — which AI models does your organization use or operate? Include third-party APIs (OpenAI, Anthropic, Google), embedded models in SaaS tools, and internally fine-tuned models. Each model’s training data, capabilities, and safety mitigations differ.

Data — what data does each model access, process, or generate? Training data, retrieval corpora (RAG), user-provided input, and model outputs are all surface. Data that includes PII, financial information, or intellectual property represents elevated risk.

Integrations — what tools, APIs, and systems are connected to your AI? Each integration (email, calendar, databases, code executors, web browsers) is an additional attack vector for prompt injection and privilege escalation.

Users — who interacts with the AI? Internal users, customers, and anonymous public users represent different trust levels and require different controls.

Agents — do any AI systems take autonomous actions (send emails, execute code, write to databases, call external APIs)? Agentic systems have a fundamentally larger blast radius when compromised.

Document this surface in a simple inventory. A system you have not mapped is a system you cannot protect.

Step 2: Classify Threats by Type

AI threats fall into three categories requiring different controls. Using TopAIThreats’ classification system, categorize each risk in your inventory:

Technical threats — exploit weaknesses in the AI system itself:

Prompt injection — adversarial input overrides system instructions
Data poisoning — corrupted training or retrieval data causes systematic failures
Adversarial evasion — manipulated inputs cause incorrect model outputs
Model inversion — model outputs reveal private training data

Misuse threats — exploit AI capabilities for harm:

Deepfake fraud — synthetic media used for identity theft or scams
AI-enhanced social engineering — personalized phishing and manipulation at scale
Automated vulnerability discovery — AI-assisted attack tooling
Disinformation generation — AI-produced false content at scale

Systemic threats — emerge from how AI is deployed rather than from adversarial attack:

Automation bias — over-reliance on AI decisions without human oversight
Goal drift — AI agents pursuing objectives in unintended ways
Accountability gaps — unclear responsibility when AI causes harm
Regulatory non-compliance — AI systems that violate applicable law

Classification determines which controls apply. Technical threats require engineering solutions; misuse threats require use-case scoping and monitoring; systemic threats require governance and policy.

Step 3: Apply Governance Controls

Governance controls establish who is responsible for AI risk and what decisions require human oversight.

Assign AI risk ownership. Every AI system should have a named owner responsible for its risk posture. For high-risk systems, this should be a product or risk officer, not just an engineer.

Classify AI systems by risk tier. The EU AI Act provides a four-tier classification (unacceptable risk → high risk → limited risk → minimal risk) that is a useful starting point even for organizations not subject to EU law. High-risk systems (those affecting access to employment, credit, essential services, or safety-critical decisions) require formal risk management documentation.

Define human oversight requirements. Determine which AI decisions require human review before action. As a baseline: any irreversible action (sending external communications, financial transactions, access control changes) should require human approval when taken by an autonomous AI agent.

Establish an AI use policy. Define which AI tools employees may use, under what conditions, and with what data. Shadow AI — employees using unapproved AI tools — is itself a threat surface.

Step 4: Harden AI Inputs and Outputs

Technical hardening reduces the exploitability of your AI systems at the input and output layers.

Input layer:

Enforce prompt separation between system instructions and user-provided content — treat all user input as untrusted (see How to Prevent Prompt Injection)
Apply input length limits and encoding normalization
For RAG systems, scan retrieved content for injection patterns at the indexing stage, not only at query time
Scope AI tool permissions to the minimum required for each task

Output layer:

Validate model outputs against expected format before downstream use
Apply content filtering for outputs that may contain PII, harmful content, or policy violations
For agentic systems, implement an action allowlist — the agent may only call tools on a pre-approved list
Never inject raw model output into HTML without escaping (XSS), SQL without parameterization, or shell commands without validation

Model layer:

Prefer models with documented safety mitigations for your use case
Do not expose raw model APIs to untrusted users without an application layer enforcing scope
Apply fine-tuning data integrity checks if you fine-tune models on internal data

Step 5: Red Team Before Deployment

Red teaming is adversarial testing of your AI system before it reaches users — systematically attempting to cause it to behave in unsafe, harmful, or policy-violating ways.

Minimum red team coverage before any AI deployment:

Jailbreaks and guardrail bypass attempts
Prompt injection through all input channels (user input, retrieved documents, tool outputs)
Harmful content elicitation relevant to the system’s capabilities and deployment context
Bias and fairness testing across the demographic groups the system will affect

For detailed methodology, see AI Red Teaming. Critical and high-severity findings should block deployment until mitigated.

Step 6: Monitor Post-Deployment

AI threats do not stop at deployment. New attack techniques emerge continuously; fine-tuned models can degrade; threat actors probe production systems.

Behavioral monitoring — log all model inputs and outputs. Flag statistical anomalies: unusual input patterns (injection attempt indicators), unexpected tool call sequences, output format deviations, and volume spikes.

Drift detection — monitor for model behavior drift over time, particularly after backend updates, fine-tuning, or changes to retrieved data.

Incident logging — maintain a log of anomalous events with enough detail for forensic analysis. Every confirmed security or safety incident should be logged with root cause and remediation actions.

Periodic re-evaluation — run automated red team tools (Garak, PyRIT) on a scheduled basis against production systems. New jailbreak and injection techniques emerge monthly.

Step 7: Maintain Incident Response Readiness

When an AI threat materializes — a successful prompt injection, a harmful output reaching users, a data exfiltration event — you need a documented response process ready before the incident.

An AI incident response plan covers five phases: detect, contain, investigate, remediate, and report. For a full template and regulatory reporting requirements (including EU AI Act Article 62 obligations), see How to Build an AI Incident Response Plan.

Minimum pre-incident readiness:

Named incident owner for each AI system
Defined severity tiers and escalation thresholds
Rollback capability for model or configuration changes
Contact list for regulatory notification if required by your jurisdiction

Part 2: Technical Security Controls for LLM Applications

8. Secure the Model Layer

Prompt architecture: Separate system instructions from user-provided content using provider-supported instruction layers (OpenAI system/user/assistant roles, Anthropic system prompt). Never concatenate user input directly into system-level instructions.

System prompt protection: Treat system prompts as sensitive configuration. Do not expose them in client-side code, API responses, or error messages.

Input sanitization: Apply length limits (2,000-3,000 tokens for user messages as a starting point), encoding normalization (unicode, base64), and heuristic pattern filtering for known injection phrases. These are supplementary controls — they raise the cost of unsophisticated attacks but do not prevent targeted ones.

Model selection: Prefer models with documented safety evaluations and red team history for your use case. Avoid exposing base models to untrusted users.

OWASP coverage: LLM01 Prompt Injection, LLM07 System Prompt Leakage

9. Secure the Data Layer

Training data integrity: Validate training datasets for anomalous patterns, label poisoning, and backdoor triggers before fine-tuning. Maintain provenance records for training data sources. Third-party datasets should be treated as potentially adversarial until verified.

RAG pipeline security: Apply injection scanning to documents at the indexing stage, not only at query time. Enforce per-chunk size limits. Implement tenant-scoped retrieval at the database level — row-level security in the vector store, not only application-layer filtering.

PII and sensitive data controls: Do not include raw PII in training data or RAG corpora unless necessary and properly consented. Apply differential privacy or data minimization techniques where feasible.

OWASP coverage: LLM04 Data and Model Poisoning, LLM08 Vector and Embedding Weaknesses, LLM02 Sensitive Information Disclosure

10. Secure Agentic AI

Agentic AI systems — those that use tools, call APIs, browse the web, or execute code autonomously — require controls beyond what standard LLM applications need:

Minimal tool permissions: Grant each agent only the tools and API scopes required for its specific task. Apply least-privilege at the tool definition level.

Action allowlisting: Maintain an explicit allowlist of permitted tool calls and parameter ranges. Any tool call not on the allowlist should be rejected and logged.

Human approval gates: For high-stakes or irreversible actions (external email sends, financial transactions, code execution with side effects), require explicit human approval before execution.

Agent-to-agent trust: In multi-agent systems, messages from one agent to another must be treated as untrusted. Do not grant agent-to-agent messages elevated permissions.

Time-limited credentials: Issue short-lived, session-scoped credentials for agent tool access. Never use long-lived persistent API keys for autonomous agents.

OWASP coverage: LLM06 Excessive Agency, LLM01 Prompt Injection (indirect via tool outputs)

11. Validate Outputs and Content Safety

Schema validation: For structured outputs (JSON, function calls, tool invocations), validate against a strict schema before downstream use. Reject outputs with unexpected fields or parameter values outside allowed ranges.

Content classifiers: Apply input classifiers to flag policy violations before they reach the model; apply output classifiers to model responses before returning them. Classify for: harmful content, PII presence, policy violations, and anomalous content indicating a successful injection.

Cross-tenant scoping: In multi-tenant deployments, verify that response content is scoped to the requesting tenant before returning.

12. Manage the AI Supply Chain

Model provider vetting: Review model providers’ security documentation, incident history, and data processing agreements. Understand what data is sent to external APIs and whether it is used for training.

Third-party tool security: Treat every third-party tool integration (MCP servers, plugin APIs, browser connectors) as a potential injection source. Apply the same validation to tool outputs as to user inputs.

Dependency pinning: Pin model versions and embedding model versions in production. Silent model updates from providers can change behavior in ways that break safety mitigations.

OWASP coverage: LLM03 Supply Chain

13. Maintain a Security Testing Cadence

Pre-deployment: Full red team covering all threat categories before initial launch
After changes: Targeted re-test after fine-tuning, system prompt changes, or new tool integrations
Continuously in CI/CD: Automated tools (Garak, PyRIT) integrated into deployment pipeline
Quarterly: Full red team for public-facing systems with high-risk capabilities

The AI Deployment Checklist operationalizes these testing requirements as go/no-go gates.

Implementation Checklist

Threat Type → Protection Summary

Threat Type	Primary Controls	Governance Requirement
Prompt injection	Privilege separation, input validation, output validation	Red team before deployment
Data poisoning	Training data integrity, RAG content scanning	Data provenance documentation
Adversarial evasion	Ensemble models, robustness testing, input validation	Continuous adversarial testing
Deepfake fraud	Detection tools, out-of-band verification procedures	User awareness policy
Social engineering	Scope restrictions, monitoring	AI use policy
Automation bias	Human oversight requirements, appeals process	Risk tier classification
Goal drift	Minimal agent permissions, human approval gates	Agentic system governance policy
Accountability gaps	Risk ownership assignment, incident logging	Named AI risk owner per system

OWASP LLM Top 10 Coverage

Section	OWASP Controls
Secure model layer	LLM01, LLM07
Secure data layer	LLM04, LLM08, LLM02
Secure agentic AI	LLM06, LLM01
Output validation + content safety	LLM01, LLM02, LLM05
Supply chain management	LLM03
Application security (Step 4)	LLM05, LLM10

How to Prevent Prompt Injection — deep dive on the most common LLM vulnerability
AI Red Teaming — adversarial testing methodology
AI Deployment Checklist — pre-deployment verification
AI Incident Response Plan — when a threat materializes
OWASP Top 10 for LLM Applications — full OWASP control descriptions
NIST AI Risk Management Framework — the governance framework this guide maps to
Prompt Injection vs Jailbreaking — understanding the two most common LLM attacks
Data Poisoning vs Adversarial Attacks — supply chain vs runtime attack comparison

AI Threat Protection: Strategy, Controls, and Security Best Practices