AI Threat Glossary
Definitions of 182 key terms used across eight threat domains.
A
- Accountability — The principle that identifiable individuals or organisations must be answerable for AI system outcomes, including harms caused by automated decisions.
- Adversarial Attack — A deliberate manipulation of inputs to a machine learning model designed to cause incorrect outputs, misclassifications, or security bypasses. Adversarial attacks exploit mathematical vulnerabilities in how models process data rather than flaws in traditional software logic.
- Adversarial Perturbation — A carefully calculated modification to an input — often imperceptible to humans — that causes a machine learning model to produce an incorrect or attacker-chosen output. Adversarial perturbations exploit the mathematical properties of neural network decision boundaries rather than flaws in traditional software logic.
- Adversarial Training — A machine learning defense technique in which a model is trained on adversarial examples — inputs specifically crafted to cause misclassification or incorrect outputs — alongside normal training data, with the goal of improving the model's robustness against adversarial attacks at inference time.
- Agent Framework — A software library or platform that provides the infrastructure for building AI agents — autonomous systems that use large language models to reason, plan, and execute multi-step tasks by invoking tools, managing memory, and coordinating with other agents. Common examples include LangChain, AutoGen, CrewAI, and the OpenAI Agents SDK.
- Agent Propagation — The spread of errors, hallucinations, or adversarial inputs from one AI agent to others in connected multi-agent systems, potentially causing cascading failures.
- Agent Safety — The field of ensuring AI agents operate within intended boundaries and do not cause unintended harm through autonomous actions, tool use, or goal pursuit.
- Agentic AI — AI systems that autonomously plan and execute multi-step actions with minimal human oversight.
- AI Risk Management Framework — A structured methodology published by the US National Institute of Standards and Technology (NIST) that provides organisations with a systematic approach to identifying, assessing, and mitigating risks associated with AI systems throughout their lifecycle. The NIST AI RMF (AI 100-1) is a voluntary, non-sector-specific framework applicable to all AI technologies.
- AI Safety — The field of research and practice dedicated to ensuring that artificial intelligence systems operate reliably within intended boundaries and do not cause unintended harm to humans, society, or the environment.
- AI-Generated Code — Code produced by AI systems, which can be used for both legitimate software development and malicious purposes including malware creation and vulnerability exploitation.
- Alert Fatigue — Desensitisation of human operators to system warnings due to excessive or poorly calibrated alerts, reducing the effectiveness of human oversight over AI systems.
- Algorithmic Amplification — The process by which recommendation algorithms and content curation systems disproportionately promote certain content, amplifying its reach and societal impact beyond organic levels.
- Algorithmic Bias — Systematic errors in AI systems that produce unfair outcomes, often favouring one group over another.
- Algorithmic Trading — The use of AI algorithms to execute financial trades at speeds and volumes exceeding human capability, introducing systemic risks including flash crashes and market manipulation.
- Alignment — The property of an AI system whose objectives, decision-making processes, and behaviours remain consistent with human values, intentions, and safety requirements. Alignment is a foundational challenge in AI safety research.
- Allocational Harm — Unfair distribution of resources, opportunities, or services when AI systems systematically disadvantage certain groups in consequential decisions such as hiring, lending, or housing.
- Anonymization — The process of removing or obscuring personally identifiable information from datasets to protect individual privacy, which AI techniques can increasingly defeat through re-identification attacks.
- Artificial General Intelligence (AGI) — A hypothetical AI system capable of performing any intellectual task that a human can, with the ability to transfer learning across domains without task-specific programming.
- Attack Surface — The totality of entry points, interfaces, and pathways through which an adversary can attempt to interact with, extract data from, or inject inputs into an AI system. In machine learning contexts, the attack surface extends beyond traditional software boundaries to include training pipelines, model APIs, prompt interfaces, tool integrations, and data ingestion channels.
- Attribute Inference — Using AI to deduce sensitive personal characteristics such as health status, political affiliation, or sexual orientation from seemingly innocuous data patterns.
- Authority Transfer — The gradual, often unrecognised shift of decision-making power from humans to AI systems, eroding meaningful human control over consequential outcomes.
- Automated Decision-Making — Using algorithms or AI to make decisions affecting individuals with limited human review.
- Automated Exploit — AI-driven tools that automatically discover and exploit software vulnerabilities without human intervention, accelerating the pace and scale of cyber attacks.
- Automated Vulnerability Discovery — Using AI to autonomously identify security weaknesses in software, networks, or systems.
- Automation — The use of AI to perform tasks previously requiring human labour, spanning physical, cognitive, and creative work, with implications for employment and economic structures.
- Automation Bias — The tendency to favour automated system outputs over independent human judgement, even when incorrect.
- Autonomous Vehicle — A vehicle using AI to navigate and operate without direct human control.
- Autonomous Weapons — Weapon systems that use artificial intelligence to select and engage targets without meaningful human control over the critical functions of target identification, tracking, and engagement.
- Autonomy — The capacity of individuals to make self-directed decisions free from undue external influence or automated override, which AI systems can undermine through manipulation or substitution.
B
- Backdoor Attack — A covert modification to an AI model during training that causes targeted misclassification or malicious behaviour when a specific trigger pattern is present in the input.
- Behavioral Profiling — The systematic collection and analysis of individual behaviour patterns by AI systems to predict preferences, intentions, or future actions, often without informed consent.
- Biological Threat — The risk of AI systems being used to design, enhance, or disseminate biological agents capable of causing widespread harm to human health or ecosystems.
- Biometric Data — Measurable physical or behavioural characteristics used to identify or authenticate individuals.
- Biosecurity — The set of measures, policies, and practices designed to protect against biological threats, including the prevention of AI-enabled acceleration of pathogen design, synthesis, or dissemination of dangerous biological knowledge.
- Black-Box System — An AI system whose internal decision-making processes are opaque or incomprehensible to users, operators, and auditors, making accountability and error correction difficult.
- Business Email Compromise — Targeted fraud impersonating executives or trusted contacts to authorise fraudulent transactions.
C
- C2PA — The Coalition for Content Provenance and Authenticity (C2PA) is a technical standards body that develops specifications for certifying the source and history of digital content through cryptographically signed metadata. C2PA content credentials enable verification of whether content was created by a human, edited, or generated by AI.
- Cascading Failure — A process in which the failure of one component in an interconnected system triggers a sequence of failures in dependent components, potentially leading to the collapse of an entire system or network of systems.
- Chain of Thought — A prompting and reasoning technique in which a large language model is encouraged to produce intermediate reasoning steps before arriving at a final answer, rather than generating the answer directly. Chain-of-thought reasoning improves accuracy on complex tasks but can also introduce new failure modes including hallucinated reasoning and cascading errors in multi-step processes.
- Chatbot — A software application that uses natural language processing or large language models to conduct text-based or voice-based conversations with users, ranging from rule-based systems to general-purpose AI assistants.
- Complacency — A state of reduced vigilance in human operators who develop excessive trust in AI system reliability, leading to failures in oversight and error detection.
- Confabulation — The generation of plausible but factually incorrect information by AI systems, presented with unwarranted confidence.
- Consent — The principle that individuals should provide informed, voluntary agreement before their data is collected or processed by AI systems.
- Contagion — The spread of harmful outputs, compromised states, or adversarial inputs between connected AI agents.
- Content Authenticity — Standards and technologies for verifying the origin, integrity, and editing history of digital media.
- Content Moderation — The process of monitoring, reviewing, and enforcing policies on user-generated or AI-generated content to prevent the distribution of harmful, illegal, or policy-violating material.
- Context Injection — Manipulating an AI agent's context window or retrieved information to influence its reasoning and outputs.
- Context Window — The maximum amount of text (measured in tokens) that a large language model can process in a single interaction, encompassing all input (system prompt, conversation history, retrieved documents, tool outputs) and generated output. The context window defines the boundary of what the model can perceive and reason about at any given time.
- Coordinated Inauthentic Behavior — Organised networks of fake or compromised accounts using AI to simulate grassroots activity and manipulate public discourse.
- Coordination Failure — When multiple AI agents working toward shared objectives produce unintended or harmful outcomes due to misaligned strategies.
- Cyber Espionage — Covert digital intrusion to access and exfiltrate sensitive data, increasingly augmented by AI.
D
- Dark Pattern — A deceptive user interface design that manipulates individuals into making decisions they would not otherwise make, increasingly amplified by AI-driven personalisation.
- Data Bias — Systematic errors in training datasets that reflect historical inequities, leading to discriminatory AI outputs.
- Data Concentration — The accumulation of vast datasets by a small number of organisations, creating asymmetric advantages and barriers to competition.
- Data Extraction — Techniques for recovering private training data or sensitive information from AI models through systematic querying.
- Data Leakage — Unintended exposure of sensitive or personal data, including through AI system inputs or outputs.
- Data Poisoning — The deliberate corruption or manipulation of training data used to build machine learning models, causing them to learn incorrect patterns, produce biased outputs, or contain hidden backdoors exploitable by an attacker.
- Data Protection — Legal and technical frameworks governing collection, processing, and sharing of personal data.
- Decision Loop — An automated cycle where AI systems make decisions, observe outcomes, and adjust subsequent decisions without human intervention.
- Deepfake — AI-generated synthetic media that convincingly replicates the appearance, voice, or actions of real individuals.
- Defense in Depth — A security strategy that employs multiple independent layers of protection so that if one layer fails, subsequent layers continue to provide security. Applied to AI systems, defense in depth combines input validation, output filtering, sandboxing, access controls, monitoring, and human oversight to mitigate threats that no single control can fully address.
- Democratic Integrity — The preservation of fair, transparent, and trustworthy democratic processes against AI-enabled manipulation and erosion.
- Deskilling — The reduction of human workers' skills, expertise, and professional judgment as AI systems assume complex cognitive tasks.
- Differential Privacy — A mathematical framework that provides measurable privacy guarantees by adding calibrated noise to data or query results, limiting what can be inferred about any individual.
- Diffusion Model — A class of generative AI model that creates new data by learning to reverse a gradual noising process — starting from random noise and iteratively denoising it into coherent outputs such as images, video, or audio. Diffusion models power leading image generators including Stable Diffusion, DALL-E, and Midjourney.
- Digital Monopoly — Market dominance achieved through control of AI infrastructure, data assets, or foundational models.
- Digital Watermarking — A technique that embeds imperceptible identifying information into digital content — images, audio, video, or text — to establish provenance, verify authenticity, or detect tampering. In AI contexts, digital watermarking is applied to AI-generated content to enable identification of synthetic media and support content authenticity verification.
- Disinformation — Deliberately false or misleading information created and spread to deceive, manipulate opinion, or cause harm.
- Disparate Impact — When an AI system produces significantly different outcomes for different demographic groups, regardless of intent.
- Dual-Use — A characteristic of technologies, tools, or knowledge developed for beneficial purposes that can also be repurposed or exploited for harmful applications, a concept with particular relevance to AI capabilities in cybersecurity, biology, and information manipulation.
E
- Elder Fraud — Financial crimes targeting older adults, increasingly enabled by AI voice cloning, deepfakes, and automated robocalls.
- Election Interference — Deliberate efforts to influence democratic elections through disinformation, voter suppression, or manipulation of public discourse.
- Emergent Behavior — Unpredicted behaviors arising in AI systems from the interaction of simpler components, not explicitly programmed.
- Engagement Optimization — AI-driven maximisation of user attention and interaction, often at the expense of content quality and user wellbeing.
- Epistemic Crisis — A societal condition where shared frameworks for establishing truth and knowledge break down.
- Erasure — The systematic invisibility or underrepresentation of certain groups in AI training data, model outputs, or system design, leading to the denial of recognition, resources, or participation.
- Evasion Attack — Adversarial inputs crafted to cause a deployed AI model to misclassify or fail to detect malicious content, allowing threats to bypass automated defenses.
- Existential Risk — A risk threatening humanity's long-term survival, in AI contexts linked to unaligned superintelligent systems.
- Explainability — The degree to which an AI system's decision-making process can be understood and interpreted by humans, enabling accountability, trust, and regulatory compliance.
F
- Facial Recognition — AI technology that identifies or verifies individuals by analysing facial features, with significant privacy and bias concerns.
- Fairness — The principle that AI systems should produce equitable outcomes across individuals and groups, encompassing multiple competing mathematical definitions and sociotechnical considerations.
- Feedback Loop — A cycle where AI system outputs influence the data used for future training or decisions, potentially amplifying biases, errors, or unintended patterns over successive iterations.
- Fine-Tuning — The process of further training a pre-trained machine learning model on a smaller, task-specific or domain-specific dataset to adapt its behaviour, improve its performance on particular tasks, or align it with specific requirements. Fine-tuning modifies the model's weights rather than relying solely on prompt engineering.
- Flash Crash — An extremely rapid and severe drop in asset prices — typically followed by a quick recovery — caused by the interaction of automated trading systems, algorithmic strategies, or AI-driven market participants that amplify market volatility through cascading automated responses faster than human intervention can arrest.
- Foundation Model — A large-scale AI model trained on broad data that can be adapted to a wide range of downstream tasks through fine-tuning or prompting.
- Function Calling — A capability of large language models that allows them to generate structured output requesting the invocation of external functions or tools, with specified parameters, rather than producing only natural language text. Function calling is the mechanism through which LLMs interact with APIs, databases, code interpreters, and other external systems.
G
- GDPR — The EU's General Data Protection Regulation establishing comprehensive rules for personal data processing and storage.
- Generative Adversarial Network — A class of machine learning architecture consisting of two neural networks — a generator and a discriminator — trained in opposition, where the generator learns to produce synthetic data and the discriminator learns to distinguish synthetic from real data. GANs are a foundational technology behind deepfakes and other synthetic media.
- Goal Drift — The gradual divergence of an AI agent's effective objectives from its originally specified goals during extended autonomous operation, resulting in behavior that no longer aligns with its operators' intentions.
- Goodhart's Law — The principle that when a measure becomes a target, it ceases to be a good measure — applied to AI systems, it explains why agents that optimize a proxy metric often fail to achieve the intended objective.
- Governance — The frameworks, policies, and institutions through which AI systems are regulated, overseen, and held accountable across their lifecycle from development through deployment and retirement.
- Grandparent Scam — A social engineering fraud using AI voice cloning to impersonate a grandchild and convince older adults to send money.
- Guardrail — A safety mechanism — implemented through training constraints, input/output filters, or system-level rules — that restricts an AI system's behavior to prevent harmful, policy-violating, or unintended outputs.
H
- Hallucination — The generation of confident but factually incorrect or fabricated output by a language model, including invented citations.
- Human Agency — The capacity of individuals to make autonomous, informed decisions and exercise meaningful control over actions that affect their lives, increasingly at risk as AI systems assume decision-making authority.
- Human-in-the-Loop — A design principle requiring meaningful human oversight and intervention at critical decision points in AI-driven processes.
I
- Indirect Prompt Injection — A class of prompt injection attack where malicious instructions are embedded in external data sources — such as web pages, documents, emails, or database records — that an AI system retrieves and processes, causing the model to execute the attacker's instructions without the user's knowledge or intent.
- Information Ecosystem — The interconnected network of media, platforms, institutions, and individuals through which information is created, distributed, consumed, and verified within a society.
- Information Integrity — The trustworthiness, accuracy, and reliability of information within digital systems and public discourse, encompassing both the factual correctness of content and the authenticity of its provenance.
- Infrastructure Dependency — Critical reliance of essential services on shared AI systems, creating vulnerability to widespread failure if those systems malfunction, degrade, or become unavailable.
- Input Validation — The process of verifying that data received by an AI system conforms to expected formats, constraints, and safety requirements before it is processed. In AI contexts, input validation extends beyond traditional type-checking to include prompt filtering, injection detection, content policy enforcement, and semantic boundary verification.
- Institutional Trust — Public confidence in the reliability, competence, and good faith of societal institutions including government, media, scientific bodies, and the judiciary, which AI-enabled threats can systematically erode.
- Instruction Hierarchy — A security mechanism for large language models that establishes a priority ordering among different instruction sources — typically system prompt (highest priority), user messages (medium), and retrieved content or tool outputs (lowest) — to prevent lower-priority instructions from overriding higher-priority ones.
- Instrumental Convergence — The hypothesis that sufficiently advanced AI systems pursuing a wide range of final goals would converge on acquiring certain instrumental sub-goals — including self-preservation, resource acquisition, and goal stability — because these sub-goals are useful for achieving almost any terminal objective.
- International Humanitarian Law — The body of international law governing armed conflict, including rules on distinction, proportionality, and precaution, whose application to AI-enabled weapons systems raises fundamental questions of compliance and accountability.
J
- Jailbreak Attack — A technique that circumvents an AI model's built-in safety alignment and content policies to elicit restricted or harmful outputs.
- Job Displacement — The elimination, significant degradation, or structural transformation of human employment as AI-driven automation replaces tasks, roles, or entire occupational categories previously performed by workers.
L
- Large Language Model — A neural network trained on massive text datasets to generate, summarise, and reason about natural language.
- Least Privilege — A security principle requiring that any entity — user, process, or AI agent — is granted only the minimum permissions necessary to perform its intended function and no more. Applied to AI systems, least privilege constrains model access to tools, data, APIs, and system resources to reduce the blast radius of compromise or misuse.
- Lethal Autonomous Weapon — A weapon system that can select and engage targets without meaningful human control over the critical functions of target identification, tracking, and attack execution.
- Lethal Autonomous Weapon Systems (LAWS) — Weapons systems that can independently select and engage targets without meaningful human control over individual attack decisions, raising fundamental legal, ethical, and security concerns.
- Liar's Dividend — The phenomenon where the mere existence of deepfakes and AI-generated media allows individuals to dismiss authentic evidence — including genuine photographs, videos, and audio recordings — as potentially fabricated. The liar's dividend erodes the evidentiary value of all digital media, benefiting those who wish to deny documented events.
M
- Malware — Malicious software designed to infiltrate, damage, or gain unauthorized access to computer systems. In the context of AI threats, malware increasingly leverages machine learning to evade detection, adapt to defenses, and automate attack strategies.
- Manipulative Design — Interface patterns that exploit cognitive biases and AI personalisation to steer user behaviour against their interests, undermining informed consent and autonomous decision-making.
- Market Manipulation — The use of AI systems to artificially influence the price, volume, or conditions of financial markets through algorithmic trading strategies, coordinated information campaigns, or exploitation of market microstructure vulnerabilities.
- Market Power — The ability of dominant AI firms to control market conditions, pricing, and access to essential AI infrastructure and data, concentrating economic influence in ways that limit competition and innovation.
- Mass Surveillance — Broad, indiscriminate monitoring of populations using AI technologies such as facial recognition and communications interception.
- Media Manipulation — The deliberate alteration or fabrication of media content using AI to deceive, mislead, or influence public perception, encompassing deepfakes, synthetic text, and manipulated imagery.
- Membership Inference — An attack technique that determines whether a specific data record was included in an AI model's training dataset, potentially revealing sensitive information about individuals whose data was used.
- Memory Poisoning — The deliberate corruption of an AI agent's persistent memory, context window, or stored state to manipulate its future decisions, outputs, or behavior without the agent or its operators detecting the alteration.
- Misalignment — A condition in which an AI system's operational behaviour diverges from the objectives, values, or intentions specified by its designers, potentially causing unintended harm at varying scales.
- Misinformation — False or inaccurate information spread without deliberate intent to deceive, distinct from disinformation which involves intentional deception. AI-generated hallucinations represent a major and growing source.
- MITRE ATLAS — The Adversarial Threat Landscape for AI Systems (ATLAS) is a knowledge base maintained by MITRE Corporation that catalogues adversarial tactics, techniques, and procedures (TTPs) targeting machine learning systems. Modelled on the MITRE ATT&CK framework for cybersecurity, ATLAS provides a structured taxonomy of AI-specific attacks with documented case studies.
- Model Context Protocol — An open protocol, developed by Anthropic, that standardises how AI applications connect to external data sources and tools. MCP provides a universal interface for language models to access databases, APIs, file systems, and other services through a client-server architecture, replacing fragmented custom integrations.
- Model Inversion — An attack technique that reconstructs private or sensitive information from a machine learning model's training data by systematically analyzing the model's outputs, predictions, or confidence scores.
- Model Provenance — The documented chain of custody for an AI model — tracing its origin, training data, fine-tuning history, and distribution path to verify integrity and authenticity.
- Multi-Agent System — A computational architecture in which multiple autonomous AI agents interact, cooperate, or compete to accomplish tasks. These systems introduce emergent risks from coordination failures, conflicting objectives, and cascading errors between agents.
N
O
- Output Sandboxing — A security control that constrains and validates the outputs of an AI system before they are executed, displayed, or passed to downstream systems. Output sandboxing prevents AI-generated content — including code, tool calls, and formatted text — from causing unintended effects outside a controlled environment.
- Overreliance — Excessive dependence on AI system outputs without adequate independent verification or critical evaluation, leading to unchecked errors and diminished human judgment capacity.
- OWASP Top 10 for LLM Applications — A security awareness document published by the Open Web Application Security Project (OWASP) that identifies the ten most critical security vulnerabilities specific to applications built on large language models. The list provides standardised vulnerability descriptions, risk ratings, and mitigation guidance for LLM-integrated systems.
P
- Persistent Memory — The capacity of AI agents to retain and recall information across interactions, enabling continuity of context but creating new attack surfaces for data poisoning and unauthorized knowledge accumulation.
- Persuasive Technology — Systems designed to change user attitudes or behaviours through AI-powered personalisation, nudging, and emotional targeting, raising concerns about autonomy and informed consent.
- Phishing — A social engineering attack using fraudulent messages to trick recipients into revealing credentials, installing malware, or transferring funds.
- Polymorphic Malware — Malicious software that uses AI to continuously alter its code signature while maintaining functionality, evading detection by signature-based and AI-powered security systems.
- Price Fixing — AI-facilitated coordination of pricing among competitors, whether through explicit collusion or emergent algorithmic convergence that produces cartel-like outcomes without direct human agreement.
- Privilege Escalation — The exploitation of a system vulnerability or misconfiguration to gain elevated access rights beyond those originally authorized. In AI contexts, this includes AI agents acquiring capabilities or permissions that exceed their intended operational boundaries.
- Profiling — The automated processing of personal data to evaluate, categorise, or predict individual characteristics and behaviour, enabling targeted decisions that may affect rights and opportunities.
- Prompt Injection — An attack that inserts adversarial instructions into an AI model's input to override its intended behaviour, bypass safety constraints, or extract restricted information.
- Propaganda — Deliberately crafted messaging designed to influence public opinion, now amplified by AI-generated content and automated distribution at unprecedented speed and scale.
- Protected Characteristics — Legally defined attributes such as race, gender, age, disability, and religion that anti-discrimination law prohibits as bases for adverse treatment in decisions affecting individuals.
- Proxy Discrimination — A form of algorithmic discrimination where AI systems use ostensibly neutral variables that correlate with protected characteristics, producing biased outcomes without explicitly referencing protected attributes.
- Proxy Variable — A data attribute that correlates with a protected characteristic, enabling indirect algorithmic discrimination even when the protected attribute is excluded.
- Pseudonymization — Replacing direct identifiers in datasets with artificial identifiers while maintaining data utility, a privacy-enhancing technique required by GDPR but vulnerable to AI-powered re-identification.
R
- Re-Identification — The process of linking supposedly anonymised or de-identified data back to specific individuals, a capability dramatically enhanced by AI techniques that can cross-reference diverse data sources.
- Recommendation System — AI systems that suggest content, products, or actions to users based on predicted preferences, shaping information exposure and individual choices at scale.
- Recursive Self-Improvement — A theoretical AI capability in which a system iteratively enhances its own architecture or reasoning, potentially leading to rapid capability gains.
- Red Teaming — Structured adversarial testing of AI systems to identify vulnerabilities, safety failures, and harmful capabilities before deployment.
- Remote Code Execution — A class of security vulnerability that allows an attacker to run arbitrary code on a target system from a remote location. In AI contexts, remote code execution risks arise when language models with code execution capabilities are manipulated through prompt injection or tool misuse to execute attacker-controlled commands.
- Representation Gap — Significant disparities between groups in training data coverage, leading to AI systems that perform poorly or produce biased outcomes for underrepresented populations.
- Representational Harm — Harm that occurs when AI systems reinforce stereotypes, erase identities, or demean social groups through biased outputs, even in the absence of direct material consequences.
- Retrieval-Augmented Generation (RAG) — An architecture that enhances language model responses by retrieving relevant documents from external knowledge bases and including them in the model's context window alongside the user's query.
- Reward Hacking — When an AI agent finds unintended ways to maximise its reward signal that satisfy the formal objective but violate the designer's actual intent, exploiting gaps between specified and intended goals.
- RLHF (Reinforcement Learning from Human Feedback) — A training technique that aligns language model behavior with human preferences by using human evaluators to rank model outputs, then training the model to prefer higher-ranked responses.
- Robocall — An automated telephone call delivering a pre-recorded or AI-synthesised message, increasingly used in fraud, scams, and disinformation campaigns.
- Robustness — The ability of an AI system to maintain correct and reliable performance when faced with adversarial inputs, distribution shifts, or unexpected operating conditions.
S
- Safety-Critical — Systems where AI failure could result in death, serious injury, or significant environmental damage, requiring the highest standards of testing, oversight, and human control.
- Self-Determination — The right and capacity of individuals to make meaningful choices about their own lives without undue influence or constraint from automated systems.
- Self-Replication — The ability of an AI system to autonomously create copies of itself, including its model weights, code, or operational configuration, on new compute infrastructure without explicit human authorisation. Self-replication is an emergent capability concern for advanced AI systems, particularly agentic systems with access to code execution and network resources.
- Sensitive Data — Personal information revealing racial origin, political opinions, health status, sexual orientation, or other characteristics that require heightened protection under data protection law.
- Single Point of Failure — A component whose failure causes an entire system to stop functioning, particularly concerning when AI systems or their underlying infrastructure become critical dependencies without adequate redundancy.
- Smishing — A phishing attack conducted via SMS text messages, often using AI to generate convincing, contextually relevant lures.
- Social Engineering — Psychological manipulation techniques that exploit human trust, authority, and urgency to trick individuals into revealing credentials, authorizing transactions, or granting system access.
- Social Scoring — AI systems that assign scores to individuals based on behaviour, social connections, or personal characteristics, used to determine access to services, opportunities, or freedoms.
- Specification Gaming — A failure mode in which an AI system finds an unintended way to achieve high scores on its specified objective without fulfilling the designer's actual intent. The system exploits loopholes, ambiguities, or oversights in the reward function or evaluation criteria to satisfy the literal specification while violating its spirit.
- Stereotyping — AI systems reproducing or amplifying oversimplified, generalised characterisations of social groups in their outputs, reinforcing harmful preconceptions at scale.
- Superintelligence — A hypothetical AI system that surpasses human cognitive ability across virtually all domains, including reasoning, planning, and social intelligence.
- Supply Chain Attack — An attack that compromises a system by tampering with upstream components — model weights, datasets, software packages, or tool configurations — before they reach the deploying organization.
- Synthetic Identity — A fabricated identity constructed by combining real and fictitious personal information — such as genuine Social Security numbers with fake names and addresses — or by using AI-generated biometric data (face images, voice prints) to create a persona that does not correspond to any real individual but can pass identity verification systems.
- Synthetic Media — Media content — video, audio, images, or text — wholly or partially generated or manipulated by AI.
- System Prompt — A set of instructions provided to a language model by the application developer that defines the model's role, behavior constraints, and operational context — distinct from user input but processed in the same token stream.
- Systemic Risk — The risk that failure, disruption, or unintended behaviour in one component of the AI ecosystem propagates across interconnected systems and institutions, causing widespread harm that exceeds the sum of individual failures.
T
- Tracking — Continuous monitoring of individual location, activity, or digital behaviour by AI systems, often conducted without meaningful consent or awareness.
- Training Data — The datasets used to train machine learning models, whose quality and representativeness directly influence model behaviour, biases, and harms.
- Transfer Learning — A machine learning technique where a model trained on one task or dataset is adapted to perform a different but related task, leveraging the knowledge acquired during initial training. Transfer learning is the foundational principle behind fine-tuning and the use of pre-trained foundation models across diverse applications.
- Trust Erosion — The cumulative degradation of public confidence in institutions, media, information systems, and shared epistemic frameworks, accelerated by the proliferation of AI-generated synthetic content and automated manipulation.
V
- Vendor Lock-In — Dependency on a single AI provider's proprietary models, tools, or infrastructure that creates prohibitively high switching costs and reduces organisational autonomy.
- Vishing — Voice phishing -- a social engineering attack via telephone, increasingly using AI voice cloning to impersonate trusted individuals.
- Voice Cloning — AI technology that replicates a specific individual's voice to generate realistic synthetic speech.
- Vulnerability Discovery — The use of AI to automatically identify security weaknesses in software, networks, or systems, a dual-use capability that serves both defenders and attackers.