Glossary
A comprehensive glossary of all things DevSecOps, AI Security, Threat Modeling, Kubernetes, and Container Security.
A
An adversarial attack is a deceptive technique that fools machine learning (ML) models using carefully crafted inputs. These attacks exploit vulnerabilities in ML models by intentionally manipulating input data to cause the model to make incorrect predictions or classifications.
This can lead to serious consequences, especially in critical fields like finance, healthcare, and security, where the reliability of AI-driven systems is paramount for safety and operational integrity.
Adversarial Machine Learning (AML) represents a critical field at the intersection of artificial intelligence security and machine learning robustness. As AI systems become increasingly integrated into safety-critical applications, from autonomous vehicles to healthcare diagnostics, understanding and defending against adversarial attacks has become essential. AML studies both the vulnerabilities of machine learning models to malicious manipulation and the defensive strategies needed to build trustworthy, resilient AI systems.
B
Backdoor attacks represent one of the most insidious threats in artificial intelligence security, where attackers subtly manipulate machine learning models during training to embed hidden vulnerabilities. Unlike traditional cyberattacks, these covert threats remain dormant until triggered by specific input patterns, making them exceptionally difficult to detect. As AI systems become increasingly integrated into critical sectors like healthcare, finance, and autonomous systems, understanding backdoor attacks has become essential for maintaining the integrity and trustworthiness of AI-driven technologies.
Bias amplification occurs when machine learning models exaggerate existing biases present in training data, leading to more skewed predictions than the original dataset. This phenomenon poses significant risks in AI systems, particularly in high-stakes applications like hiring, lending, and criminal justice. Understanding bias amplification is crucial for developing fair and equitable AI systems that serve all populations without perpetuating or worsening societal inequalities.
C
The Carlini & Wagner (C&W) Attack represents a breakthrough in adversarial machine learning, offering a sophisticated method to generate imperceptible perturbations that fool neural networks. Unlike simpler attacks like FGSM, C&W employs optimization-based techniques to create minimal, highly effective adversarial examples. This advanced attack method has become a critical benchmark for evaluating AI model robustness and security in production environments.
Catastrophic forgetting in machine learning occurs when neural networks abruptly lose previously learned knowledge while adapting to new tasks, hindering lifelong learning in AI systems. This phenomenon, also called catastrophic interference, challenges LLMs and deep learning models during continual fine-tuning. Mitigation strategies such as elastic weight consolidation (EWC) and rehearsal methods help build robust, adaptive AI.
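As a rough illustration of the EWC idea, the sketch below penalizes movement of parameters that were important for a previously learned task; the toy model and the placeholder Fisher values are assumptions for illustration, not a full implementation.

```python
# Rough EWC-style regularizer sketch: penalize movement of parameters that were
# important for the old task. The Fisher values below are placeholders.
import torch

def ewc_penalty(model, old_params, fisher, lam=1000.0):
    """old_params and fisher map parameter names to tensors saved after the old task."""
    penalty = torch.zeros(())
    for name, param in model.named_parameters():
        penalty = penalty + (fisher[name] * (param - old_params[name]) ** 2).sum()
    return (lam / 2.0) * penalty

# Toy usage: with an untouched model the penalty is zero; it grows as new-task
# training moves parameters away from their old-task values.
model = torch.nn.Linear(2, 1)
old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}  # placeholder Fisher
print(ewc_penalty(model, old_params, fisher))  # tensor(0.)
```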
D
Data poisoning is a critical cybersecurity threat targeting AI and machine learning models by corrupting training datasets. Attackers inject malicious, biased, or altered data to manipulate outputs, introduce backdoors, or degrade performance. As AI adoption surges in 2026, understanding data poisoning, including its types, impacts, and defenses, is essential for securing LLMs, gen AI systems, and enterprise applications against these stealthy attacks.
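As a toy illustration of one poisoning technique, the NumPy sketch below flips a fraction of labels in a binary training set; the variable names and poison fraction are illustrative assumptions, and real attacks are typically far stealthier.

```python
# Toy label-flipping poisoning sketch: the attacker flips a small fraction of
# binary labels before training. Real attacks are usually more targeted.
import numpy as np

def flip_labels(y, poison_fraction=0.1, seed=0):
    rng = np.random.default_rng(seed)
    y_poisoned = y.copy()
    n_poison = int(len(y) * poison_fraction)
    idx = rng.choice(len(y), size=n_poison, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]   # flip 0 <-> 1
    return y_poisoned

labels = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
print(flip_labels(labels, poison_fraction=0.2))  # two labels flipped
```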
Deepfake detection is the AI-powered process of identifying manipulated media like videos, images, and audio created by generative models. These synthetic fakes pose risks from fraud and misinformation to identity theft. In 2026, tools analyze facial inconsistencies, audio anomalies, and artifacts to combat evolving threats. Essential for cybersecurity, platforms, and enterprises, it ensures authenticity in calls, meetings, and content amid rising deepfake scams.
E
Emergent capabilities in AI refer to unexpected abilities that suddenly appear in large-scale models, particularly large language models (LLMs), as they grow in size, compute, and training data. Unlike predictable gradual improvements in metrics like next-word prediction, these capabilities, such as multi-step arithmetic, question-answering, or emoji-based movie guessing, emerge sharply at critical scales, often jumping from near-random performance to high accuracy. This phenomenon, first highlighted in seminal research, sparks debates on AI predictability, safety, and scaling laws, challenging assumptions about model behavior.
Evasion attacks represent a critical vulnerability in AI and machine learning systems, where adversaries craft subtle input perturbations to deceive trained models at inference time. Unlike poisoning attacks that corrupt training data, evasion targets deployed models, enabling real-world threats like bypassing malware detectors or fooling autonomous vehicle vision systems.
F
The Fast Gradient Sign Method (FGSM) is a foundational technique in adversarial machine learning, introduced by Ian Goodfellow et al. in 2014. It generates subtle perturbations to input data, such as images, that fool neural networks into misclassifying them while remaining nearly imperceptible to humans. FGSM exploits model gradients to maximize loss, highlighting vulnerabilities in deep learning systems used in computer vision, autonomous vehicles, and security. This simple yet powerful white-box attack underscores the need for robust defenses like adversarial training.
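A minimal PyTorch sketch of the FGSM step is shown below; the model, inputs, and epsilon value are placeholder assumptions, and inputs are assumed to be images scaled to [0, 1].

```python
# Minimal FGSM sketch: take one step in the direction that increases the loss,
# x_adv = x + epsilon * sign(grad_x loss), then clamp to the valid pixel range.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0, 1).detach()

# Tiny demo with a placeholder linear "model" on random data.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 10))
x, y = torch.rand(4, 3, 8, 8), torch.randint(0, 10, (4,))
x_adv = fgsm_attack(model, x, y)
```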
Federated Learning (FL) Security addresses vulnerabilities in distributed machine learning where models train across decentralized devices without sharing raw data, preserving privacy while enabling collaboration. However, FL faces threats like poisoning attacks, inference leaks, and Byzantine failures, compromising model integrity, availability, or confidentiality. Robust defenses combine cryptography, differential privacy, and robust aggregation to secure FL in applications like healthcare, finance, and IoT.
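To illustrate one robust-aggregation idea, the sketch below replaces plain federated averaging with a coordinate-wise median, which tolerates a minority of poisoned client updates; the client vectors are synthetic assumptions.

```python
# Coordinate-wise median aggregation: a simple robust alternative to plain
# federated averaging that resists a minority of outlier (poisoned) updates.
import numpy as np

def robust_aggregate(client_updates):
    updates = np.stack(client_updates)   # shape: (num_clients, num_params)
    return np.median(updates, axis=0)

clients = [
    np.array([0.10, 0.20]),
    np.array([0.12, 0.18]),
    np.array([5.00, -9.0]),  # poisoned update from a malicious client
]
print(robust_aggregate(clients))  # [0.12 0.18] -- close to the honest clients
```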
G
Gradient clipping is a vital technique in deep learning that prevents exploding gradients during neural network training, ensuring stable optimization and faster convergence. Commonly used in RNNs, LSTMs, and transformers, it caps gradient magnitudes to avoid numerical instability from backpropagation through time (BPTT) or deep layers. By rescaling gradients exceeding a threshold, it maintains training reliability across frameworks like PyTorch and TensorFlow.
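A minimal PyTorch training-step sketch with gradient-norm clipping is shown below; the model, loss function, and max_norm value are placeholder assumptions.

```python
# Gradient-norm clipping: rescale all gradients so their combined L2 norm does
# not exceed max_norm before the optimizer step.
import torch

def training_step(model, optimizer, loss_fn, inputs, targets, max_norm=1.0):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item()
```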
H
In artificial intelligence, particularly large language models (LLMs) like ChatGPT, "hallucination" refers to the generation of plausible but factually incorrect, fabricated, or nonsensical information presented confidently as truth. Unlike human perceptual illusions, AI hallucinations stem from model limitations, leading to errors in reasoning, facts, or logic. This phenomenon poses risks in high-stakes applications such as medicine, law, and research, where unreliable outputs can mislead users and erode trust. Mitigation strategies are essential for safer AI deployment.
Homomorphic encryption (HE) is an advanced cryptographic technique enabling computations on encrypted data without decryption, preserving privacy during processing. Ideal for cloud computing and sensitive analytics, it ensures results match plaintext operations post-decryption. From partially homomorphic schemes supporting single operations to fully homomorphic encryption (FHE) for arbitrary computations, HE addresses data privacy regulations like GDPR while unlocking secure AI and collaboration.
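The self-contained toy below demonstrates the homomorphic property using textbook (unpadded) RSA, which is multiplicatively homomorphic; the tiny primes make it insecure, and it is for illustration only rather than a production FHE scheme.

```python
# Toy multiplicative homomorphism with textbook RSA: multiplying ciphertexts
# corresponds to multiplying plaintexts. Insecure tiny primes, illustration only.
p, q, e = 61, 53, 17
n = p * q
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)  # private exponent (Python 3.8+ modular inverse)

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

a, b = 7, 12
c_product = (encrypt(a) * encrypt(b)) % n
print(decrypt(c_product))  # 84 == a * b, computed without decrypting a or b
```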
I
In-context learning (ICL) attacks represent a sophisticated class of adversarial techniques targeting large language models (LLMs) by manipulating the demonstration examples provided within prompts. Unlike traditional attacks that modify model parameters, ICL attacks exploit the model's ability to learn from few-shot examples at inference time, making them particularly dangerous in production AI systems. These attacks can bypass safety alignments, extract sensitive information, and coerce models into generating harmful outputs, all without requiring access to model weights or training pipelines.
Inner alignment is a critical concept in AI safety that addresses whether an AI system's learned behavior actually pursues the objectives specified by its designers. While outer alignment focuses on correctly specifying what we want an AI to do, inner alignment ensures the model genuinely internalizes and follows those specifications across all situations. As AI systems become more autonomous and capable, inner alignment failures pose significant risks: models may appear aligned during training but pursue entirely different goals when deployed in new environments, potentially leading to catastrophic outcomes in high-stakes applications.
J
Jailbreaking in AI refers to techniques that manipulate large language models (LLMs) into bypassing their built-in safety guardrails. As organizations increasingly deploy AI systems in customer service, financial analysis, and decision-making, understanding jailbreaking vulnerabilities has become critical for AI security professionals. These attacks exploit the fundamental tension between an LLM's helpfulness and its safety constraints, potentially exposing sensitive data or generating harmful content.
K
K-Anonymity is a foundational privacy-preserving technique in AI security that protects individuals from re-identification in published datasets. First introduced by Pierangela Samarati and Latanya Sweeney in 1998, this data anonymization method ensures that personal information cannot be distinguished from at least k-1 other individuals in a dataset. As organizations increasingly leverage AI and machine learning on sensitive data, K-Anonymity serves as a critical safeguard against identity disclosure and linkage attacks.
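A toy pandas check for k-anonymity is sketched below: every combination of quasi-identifier values must appear at least k times. The column names and data are illustrative assumptions.

```python
# Toy k-anonymity check: each quasi-identifier combination must occur >= k times.
import pandas as pd

def is_k_anonymous(df, quasi_identifiers, k):
    group_sizes = df.groupby(quasi_identifiers).size()
    return bool((group_sizes >= k).all())

records = pd.DataFrame({
    "age_range":  ["20-29", "20-29", "30-39", "30-39"],
    "zip_prefix": ["021**", "021**", "100**", "100**"],
    "diagnosis":  ["flu", "cold", "flu", "asthma"],  # sensitive attribute
})
print(is_k_anonymous(records, ["age_range", "zip_prefix"], k=2))  # True
```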
K-Nearest Neighbors (KNN) is a fundamental supervised machine learning algorithm widely used in AI security and data analysis applications. First developed by Evelyn Fix and Joseph Hodges in 1951 and later expanded by Thomas Cover in 1967, KNN operates on a simple yet powerful principle: similar data points exist near one another. As a non-parametric, instance-based learning method, KNN makes predictions by analyzing the proximity of new data points to existing labeled examples, making it invaluable for classification, regression, and anomaly detection tasks in security contexts.
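A minimal scikit-learn example is shown below, using made-up feature vectors (for instance, request size and failed-login count) to classify events; the data and the choice of k=3 are assumptions for illustration.

```python
# Minimal KNN classification sketch with scikit-learn on synthetic security features.
from sklearn.neighbors import KNeighborsClassifier

# Toy features: [request size in bytes, failed-login count]
X_train = [[120, 0], [150, 1], [9000, 40], [8800, 35]]
y_train = ["benign", "benign", "malicious", "malicious"]

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)
print(clf.predict([[140, 2]]))  # ['benign'] -- the nearest neighbors are mostly benign
```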
L
Large Language Model (LLM) security has become a critical concern as AI systems increasingly integrate into enterprise workflows, customer service platforms, and automated decision-making processes. As organizations rapidly adopt LLMs like ChatGPT, Claude, and Gemini, protecting these powerful AI systems from unauthorized access, manipulation, and exploitation has emerged as a top priority for cybersecurity professionals. Understanding LLM security vulnerabilities and implementing robust defense strategies is essential for any organization leveraging generative AI technology.
M
Machine unlearning is an emerging AI security technology that enables the selective removal of specific data from trained machine learning models. As privacy regulations like GDPR enforce the "right to be forgotten," organizations must ensure their AI systems can effectively forget user data upon request. This critical capability addresses growing concerns about data privacy, model integrity, and compliance in an increasingly AI-driven world.
Membership inference attacks (MIAs) represent one of the most significant privacy threats in machine learning security. These attacks enable adversaries to determine whether specific data was used to train an AI model, potentially exposing sensitive personal information. As organizations increasingly rely on machine learning systems trained on private data, understanding and defending against membership inference attacks has become essential for maintaining data privacy and regulatory compliance.
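As a simple illustration, the sketch below implements a loss-threshold membership guess: records the model fits with unusually low loss are guessed to be training members. The model, candidate data, and threshold are assumptions; practical attacks are considerably more sophisticated.

```python
# Loss-threshold membership inference sketch: training members tend to have lower
# loss than records the model never saw.
import torch
import torch.nn.functional as F

def guess_membership(model, x, y, threshold=0.1):
    with torch.no_grad():
        loss = F.cross_entropy(model(x), y, reduction="none")
    return loss < threshold  # True => guessed to be a training member
```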
N
Neural Trojans are a critical security threat in AI systems, where malicious actors embed hidden triggers within neural networks during training. These triggers can cause the AI to behave unexpectedly or maliciously when activated, posing risks to AI reliability and safety. Understanding Neural Trojans is essential for securing AI models against covert attacks that can compromise sensitive applications.
NSFW Detection is a crucial AI technology designed to identify and filter content that is Not Safe For Work (NSFW), such as explicit, adult, or inappropriate material. It helps platforms maintain safe and compliant environments by automatically flagging or blocking harmful content. NSFW Detection plays a vital role in content moderation, protecting users and brands from exposure to offensive or illegal media.
O
The Orthogonality Thesis is a foundational concept in AI security and safety, stating that an AI’s intelligence level is independent of its goals or values. This means a highly intelligent AI can pursue any objective, regardless of whether it aligns with human ethics or safety. Understanding this thesis is crucial for developing secure AI systems that align with human values and mitigate risks from advanced AI.
Out-of-Distribution (OOD) Detection is a critical technique in AI security that identifies inputs or data points that differ significantly from the training data distribution. This capability helps AI systems recognize when they encounter unfamiliar or anomalous data, preventing erroneous predictions and enhancing model reliability. OOD detection is essential for maintaining AI safety, especially in high-stakes applications where unexpected inputs can lead to critical failures.
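One common simple baseline is maximum softmax probability (MSP): flag inputs the model classifies with low confidence as potentially out-of-distribution. The sketch below assumes a PyTorch classifier and an illustrative threshold.

```python
# Maximum-softmax-probability (MSP) baseline for OOD detection.
import torch
import torch.nn.functional as F

def is_out_of_distribution(model, x, threshold=0.7):
    with torch.no_grad():
        confidence = F.softmax(model(x), dim=-1).max(dim=-1).values
    return confidence < threshold  # low confidence => possibly OOD
```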
P
Projected Gradient Descent (PGD) is a powerful iterative optimization technique widely used in AI security to craft adversarial examples that test and improve model robustness. By repeatedly applying small perturbations within a constrained boundary, PGD exposes vulnerabilities in machine learning models, helping security teams evaluate and defend against sophisticated attacks. It remains a standard benchmark for adversarial robustness in 2025 and beyond.
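A sketch of an L-infinity PGD attack in PyTorch appears below; the model, inputs, and the epsilon/step-size settings are placeholder assumptions, with inputs assumed to lie in [0, 1].

```python
# Iterative PGD (L-infinity): repeat gradient-sign steps, projecting back into the
# epsilon-ball around the original input after each step.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=0.03, alpha=0.007, steps=10):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project into the epsilon-ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
    return x_adv
```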
Poisoning attacks are a critical threat in AI security where adversaries manipulate training data to corrupt machine learning models. By injecting malicious or misleading data during training, attackers can degrade model performance or cause it to behave unpredictably. Understanding poisoning attacks is essential for developing robust AI systems that maintain integrity and reliability in adversarial environments.
Q
Quantization security focuses on safeguarding AI models that use quantization—a technique that reduces model precision to improve efficiency. While quantization enables faster, smaller, and more power-efficient AI deployments, it can introduce vulnerabilities that attackers might exploit. Ensuring quantization security is critical to maintaining AI model robustness, accuracy, and resistance to adversarial attacks, especially as AI moves to resource-constrained devices like smartphones and edge systems.
R
Red Team vs. Blue Team is a foundational concept in cybersecurity and AI security, representing the offensive and defensive roles in security testing. The Red Team simulates real-world attacks to identify vulnerabilities, while the Blue Team defends systems by detecting and responding to threats. Together, they create a dynamic security environment that strengthens organizational defenses and ensures AI systems remain resilient against evolving cyber threats.
Retrieval-Augmented Generation (RAG) security covers protecting LLM systems that integrate real-time, authoritative external data sources into their responses. Grounding outputs in verified knowledge improves accuracy, reduces misinformation, and strengthens trust in AI-driven security applications, but the retrieval pipeline itself must also be protected. RAG security is vital for safeguarding AI systems in dynamic environments where up-to-date, domain-specific information is critical.
S
Safety filtering is a critical process in AI security that involves screening and blocking harmful, inappropriate, or disallowed content generated or processed by AI systems. It acts as a protective layer to ensure AI outputs and user inputs comply with ethical standards, legal regulations, and organizational policies. This filtering helps prevent misuse, protects users, and maintains trust in AI technologies.
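The sketch below shows the simplest flavor of safety filtering, a pattern blocklist applied to text; production systems typically layer ML classifiers and policy engines on top, and the blocked patterns here are placeholders.

```python
# Minimal pattern-based safety filter sketch; the blocked patterns are placeholders.
import re

BLOCKED_PATTERNS = [r"\bcredit card number\b", r"\bbuild(?:ing)? a weapon\b"]

def passes_safety_filter(text):
    return not any(re.search(p, text, flags=re.IGNORECASE) for p in BLOCKED_PATTERNS)

print(passes_safety_filter("What's the weather today?"))           # True
print(passes_safety_filter("Explain building a weapon at home."))  # False
```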
Scalable Oversight is a strategic approach in AI security that enables effective monitoring and control of AI systems as they grow in complexity and deployment scale. It combines human expertise with automated tools to ensure AI models operate safely, ethically, and in compliance with regulations. This method addresses the challenge of maintaining robust supervision over AI behaviors without sacrificing efficiency or scalability.
T
Transfer Learning Security focuses on safeguarding machine learning models that leverage transfer learning techniques. Transfer learning allows models to apply knowledge gained from one task to improve performance on another related task. Ensuring security in this process is critical to prevent vulnerabilities such as data leakage, model poisoning, and adversarial attacks that could compromise the integrity and confidentiality of AI systems.
Trigger-based attacks are a sophisticated form of adversarial manipulation targeting AI systems, especially large language models (LLMs). These attacks use specific trigger inputs, such as phrases, images, or patterns, that activate hidden malicious behaviors in the AI model. Understanding and defending against trigger-based attacks is critical to maintaining AI system integrity, preventing unauthorized actions, and safeguarding sensitive data in AI-driven environments.
U
Uncertainty Quantification (UQ) is a critical process in AI security that measures the confidence and reliability of AI model predictions. It helps identify where AI systems may be unsure or prone to errors, enabling safer and more trustworthy decision-making. In high-stakes environments like cybersecurity, finance, and defense, UQ ensures AI tools can signal their limits, improving risk management and operational safety.
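One widely used approach is an ensemble: train several models and treat the spread of their predictions as the uncertainty estimate. The sketch below uses small bootstrapped linear regressors on synthetic data as stand-ins for independently trained models.

```python
# Ensemble-based uncertainty sketch: prediction spread across members is the
# uncertainty estimate. Synthetic data and tiny regressors for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 1))
y = 3 * X.ravel() + rng.normal(0, 0.1, size=50)

ensemble = []
for _ in range(5):
    idx = rng.choice(len(X), size=len(X), replace=True)  # bootstrap resample
    ensemble.append(LinearRegression().fit(X[idx], y[idx]))

x_new = np.array([[0.5], [3.0]])  # 3.0 lies far outside the training range
preds = np.stack([m.predict(x_new) for m in ensemble])
print(preds.mean(axis=0))
print(preds.std(axis=0))  # larger spread on the out-of-range input
```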
Universal Adversarial Perturbations (UAPs) are subtle, image-agnostic modifications that can fool deep learning models across many inputs. These perturbations are nearly invisible to humans but can cause widespread misclassification in AI systems, posing serious security risks. UAPs expose vulnerabilities in AI applications such as autonomous vehicles, facial recognition, and cybersecurity, making their understanding vital for developing robust and secure AI defenses.
V
Value learning is a foundational concept in AI that focuses on teaching machines to understand and align with human values and preferences. It enables AI systems to make decisions that reflect ethical considerations, safety, and societal norms. In AI security, value learning is crucial to ensure AI behaves responsibly, mitigates risks, and avoids unintended harmful consequences while operating autonomously in complex environments.
Verification and Validation (V&V) are critical processes in AI security, ensuring that AI systems perform reliably, safely, and as intended, especially in safety-critical domains like healthcare, aerospace, and automotive. V&V involves rigorous testing, evaluation, and certification to confirm that AI models meet predefined standards and regulatory requirements. These processes help identify errors, biases, and vulnerabilities, building trust and confidence in AI-enabled systems before deployment.
W
Watermarking in AI security is a technique that embeds unique, often invisible, identifiers into AI-generated content such as images, text, or audio. This process helps authenticate the origin of the content, protect intellectual property, and detect misuse or unauthorized replication. As AI-generated media becomes widespread, watermarking plays a crucial role in ensuring transparency, combating misinformation, and maintaining trust in digital ecosystems.
A white-box attack in AI security refers to a scenario where an attacker has complete knowledge of the target machine learning model, including its architecture, parameters, and training data. This extensive access allows the attacker to craft highly effective adversarial inputs that can deceive the model into making incorrect predictions. White-box attacks are critical for testing model robustness and understanding vulnerabilities in AI systems, especially in sensitive applications like facial recognition, autonomous driving, and cybersecurity.
Z
Zero-day vulnerabilities are undisclosed security flaws in software, hardware, or firmware unknown to the vendor or developer. Because no patch or fix exists at the time of discovery, these vulnerabilities can be exploited by attackers to compromise systems before defenses are in place. Their unknown nature makes zero-day vulnerabilities particularly dangerous, posing significant risks to organizations and individuals alike.
Zero-shot attacks represent a sophisticated cybersecurity threat where malicious actions are executed without prior exposure or training data for the attack type. Leveraging zero-shot learning techniques, attackers exploit vulnerabilities that traditional detection systems fail to recognize, as these systems rely on known attack signatures. This emerging threat challenges AI-driven defenses to detect and respond to novel, unseen attack vectors in real time.
