16 matches found
Why Aggregate Accuracy Is Inadequate for Evaluating Fairness in Law Enforcement Facial Recognition Systems
Facial recognition systems are increasingly deployed in law enforcement and security contexts, where algorithmic decisions can carry significant societal consequences. Despite high reported accuracy, growing evidence demonstrates that such systems often exhibit uneven performance across demograph...
Label-Efficient Training Updates for Malware Detection over Time
Machine Learning ML-based detectors are becoming essential to counter the proliferation of malware. However, common ML algorithms are not designed to cope with the dynamic nature of real-world settings, where both legitimate and malicious software evolve. This distribution drift causes models...
A Systematic Literature Review on LLM Defenses against Prompt Injection and Jailbreaking: Expanding NIST Taxonomy
The rapid advancement and widespread adoption of generative artificial intelligence GenAI and large language models LLMs has been accompanied by the emergence of new security vulnerabilities and challenges, such as jailbreaking and other prompt injection attacks. These maliciously crafted inputs...
OneShield -- the Next Generation of LLM Guardrails
The rise of Large Language Models has created a general excitement about the great potential for a myriad of applications. While LLMs offer many possibilities, questions about safety, privacy, and ethics have emerged, and all the key actors are working to address these issues with protective...
KCES: Training-Free Defense for Robust Graph Neural Networks Via Kernel Complexity
Graph Neural Networks GNNs have achieved impressive success across a wide range of graph-based tasks, yet they remain highly vulnerable to small, imperceptible perturbations and adversarial attacks. Although numerous defense methods have been proposed to address these vulnerabilities, many rely o...
From Threat to Tool: Leveraging Refusal-Aware Injection Attacks for Safety Alignment
Safely aligning large language models LLMs often demands extensive human-labeled preference data, a process that's both costly and time-consuming. While synthetic data offers a promising alternative, current methods frequently rely on complex iterative prompting or auxiliary models. To address...
Explainer-Guided Targeted Adversarial Attacks against Binary Code Similarity Detection Models
Binary code similarity detection BCSD serves as a fundamental technique for various software engineering tasks, e.g., vulnerability detection and classification. Attacks against such models have therefore drawn extensive attention, aiming at misleading the models to generate erroneous predictions...
PrivATE: Differentially Private Confidence Intervals for Average Treatment Effects
The average treatment effect ATE is widely used to evaluate the effectiveness of drugs and other medical interventions. In safety-critical applications like medicine, reliable inferences about the ATE typically require valid uncertainty quantification, such as through confidence intervals CIs...
Interpretable Anomaly Detection in Encrypted Traffic Using SHAP with Machine Learning Models
The widespread adoption of encrypted communication protocols such as HTTPS and TLS has enhanced data privacy but also rendered traditional anomaly detection techniques less effective, as they often rely on inspecting unencrypted payloads. This study aims to develop an interpretable machine...
MorphMark: Flexible Adaptive Watermarking for Large Language Models
Watermarking by altering token sampling probabilities based on red-green list is a promising method for tracing the origin of text generated by large language models LLMs. However, existing watermark methods often struggle with a fundamental dilemma: improving watermark effectiveness the...
Defending the Edge: Representative-Attention for Mitigating Backdoor Attacks in Federated Learning
Federated learning FL enhances privacy and reduces communication cost for resource-constrained edge clients by supporting distributed model training at the edge. However, the heterogeneous nature of such devices produces diverse, non-independent, and identically distributed non-IID data, making t...
Adversarial Suffix Filtering: a Defense Pipeline for LLMs
Large Language Models LLMs are increasingly embedded in autonomous systems and public-facing environments, yet they remain susceptible to jailbreak vulnerabilities that may undermine their security and trustworthiness. Adversarial suffixes are considered to be the current state-of-the-art...
LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-Induced Risks and Vulnerabilities
The growing adoption of Large Language Models LLMs has influenced the development of their lighter counterparts-Small Language Models SLMs-to enable on-device deployment across smartphones and edge devices. These SLMs offer enhanced privacy, reduced latency, server-free functionality, and improve...
T2VShield: Model-Agnostic Jailbreak Defense for Text-To-Video Models
The rapid development of generative artificial intelligence has made text to video models essential for building future multimodal world simulators. However, these models remain vulnerable to jailbreak attacks, where specially crafted prompts bypass safety mechanisms and lead to the generation of...
Spring AI Embraces OpenAI's Structured Outputs: Enhancing JSON Response Reliability
OpenAI recently introduced a powerful feature called Structured Outputs, which ensures that AI-generated responses adhere strictly to a predefined JSON schema. This feature significantly improves the reliability and usability of AI-generated content in real-world applications. Today, we're excite...
Google Introduces Project Naptime for AI-Powered Vulnerability Research
Google has developed a new framework called Project Naptime that it says enables a large language model LLM to carry out vulnerability research with an aim to improve automated discovery approaches. "The Naptime architecture is centered around the interaction between an AI agent and a target...