Lucene search

4 matches found

Packet Storm News
added 2026/02/02 12:00 a.m., 4 views

The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers

Detecting whether a model has been poisoned is a longstanding problem in AI security. In this work, we present a practical scanner for identifying sleeper agent-style backdoors in causal language models. Our approach relies on two key findings: first, sleeper agents tend to memorize poisoning dat...

AI score: 5.4
Exploits: 0
Packet Storm News
added 2025/06/19 12:00 a.m., 5 views

Probe Before You Talk: Towards Black-Box Defense against Backdoor Unalignment for Large Language Models

Backdoor unalignment attacks against Large Language Models (LLMs) enable the stealthy compromise of safety alignment using a hidden trigger while evading normal safety auditing. These attacks pose significant threats to the applications of LLMs in the real-world Large Language Model as a Service...

AI score: 7.4
Exploits: 0
Packet Storm News
added 2025/05/26 12:00 a.m., 4 views

MixBridge: Heterogeneous Image-To-Image Backdoor Attack through Mixture of Schrödinger Bridges

This paper focuses on implanting multiple heterogeneous backdoor triggers in bridge-based diffusion models designed for complex and arbitrary input distributions. Existing backdoor formulations mainly address single-attack scenarios and are limited to Gaussian noise input models. To fill this gap...

AI score: 7
Exploits: 0
Packet Storm News
added 2025/05/25 12:00 a.m., 8 views

RADEP: a Resilient Adaptive Defense Framework against Model Extraction Attacks

Machine Learning as a Service (MLaaS) enables users to leverage powerful machine learning models through cloud-based APIs, offering scalability and ease of deployment. However, these services are vulnerable to model extraction attacks, where adversaries repeatedly query the application programming...

AI score: 6.7
Exploits: 0