5 matches found
Patcher: Post-Hoc Patching of Backdoored Large Language Models
Large language models remain vulnerable to jailbreak backdoor attacks, where adversaries poison safety alignment data to embed hidden triggers that bypass safety mechanisms. Existing defenses often require comprehensive attack information or multiple triggered examples, making them impractical wh...
Learning Obfuscations of LLM Embedding Sequences: Stained Glass Transform
The high cost of ownership of AI compute infrastructure and challenges of robust serving of large language models LLMs has led to a surge in managed Model-as-a-service deployments. Even when enterprises choose on-premises deployments, the compute infrastructure is typically shared across many tea...
CSVAR: Enhancing Visual Privacy in Federated Learning Via Adaptive Shuffling against Overfitting
Although federated learning preserves training data within local privacy domains, the aggregated model parameters may still reveal private characteristics. This vulnerability stems from clients' limited training data, which predisposes models to overfitting. Such overfitting enables models to...
A Critical Evaluation of Defenses against Prompt Injection Attacks
Large Language Models LLMs are vulnerable to prompt injection attacks, and several defenses have recently been proposed, often claiming to mitigate these attacks successfully. However, we argue that existing studies lack a principled approach to evaluating these defenses. In this paper, we argue...
OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models
Large language models LLMs trained over extensive corpora risk memorizing sensitive, copyrighted, or toxic content. To address this, we propose OBLIVIATE, a robust unlearning framework that removes targeted data while preserving model utility. The framework follows a structured process: extractin...