12 matches found

Packet Storm News
added 2026/05/11 12:00 a.m., 4 views

Guaranteed Jailbreaking Defense Via Disrupt-And-Rectify Smoothing

This paper proposes a guaranteed defense method for large language models (LLMs) to safeguard against jailbreaking attacks. Drawing inspiration from the denoised-smoothing approach in the adversarial defense domain, we propose a novel smoothing-based defense method, termed Disrupt-and-Rectify...

AI score: 5.8
Exploits: 0
Packet Storm News
added 2026/05/11 12:00 a.m., 2 views

Re-Triggering Safeguards within LLMs for Jailbreak Detection

This paper proposes a jailbreaking prompt detection method for large language models (LLMs) to defend against jailbreak attacks. Although recent LLMs are equipped with built-in safeguards, it remains possible to craft jailbreaking prompts that bypass them. We argue that such jailbreaking prompts ar...

AI score: 5.8
Exploits: 0
The Hacker News
added 2026/01/09 11:09 a.m., 11 views

Cybersecurity Predictions 2026: The Hype We Can Ignore (And the Risks We Can't)

As organizations plan for 2026, cybersecurity predictions are everywhere. Yet many strategies are still shaped by headlines and speculation rather than evidence. The real challenge isn't a lack of forecasts—it's identifying which predictions reflect real, emerging risks and which can safely be...

AI score: 6.7
Exploits: 0
Packet Storm News
added 2025/10/28 12:00 a.m., 2 views

Secure Retrieval-Augmented Generation against Poisoning Attacks

Large language models (LLMs) have transformed natural language processing (NLP), enabling applications from content generation to decision support. Retrieval-Augmented Generation (RAG) improves LLMs by incorporating external knowledge but also introduces security risks, particularly from data poisoning...

AI score: 6.8
Exploits: 0
Packet Storm News
added 2025/10/03 12:00 a.m., 4 views

NEXUS: Network Exploration for EXploiting Unsafe Sequences in Multi-Turn LLM Jailbreaks

Large Language Models (LLMs) have revolutionized natural language processing but remain vulnerable to jailbreak attacks, especially multi-turn jailbreaks that distribute malicious intent across benign exchanges and bypass alignment mechanisms. Existing approaches often explore the adversarial space...

AI score: 7
Exploits: 0
Packet Storm News
added 2025/09/25 12:00 a.m., 6 views

RLCracker: Exposing the Vulnerability of LLM Watermarks with Adaptive RL Attacks

Large Language Models (LLMs) watermarking has shown promise in detecting AI-generated content and mitigating misuse, with prior work claiming robustness against paraphrasing and text editing. In this paper, we argue that existing evaluations are not sufficiently adversarial, obscuring critical...

AI score: 6.9
Exploits: 0
Packet Storm News
added 2025/08/04 12:00 a.m., 3 views

Coward: Toward Practical Proactive Federated Backdoor Defense Via Collision-Based Watermark

Backdoor detection is currently the mainstream defense against backdoor attacks in federated learning (FL), where malicious clients upload poisoned updates that compromise the global model and undermine the reliability of FL deployments. Existing backdoor detection techniques fall into two...

AI score: 7
Exploits: 0
Packet Storm News
added 2025/07/08 12:00 a.m., 2 views

TuneShield: Mitigating Toxicity in Conversational AI While Fine-Tuning on Untrusted Data

Recent advances in foundation models, such as LLMs, have revolutionized conversational AI. Chatbots are increasingly being developed by customizing LLMs on specific conversational datasets. However, mitigating toxicity during this customization, especially when dealing with untrusted training dat...

AI score: 7.5
Exploits: 0
Packet Storm News
added 2025/06/16 12:00 a.m., 3 views

Mitigating Data Poisoning Attacks to Local Differential Privacy

The distributed nature of local differential privacy (LDP) invites data poisoning attacks and poses unforeseen threats to the underlying LDP-supported applications. In this paper, we propose a comprehensive mitigation framework for popular frequency estimation, which contains a suite of novel...

AI score: 6.4
Exploits: 0
Packet Storm News
added 2025/05/23 12:00 a.m., 3 views

A Critical Evaluation of Defenses against Prompt Injection Attacks

Large Language Models (LLMs) are vulnerable to prompt injection attacks, and several defenses have recently been proposed, often claiming to mitigate these attacks successfully. However, we argue that existing studies lack a principled approach to evaluating these defenses. In this paper, we argue...

AI score: 7.5
Exploits: 0
Packet Storm News
added 2025/05/15 12:00 a.m., 2 views

DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks

LLM-integrated applications and agents are vulnerable to prompt injection attacks, where an attacker injects prompts into their inputs to induce attacker-desired outputs. A detection method aims to determine whether a given input is contaminated by an injected prompt. However, existing detection...

AI score: 7.2
Exploits: 0
Packet Storm News
added 2025/05/09 12:00 a.m., 3 views

NCorr-FP: A Neighbourhood-Based Correlation-Preserving Fingerprinting Scheme for Intellectual Property Protection of Structured Data

Ensuring data ownership and traceability of unauthorised redistribution are central to safeguarding intellectual property in shared data environments. Data fingerprinting addresses these challenges by embedding recipient-specific marks into the data, typically via content modifications. We propos...

AI score: 6.7
Exploits: 0