Lucene search
K

5 matches found

Packet Storm News
Packet Storm News
added 2026/01/07 12:0 a.m.7 views

Jailbreaking LLMs and VLMs: Mechanisms, Evaluation, and Unified Defense

This paper provides a systematic survey of jailbreak attacks and defenses on Large Language Models LLMs and Vision-Language Models VLMs, emphasizing that jailbreak vulnerabilities stem from structural factors such as incomplete training data, linguistic ambiguity, and generative uncertainty. It...

7.3AI score
Exploits0
Packet Storm News
Packet Storm News
added 2025/07/14 12:0 a.m.3 views

ARMOR: Aligning Secure and Safe Large Language Models Via Meticulous Reasoning

Large Language Models LLMs have demonstrated remarkable generative capabilities. However, their susceptibility to misuse has raised significant safety concerns. While post-training safety alignment methods have been widely adopted, LLMs remain vulnerable to malicious instructions that can bypass...

7.4AI score
Exploits0
Packet Storm News
Packet Storm News
added 2025/07/10 12:0 a.m.7 views

Agent Safety Alignment Via Reinforcement Learning

The emergence of autonomous Large Language Model LLM agents capable of tool usage has introduced new safety risks that go beyond traditional conversational misuse. These agents, empowered to execute external functions, are vulnerable to both user-initiated threats e.g., adversarial prompts and...

7.6AI score
Exploits0
Packet Storm News
Packet Storm News
added 2025/05/28 12:0 a.m.6 views

Jailbreak Distillation: Renewable Safety Benchmarking

Large language models LLMs are rapidly deployed in critical applications, raising urgent needs for robust safety benchmarking. We propose Jailbreak Distillation JBDistill, a novel benchmark construction framework that "distills" jailbreak attacks into high-quality and easily-updatable safety...

7.2AI score
Exploits0
Packet Storm News
Packet Storm News
added 2025/05/19 12:0 a.m.3 views

Think Twice Before You Act: Enhancing Agent Behavioral Safety with Thought Correction

LLM-based autonomous agents possess capabilities such as reasoning, tool invocation, and environment interaction, enabling the execution of complex multi-step tasks. The internal reasoning process, i.e., thought, of behavioral trajectory significantly influences tool usage and subsequent actions...

7.5AI score
Exploits0
Rows per page
Query Builder