Lucene search
K

101 matches found

Packet Storm News
Packet Storm News
added 4 days ago2 views

MaskForge: Structure-Aware Adaptive Attacks for Jailbreaking Diffusion Large Language Models

Diffusion large language models dLLMs generate text by iteratively denoising partially masked sequences under bidirectional context, exposing a safety surface distinct from autoregressive LLMs. Because mask tokens are native inputs and tokens are committed by confidence rather than position,...

5.8AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/05/27 12:0 a.m.6 views

Evolving Skill-Structured Attack Memory Enhances LLM Jailbreaking

Jailbreak attacks on large language models LLMs aim to induce LLMs to produce content that they are expected to refuse. Automated black-box jailbreak generation is especially important for safety evaluation, where the attacker observes only model outputs and needs to automatically search for...

5.8AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/05/26 12:0 a.m.5 views

Disentangling Adversarial Prompts: A Semantic-Graph Defense for Robust LLM Security

Large Language Models LLMs are increasingly vulnerable to adversarial prompts that exploit semantic ambiguities to bypass safety mechanisms, resulting in harmful or inappropriate outputs. Such attacks, including jailbreaking and prompt injection, pose significant risks to the integrity and...

5.8AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/05/11 12:0 a.m.5 views

Guaranteed Jailbreaking Defense Via Disrupt-And-Rectify Smoothing

This paper proposes a guaranteed defense method for large language models LLMs to safeguard against jailbreaking attacks. Drawing inspiration from the denoised-smoothing approach in the adversarial defense domain, we propose a novel smoothing-based defense method, termed Disrupt-and-Rectify...

5.8AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/05/11 12:0 a.m.4 views

LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments

The rapid proliferation of LLM-based autonomous agents in real operating system environments introduces a new category of safety risk beyond content safety: behavior jailbreak, where an adversary induces an agent to execute dangerous OS-level operations with irreversible consequences. Existing...

5.9AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/05/11 12:0 a.m.4 views

Re-Triggering Safeguards within LLMs for Jailbreak Detection

This paper proposes a jailbreaking prompt detection method for large language models LLMs to defend against jailbreak attacks. Although recent LLMs are equipped with built-in safeguards, it remains possible to craft jailbreaking prompts that bypass them. We argue that such jailbreaking prompts ar...

5.8AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/05/08 12:0 a.m.4 views

Hard to Read, Easy to Jailbreak: How Visual Degradation Bypasses MLLM Safety Alignment

Recent advancements in visual context compression enable MLLMs to process ultra-long contexts efficiently by rendering text into images. However, we identify a critical vulnerability inherent to this paradigm: lowering image resolution inadvertently catalyzes jailbreaking. Our experiments reveal...

5.8AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/04/27 12:0 a.m.1 views

Jailbreaking Frontier Foundation Models through Intention Deception

Large vision-language models exhibit remarkable capability but remain highly susceptible to jailbreaking. Existing safety training approaches aim to have the model learn a refusal boundary between safe and unsafe, based on the user's intent. It has been found that this binary training regime ofte...

5.3AI score
Exploits0
Schneier on Security
Schneier on Security
added 2026/03/10 9:50 a.m.6 views

Jailbreaking the F-35 Fighter Jet

Countries around the world are becoming increasingly concerned about their dependencies on the US. If you've purchase US-made F-35 fighter jets, you are dependent on the US for software maintenance. The Dutch Defense Secretary recently said that he could jailbreak the planes to accept third-party...

5.8AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/03/06 12:0 a.m.1 views

Two Frames Matter: A Temporal Attack for Text-To-Video Model Jailbreaking

Recent text-to-video T2V models can synthesize complex videos from lightweight natural language prompts, raising urgent concerns about safety alignment in the event of misuse in the real world. Prior jailbreak attacks typically rewrite unsafe prompts into paraphrases that evade content filters...

5.8AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/02/27 12:0 a.m.2 views

Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking

Jailbreak techniques for large language models LLMs evolve faster than benchmarks, making robustness estimates stale and difficult to compare across papers due to drift in datasets, harnesses, and judging protocols. We introduce JAILBREAK FOUNDRY JBF, a system that addresses this gap via a...

6AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/02/18 12:0 a.m.7 views

Recursive Language Models for Jailbreak Detection: A Procedural Defense for Tool-Augmented Agents

Jailbreak prompts are a practical and evolving threat to large language models LLMs, particularly in agentic systems that execute tools over untrusted content. Many attacks exploit long-context hiding, semantic camouflage, and lightweight obfuscations that can evade single-pass guardrails. We...

5.6AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/02/11 12:0 a.m.2 views

Jailbreaking Leaves a Trace: Understanding and Detecting Jailbreak Attacks from Internal Representations of Large Language Models

Jailbreaking large language models LLMs has emerged as a critical security challenge with the widespread deployment of conversational AI systems. Adversarial users exploit these models through carefully crafted prompts to elicit restricted or unsafe outputs, a phenomenon commonly referred to as...

5.9AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/02/06 12:0 a.m.4 views

TrapSuffix: Proactive Defense against Adversarial Suffixes in Jailbreaking

Suffix-based jailbreak attacks append an adversarial suffix, i.e., a short token sequence, to steer aligned LLMs into unsafe outputs. Since suffixes are free-form text, they admit endlessly many surface forms, making jailbreak mitigation difficult. Most existing defenses depend on passive detecti...

5.3AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/01/31 12:0 a.m.4 views

Jailbreaking LLMs Via Calibration

Safety alignment in Large Language Models LLMs often creates a systematic discrepancy between a model's aligned output and the underlying pre-aligned data distribution. We propose a framework in which the effect of safety alignment on next-token prediction is modeled as a systematic distortion of...

5.4AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/01/15 12:0 a.m.0 views

AJAR: Adaptive Jailbreak Architecture for Red-Teaming

As Large Language Models LLMs evolve from static chatbots into autonomous agents capable of tool execution, the landscape of AI safety is shifting from content moderation to action security. However, existing red-teaming frameworks remain bifurcated: they either focus on rigid, script-based text...

5.8AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/01/09 12:0 a.m.3 views

The Echo Chamber Multi-Turn LLM Jailbreak

The availability of Large Language Models LLMs has led to a new generation of powerful chatbots that can be developed at relatively low cost. As companies deploy these tools, security challenges need to be addressed to prevent financial loss and reputational damage. A key security challenge is...

7.2AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/01/08 12:0 a.m.2 views

Multi-Turn Jailbreaking Attack in Multi-Modal Large Language Models

In recent years, the security vulnerabilities of Multi-modal Large Language Models MLLMs have become a serious concern in the Generative Artificial Intelligence GenAI research. These highly intelligent models, capable of performing multi-modal tasks with high accuracy, are also severely susceptib...

7.2AI score
Exploits0
Packet Storm News
Packet Storm News
added 2025/12/30 12:0 a.m.3 views

Jailbreaking Attacks Vs. Content Safety Filters: How Far Are We in the LLM Safety Arms Race?

As large language models LLMs are increasingly deployed, ensuring their safe use is paramount. Jailbreaking, adversarial prompts that bypass model alignment to trigger harmful outputs, present significant risks, with existing studies reporting high success rates in evading common LLMs. However,...

7.2AI score
Exploits0
Packet Storm News
Packet Storm News
added 2025/12/16 12:0 a.m.3 views

Trust in LLM-Controlled Robotics: A Survey of Security Threats, Defenses and Challenges

The integration of Large Language Models LLMs into robotics has revolutionized their ability to interpret complex human commands and execute sophisticated tasks. However, such paradigm shift introduces critical security vulnerabilities stemming from the ''embodiment gap'', a discord between the...

7.4AI score
Exploits0
Rows per page
Query Builder