Lucene search
K

26 matches found

Packet Storm News
Packet Storm News
added 2026/05/28 12:0 a.m.2 views

Persona Attack: Incremental Memory Injection Jailbreak Attack against Large Language Models

As Large Language Models evolve for user convenience, vulnerability to jailbreak attacks continues to be reported despite ongoing efforts in safety training. Traditional jailbreak techniques typically focus on a single prompt injection, neglecting the models' ability to remember the flow of...

5.8AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/04/21 12:0 a.m.0 views

Involuntary In-Context Learning: Exploiting Few-Shot Pattern Completion to Bypass Safety Alignment in GPT-5.4

Safety alignment in large language models relies on behavioral training that can be overridden when sufficiently strong in-context patterns compete with learned refusal behaviors. We introduce Involuntary In-Context Learning IICL, an attack class that uses abstract operator framing with few-shot...

5.7AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/04/18 12:0 a.m.3 views

HarmChip: Evaluating Hardware Security Centric LLM Safety Via Jailbreak Benchmarking

The integration of large language models LLMs into electronic design automation EDA workflows has introduced powerful capabilities for RTL generation, verification, and design optimization, but also raises critical security concerns. Malicious LLM outputs in this domain pose hardware-level threat...

5.8AI score
Exploits0
Microsoft Secure
Microsoft Secure
added 2026/02/09 5:12 p.m.4 views

A one-prompt attack that breaks LLM safety alignment

Large language models LLMs and diffusion models now power a wide range of applications, from document assistance to text-to-image generation, and users increasingly expect these systems to be safety-aligned by default. Yet safety alignment is only as robust as its weakest failure mode. Despite...

5.7AI score
Exploits0
Microsoft Secure
Microsoft Secure
added 2026/02/09 5:12 p.m.6 views

A one-prompt attack that breaks LLM safety alignment

Large language models LLMs and diffusion models now power a wide range of applications, from document assistance to text-to-image generation, and users increasingly expect these systems to be safety-aligned by default. Yet safety alignment is only as robust as its weakest failure mode. Despite...

5.7AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/01/31 12:0 a.m.4 views

Jailbreaking LLMs Via Calibration

Safety alignment in Large Language Models LLMs often creates a systematic discrepancy between a model's aligned output and the underlying pre-aligned data distribution. We propose a framework in which the effect of safety alignment on next-token prediction is modeled as a systematic distortion of...

5.4AI score
Exploits0
Packet Storm News
Packet Storm News
added 2026/01/02 12:0 a.m.2 views

Emoji-Based Jailbreaking of Large Language Models

Large Language Models LLMs are integral to modern AI applications, but their safety alignment mechanisms can be bypassed through adversarial prompt engineering. This study investigates emoji-based jailbreaking, where emoji sequences are embedded in textual prompts to trigger harmful and unethical...

7.2AI score
Exploits0
Packet Storm News
Packet Storm News
added 2025/11/09 12:0 a.m.4 views

EASE: Practical and Efficient Safety Alignment for Small Language Models

Small language models SLMs are increasingly deployed on edge devices, making their safety alignment crucial yet challenging. Current shallow alignment methods that rely on direct refusal of malicious queries fail to provide robust protection, particularly against adversarial jailbreaks. While...

7AI score
Exploits0
Packet Storm News
Packet Storm News
added 2025/09/24 12:0 a.m.3 views

Can Federated Learning Safeguard Private Data in LLM Training? Vulnerabilities, Attacks, and Defense Evaluation

Fine-tuning large language models LLMs with local data is a widely adopted approach for organizations seeking to adapt LLMs to their specific domains. Given the shared characteristics in data across different organizations, the idea of collaboratively fine-tuning an LLM using data from multiple...

6.9AI score
Exploits0
Packet Storm News
Packet Storm News
added 2025/09/04 12:0 a.m.3 views

Between a Rock and a Hard Place: Exploiting Ethical Reasoning to Jailbreak LLMs

Large language models LLMs have undergone safety alignment efforts to mitigate harmful outputs. However, as LLMs become more sophisticated in reasoning, their intelligence may introduce new security risks. While traditional jailbreak attacks relied on singlestep attacks, multi-turn jailbreak...

7.4AI score
Exploits0
Packet Storm News
Packet Storm News
added 2025/09/04 12:0 a.m.3 views

Breaking to Build: a Threat Model of Prompt-Based Attacks for Securing LLMs

The proliferation of Large Language Models LLMs has introduced critical security challenges, where adversarial actors can manipulate input prompts to cause significant harm and circumvent safety alignments. These prompt-based attacks exploit vulnerabilities in a model's design, training, and...

7.3AI score
Exploits0
Packet Storm News
Packet Storm News
added 2025/08/13 12:0 a.m.4 views

Amazon Nova AI Challenge -- Trusted AI: Advancing Secure, AI-Assisted Software Development

AI systems for software development are rapidly gaining prominence, yet significant challenges remain in ensuring their safety. To address this, Amazon launched the Trusted AI track of the Amazon Nova AI Challenge, a global competition among 10 university teams to drive advances in secure AI. In...

7.2AI score
Exploits0
Packet Storm News
Packet Storm News
added 2025/07/10 12:0 a.m.3 views

Agent Safety Alignment Via Reinforcement Learning

The emergence of autonomous Large Language Model LLM agents capable of tool usage has introduced new safety risks that go beyond traditional conversational misuse. These agents, empowered to execute external functions, are vulnerable to both user-initiated threats e.g., adversarial prompts and...

7.6AI score
Exploits0
Packet Storm News
Packet Storm News
added 2025/06/23 12:0 a.m.2 views

Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks

The widespread deployment of large language models LLMs has raised critical concerns over their vulnerability to jailbreak attacks, i.e., adversarial prompts that bypass alignment mechanisms and elicit harmful or policy-violating outputs. While proprietary models like GPT-4 have undergone extensi...

7.3AI score
Exploits0
Packet Storm News
Packet Storm News
added 2025/06/19 12:0 a.m.2 views

Probing the Robustness of Large Language Models Safety to Latent Perturbations

Safety alignment is a key requirement for building reliable Artificial General Intelligence. Despite significant advances in safety alignment, we observe that minor latent shifts can still trigger unsafe responses in aligned models. We argue that this stems from the shallow nature of existing...

6.9AI score
Exploits0
Packet Storm News
Packet Storm News
added 2025/06/14 12:0 a.m.3 views

SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge

Large language models LLMs exhibit advancing capabilities in complex tasks, such as reasoning and graduate-level question answering, yet their resilience against misuse, particularly involving scientifically sophisticated risks, remains underexplored. Existing safety benchmarks typically focus...

7.2AI score
Exploits0
Packet Storm News
Packet Storm News
added 2025/06/07 12:0 a.m.3 views

From Threat to Tool: Leveraging Refusal-Aware Injection Attacks for Safety Alignment

Safely aligning large language models LLMs often demands extensive human-labeled preference data, a process that's both costly and time-consuming. While synthetic data offers a promising alternative, current methods frequently rely on complex iterative prompting or auxiliary models. To address...

7.5AI score
Exploits0
Packet Storm News
Packet Storm News
added 2025/06/03 12:0 a.m.2 views

BitBypass: a New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage

The inherent risk of generating harmful and unsafe content by Large Language Models LLMs, has highlighted the need for their safety alignment. Various techniques like supervised fine-tuning, reinforcement learning from human feedback, and red-teaming were developed for ensuring the safety alignme...

7.2AI score
Exploits0
Packet Storm News
Packet Storm News
added 2025/05/30 12:0 a.m.2 views

Safety Alignment Can Be Not Superficial with Explicit Safety Signals

Recent studies on the safety alignment of large language models LLMs have revealed that existing approaches often operate superficially, leaving models vulnerable to various adversarial attacks. Despite their significance, these studies generally fail to offer actionable solutions beyond data...

7.3AI score
Exploits0
Packet Storm News
Packet Storm News
added 2025/05/29 12:0 a.m.3 views

SafeCOMM: What about Safety Alignment in Fine-Tuned Telecom Large Language Models?

Fine-tuning large language models LLMs for telecom tasks and datasets is a common practice to adapt general-purpose models to the telecom domain. However, little attention has been paid to how this process may compromise model safety. Recent research has shown that even benign fine-tuning can...

7.3AI score
Exploits0
Rows per page
Query Builder