CVE Search Engine - Security Vulnerabilities and Exploits Search Tool

3 matches found

Packet Storm News • added 2026/05/27 12:00 a.m. • 49 views

Refusal Before Decoding: Detecting and Exploiting Refusal Signals in Intermediate LLM Activations

In this paper, we investigate whether refusal behavior can be predicted from an LLM's intermediate activations before decoding, using linear probes trained on the residual stream activations at each transformer block. We find that refusal is linearly decodable well before the final layer, indicating that...

5.7 AI score

Exploits: 0

Packet Storm News • added 2026/03/15 12:00 a.m. • 6 views

Activation Surgery: Jailbreaking White-Box LLMs without Touching the Prompt

Most jailbreak techniques for Large Language Models (LLMs) rely primarily on prompt modifications, including paraphrasing, obfuscation, or conversational strategies. Meanwhile, abliteration techniques, also known as targeted ablations of internal components, have been used to study and explain LLM...

5.9 AI score

Exploits: 0

Packet Storm News • added 2025/06/07 12:00 a.m. • 9 views

From Threat to Tool: Leveraging Refusal-Aware Injection Attacks for Safety Alignment

Safely aligning large language models (LLMs) often demands extensive human-labeled preference data, a process that is both costly and time-consuming. While synthetic data offers a promising alternative, current methods frequently rely on complex iterative prompting or auxiliary models. To address...

7.5 AI score

Exploits: 0
