CVE Search Engine - Security Vulnerabilities and Exploits Search Tool

show all

1 matches found

Packet Storm News•added 2026/05/27 12:0 a.m.•49 views

Refusal Before Decoding: Detecting and Exploiting Refusal Signals in Intermediate LLM Activations

In this paper, we investigate whether refusal behavior can be predicted from LLM intermediate activations before decoding using linear probes trained on residual stream activations at each transformer block. We find that refusal is linearly decodable well before the final layer, indicating that...

5.7AI score

SaveExploits0

Rows per page

Query Builder

Family

Bulletin Type

Min CVSS Score

Date

Order by