88 matches found
A New Framework for Cybersecurity Refusals in AI Agents
Agentic scaffolds have dramatically improved LLM performance on complex, long-horizon tasks, yielding both broad benefits and amplified risks in domains like cybersecurity. Existing benchmarks for AI agents in cybersecurity focus mainly on measuring proficiency--how effectively agents can complet...
Refusal Before Decoding: Detecting and Exploiting Refusal Signals in Intermediate LLM Activations
In this paper, we investigate whether refusal behavior can be predicted from LLM intermediate activations before decoding using linear probes trained on residual stream activations at each transformer block. We find that refusal is linearly decodable well before the final layer, indicating that...
Linux kernel 安全漏洞
The Linux kernel is the core of the open-source operating system Linux, developed by the Linux Foundation in the United States. There is a security vulnerability in the Linux kernel, which stems from the smcclcwaitmsg function accessing the link group state prematurely when a CLC refusal occurs...
Astra Linux - уязвимость в linux-5.15, linux-6.1
In the Linux kernel, the following vulnerability has been resolved: remoteproc: mediatek: Ensure that the IPI buffer fits within the L2TCM. The location of the IPI buffer is determined from the firmware that we load into the System Companion Processor. It is not guaranteed that both the SRAM size...
Refusal Evaluation in Coding LLMs and Code Agents: A Systematic Review of Thirteen Malicious-Code Prompt Corpora (2023-2025)
The evaluation of large language model refusal on malicious-coding tasks now spans at least thirteen publicly released prompt corpora AdvBench, the CyberSecEval family, RMCBench, RedCode, MCGMark, JailbreakBench, CySecBench, MalwareBench, CIRCLE, MOCHA, ASTRA, Scam2Prompt / Innoc2Scam-bench, and...
Ablating Safety: Mechanisms for Removing Alignment in Language Models for Security Applications
Safety-aligned language models often refuse cybersecurity requests whose wording resembles misuse, even when the task is authorized and defensive. This makes security evaluation ambiguous: a failed answer may reflect missing capability or refusal-policy intervention. Ablating Safety studies...
Tracing the Dynamics of Refusal: Exploiting Latent Refusal Trajectories for Robust Jailbreak Detection
Representation Engineering typically relies on static refusal vectors derived from terminal representations. We move beyond this paradigm, demonstrating that refusal is a dynamic and sparse process rather than a localized outcome. Using Causal Tracing, we uncover the Refusal Trajectory-a persiste...
Not All Tokens Are Created Equal: Query-Efficient Jailbreak Fuzzing for LLMs
Large Language ModelsLLMs are widely deployed, yet are vulnerable to jailbreak prompts that elicit policy-violating outputs. Although prior studies have uncovered these risks, they typically treat all tokens as equally important during prompt mutation, overlooking the varying contributions of...
Activation Surgery: Jailbreaking White-Box LLMs without Touching the Prompt
Most jailbreak techniques for Large Language Models LLMs primarily rely on prompt modifications, including paraphrasing, obfuscation, or conversational strategies. Meanwhile, abliteration techniques also known as targeted ablations of internal components have been used to study and explain LLM...
RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models
As large language models LLMs become integral to safety-critical applications, ensuring their robustness against adversarial prompts is paramount. However, existing red teaming datasets suffer from inconsistent risk categorizations, limited domain coverage, and outdated evaluations, hindering...
About the security content of Compressor 4.11.1
About the security content of Compressor 4.11.1 This document describes the security content of Compressor 4.11.1. About Apple security updates For our customers' protection, Apple doesn't disclose, discuss, or confirm security issues until an investigation has occurred and patches or releases ar...
SUSE CVE-2023-53695
In the Linux kernel, the following vulnerability has been resolved: udf: Detect system inodes linked into directory hierarchy When UDF filesystem is corrupted, hidden system inodes can be linked into directory hierarchy which is an avenue for further serious corruption of the filesystem and kerne...
UBUNTU-CVE-2023-53695
In the Linux kernel, the following vulnerability has been resolved: udf: Detect system inodes linked into directory hierarchy When UDF filesystem is corrupted, hidden system inodes can be linked into directory hierarchy which is an avenue for further serious corruption of the filesystem and kerne...
EUVD-2017-17963
Malware in sbrugna...
EUVD-2008-5375
Malware in sbrugna...
EUVD-2024-54426
Malicious code in bioql PyPI...
Beyond Surface Alignment: Rebuilding LLMs Safety Mechanism Via Probabilistically Ablating Refusal Direction
Jailbreak attacks pose persistent threats to large language models LLMs. Current safety alignment methods have attempted to address these issues, but they experience two significant limitations: insufficient safety alignment depth and unrobust internal defense mechanisms. These limitations make...
ACPICA: Refuse to evaluate a method if arguments are missing
...
Linux Distros Unpatched Vulnerability : CVE-2024-27059
The Linux/Unix host has one or more packages installed that are impacted by a vulnerability without a vendor supplied patch available. - USB: usb-storage: Prevent divide-by-0 error in isd200atacommand The isd200 sub-driver in usb-storage uses the HEADS and SECTORS values in the ATA ID information...
Enhancing Jailbreak Attacks on LLMs Via Persona Prompts
Jailbreak attacks aim to exploit large language models LLMs by inducing them to generate harmful content, thereby revealing their vulnerabilities. Understanding and addressing these attacks is crucial for advancing the field of LLM safety. Previous jailbreak approaches have mainly focused on dire...