14 matches found
Towards Unveiling Vulnerabilities of Large Reasoning Models in Machine Unlearning
Large language models LLMs possess strong semantic understanding, driving significant progress in data mining applications. This is further enhanced by large reasoning models LRMs, which provide explicit multi-step reasoning traces. On the other hand, the growing need for the right to be forgotte...
SecPI: Secure Code Generation with Reasoning Models Via Security Reasoning Internalization
Reasoning language models RLMs are increasingly used in programming. Yet, even state-of-the-art RLMs frequently introduce critical security vulnerabilities in generated code. Prior training-based approaches for secure code generation face a critical limitation that prevents their direct applicati...
ReasoningBomb: A Stealthy Denial-Of-Service Attack by Inducing Pathologically Long Reasoning in Large Reasoning Models
Large reasoning models LRMs extend large language models with explicit multi-step reasoning traces, but this capability introduces a new class of prompt-induced inference-time denial-of-service PI-DoS attacks that exploit the high computational cost of reasoning. We first formalize inference cost...
Red Teaming Large Reasoning Models
Large Reasoning Models LRMs have emerged as a powerful advancement in multi-step reasoning tasks, offering enhanced transparency and logical consistency through explicit chains of thought CoT. However, these models introduce novel safety and reliability risks, such as CoT-hijacking and...
Large Reasoning Models Are Autonomous Jailbreak Agents
Jailbreaking -- bypassing built-in safety mechanisms in AI models -- has traditionally required complex technical procedures or specialized human expertise. In this study, we show that the persuasive capabilities of large reasoning models LRMs simplify and scale jailbreaking, converting it into a...
When LLMs Copy to Think: Uncovering Copy-Guided Attacks in Reasoning LLMs
Large Language Models LLMs have become integral to automated code analysis, enabling tasks such as vulnerability detection and code comprehension. However, their integration introduces novel attack surfaces. In this paper, we identify and investigate a new class of prompt-based attacks, termed...
Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models
The introduction of advanced reasoning capabilities have improved the problem-solving performance of large language models, particularly on math and coding benchmarks. However, it remains unclear whether these reasoning models are more or less vulnerable to adversarial prompt attacks than their...
Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models
Prior work shows that LLMs finetuned on malicious behaviors in a narrow domain e.g., writing insecure code can become broadly misaligned -- a phenomenon called emergent misalignment. We investigate whether this extends from conventional LLMs to reasoning models. We finetune reasoning models on...
From Thinking to Output: Chain-Of-Thought and Text Generation Characteristics in Reasoning Language Models
Recently, there have been notable advancements in large language models LLMs, demonstrating their growing abilities in complex reasoning. However, existing research largely overlooks a thorough and systematic comparison of these models' reasoning processes and outputs, particularly regarding thei...
Towards Understanding the Cognitive Habits of Large Reasoning Models
Large Reasoning Models LRMs, which autonomously produce a reasoning Chain of Thought CoT before producing final responses, offer a promising approach to interpreting and monitoring model behaviors. Inspired by the observation that certain CoT patterns -- e.g., "Wait, did I miss anything?'' --...
SCGAgent: Recreating the Benefits of Reasoning Models for Secure Code Generation with Agentic Workflows
Large language models LLMs have seen widespread success in code generation tasks for different scenarios, both everyday and professional. However current LLMs, despite producing functional code, do not prioritize security and may generate code with exploitable vulnerabilities. In this work, we...
SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning
Large Reasoning Models LRMs introduce a new generation paradigm of explicitly reasoning before answering, leading to remarkable improvements in complex tasks. However, they pose great safety risks against harmful queries and adversarial attacks. While recent mainstream safety efforts on LRMs,...
AutoRAN: Weak-To-Strong Jailbreaking of Large Reasoning Models
This paper presents AutoRAN, the first automated, weak-to-strong jailbreak attack framework targeting large reasoning models LRMs. At its core, AutoRAN leverages a weak, less-aligned reasoning model to simulate the target model's high-level reasoning structures, generates narrative prompts, and...
System Prompt Poisoning: Persistent Attacks on Large Language Models beyond User Injection
Large language models LLMs have gained widespread adoption across diverse applications due to their impressive generative capabilities. Their plug-and-play nature enables both developers and end users to interact with these models through simple prompts. However, as LLMs become more integrated in...