8 matches found
Towards Unveiling Vulnerabilities of Large Reasoning Models in Machine Unlearning
Large language models LLMs possess strong semantic understanding, driving significant progress in data mining applications. This is further enhanced by large reasoning models LRMs, which provide explicit multi-step reasoning traces. On the other hand, the growing need for the right to be forgotte...
ReasoningBomb: A Stealthy Denial-Of-Service Attack by Inducing Pathologically Long Reasoning in Large Reasoning Models
Large reasoning models LRMs extend large language models with explicit multi-step reasoning traces, but this capability introduces a new class of prompt-induced inference-time denial-of-service PI-DoS attacks that exploit the high computational cost of reasoning. We first formalize inference cost...
Red Teaming Large Reasoning Models
Large Reasoning Models LRMs have emerged as a powerful advancement in multi-step reasoning tasks, offering enhanced transparency and logical consistency through explicit chains of thought CoT. However, these models introduce novel safety and reliability risks, such as CoT-hijacking and...
Large Reasoning Models Are Autonomous Jailbreak Agents
Jailbreaking -- bypassing built-in safety mechanisms in AI models -- has traditionally required complex technical procedures or specialized human expertise. In this study, we show that the persuasive capabilities of large reasoning models LRMs simplify and scale jailbreaking, converting it into a...
From Thinking to Output: Chain-Of-Thought and Text Generation Characteristics in Reasoning Language Models
Recently, there have been notable advancements in large language models LLMs, demonstrating their growing abilities in complex reasoning. However, existing research largely overlooks a thorough and systematic comparison of these models' reasoning processes and outputs, particularly regarding thei...
Towards Understanding the Cognitive Habits of Large Reasoning Models
Large Reasoning Models LRMs, which autonomously produce a reasoning Chain of Thought CoT before producing final responses, offer a promising approach to interpreting and monitoring model behaviors. Inspired by the observation that certain CoT patterns -- e.g., "Wait, did I miss anything?'' --...
SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning
Large Reasoning Models LRMs introduce a new generation paradigm of explicitly reasoning before answering, leading to remarkable improvements in complex tasks. However, they pose great safety risks against harmful queries and adversarial attacks. While recent mainstream safety efforts on LRMs,...
AutoRAN: Weak-To-Strong Jailbreaking of Large Reasoning Models
This paper presents AutoRAN, the first automated, weak-to-strong jailbreak attack framework targeting large reasoning models LRMs. At its core, AutoRAN leverages a weak, less-aligned reasoning model to simulate the target model's high-level reasoning structures, generates narrative prompts, and...