7 matches found
LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments
The rapid proliferation of LLM-based autonomous agents in real operating system environments introduces a new category of safety risk beyond content safety: behavior jailbreak, where an adversary induces an agent to execute dangerous OS-level operations with irreversible consequences. Existing...
A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities across Clinical Specialties
Medical Large Language Models (LLMs) are increasingly deployed for clinical decision support across diverse specialties, yet systematic evaluation of their robustness to adversarial misuse and privacy leakage remains inaccessible to most researchers. Existing security benchmarks require GPU cluster...
TeleAI-Safety: A Comprehensive LLM Jailbreaking Benchmark Towards Attacks, Defenses, and Evaluations
While the deployment of large language models (LLMs) in high-value industries continues to expand, the systematic assessment of their safety against jailbreak and prompt-based attacks remains insufficient. Existing safety evaluation benchmarks and frameworks are often limited by an imbalanced...
AutoDAN-Reasoning: Enhancing Strategies Exploration Based Jailbreak Attacks with Test-Time Scaling
Recent advancements in jailbreaking large language models (LLMs), such as AutoDAN-Turbo, have demonstrated the power of automated strategy discovery. AutoDAN-Turbo employs a lifelong learning agent to build a rich library of attack strategies from scratch. While highly effective, its test-time...
ReGA: Representation-Guided Abstraction for Model-Based Safeguarding of LLMs
Large Language Models (LLMs) have achieved significant success in various tasks, yet concerns about their safety and security have emerged. In particular, they pose risks in generating harmful content and vulnerability to jailbreaking attacks. To analyze and monitor machine learning models,...
Efficient and Stealthy Jailbreak Attacks Via Adversarial Prompt Distillation from LLMs to SLMs
Attacks on large language models (LLMs) in jailbreaking scenarios raise many security and ethical issues. Current jailbreak attack methods face problems such as low efficiency, high computational cost, and poor cross-model adaptability and versatility, which make it difficult to cope with the rapid...
CVE-2025-30358 Mesop Class Pollution vulnerability leads to DoS and Jailbreak attacks
Mesop is a Python-based UI framework that allows users to build web applications. A class pollution vulnerability in Mesop prior to version 0.14.1 allows attackers to overwrite global variables and class attributes in certain Mesop modules during runtime. This vulnerability could directly lead to...
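For readers unfamiliar with the term, class pollution in Python can be illustrated with a generic sketch. This is not Mesop's code, and the names `Session`, `unsafe_merge`, and `guarded_merge` are hypothetical; the sketch only assumes a recursive merge helper that copies untrusted keys onto an object with `getattr`/`setattr`. A key such as `__class__` then lets the input walk from an instance to its class and change an attribute shared by every instance. The guarded variant shows one common mitigation (rejecting double-underscore keys), not necessarily the fix applied in Mesop 0.14.1.

```python
class Session:
    """Hypothetical per-user object populated from untrusted input."""

    is_admin = False  # class attribute shared by every instance

    def __init__(self):
        self.user = "guest"


def unsafe_merge(src: dict, dst: object) -> None:
    """Recursively copy src onto dst with no key filtering (vulnerable)."""
    for key, value in src.items():
        if isinstance(value, dict):
            # Follows attacker-chosen attribute names, including __class__.
            child = dst[key] if isinstance(dst, dict) else getattr(dst, key)
            unsafe_merge(value, child)
        elif isinstance(dst, dict):
            dst[key] = value
        else:
            setattr(dst, key, value)


def guarded_merge(src: dict, dst: object) -> None:
    """Same merge, but refuses double-underscore keys (illustrative mitigation)."""
    for key, value in src.items():
        if key.startswith("__"):
            raise ValueError(f"refusing special attribute: {key!r}")
        if isinstance(value, dict):
            child = dst[key] if isinstance(dst, dict) else getattr(dst, key)
            guarded_merge(value, child)
        elif isinstance(dst, dict):
            dst[key] = value
        else:
            setattr(dst, key, value)


if __name__ == "__main__":
    bystander = Session()  # an unrelated instance
    payload = {"__class__": {"is_admin": True}}
    unsafe_merge(payload, Session())  # merged into a different instance
    print(bystander.is_admin)  # the class attribute was overwritten for everyone
```

The same traversal can reach module globals through a path such as `__class__` → `__init__` → `__globals__`, which is how this class of bug extends from class attributes to global variables.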