56 matches found
MRMMIA: Membership Inference Attacks on Memory in Chat Agents
Membership inference attacks MIAs test whether a target data record belongs to a system's private data, and have become a standard tool to measure privacy leakage in machine learning systems. Prior work has primarily focused on training corpora or retrieval databases. However, MIAs against agent...
Re-Triggering Safeguards within LLMs for Jailbreak Detection
This paper proposes a jailbreaking prompt detection method for large language models LLMs to defend against jailbreak attacks. Although recent LLMs are equipped with built-in safeguards, it remains possible to craft jailbreaking prompts that bypass them. We argue that such jailbreaking prompts ar...
Attention Is Where You Attack
Safety-aligned large language models rely on RLHF and instruction tuning to refuse harmful requests, yet the internal mechanisms implementing safety behavior remain poorly understood. We introduce the Attention Redistribution Attack ARA, a white-box adversarial attack that identifies...
Towards Optimal Agentic Architectures for Offensive Security Tasks
Agentic security systems increasingly audit live targets with tool-using LLMs, but prior systems fix a single coordination topology, leaving unclear when additional agents help and when they only add cost. We treat topology choice as an empirical systems question. We introduce a controlled...
Can Drift-Adaptive Malware Detectors Be Made Robust? Attacks and Defenses under White-Box and Black-Box Threats
Concept drift and adversarial evasion are two major challenges for deploying machine learning-based malware detectors. While both have been studied separately, their combination, the adversarial robustness of drift-adaptive detectors, remains unexplored. We address this problem with AdvDA, a rece...
Towards Unveiling Vulnerabilities of Large Reasoning Models in Machine Unlearning
Large language models LLMs possess strong semantic understanding, driving significant progress in data mining applications. This is further enhanced by large reasoning models LRMs, which provide explicit multi-step reasoning traces. On the other hand, the growing need for the right to be forgotte...
The Role of Learning in Attacking Intrusion Detection Systems
Recent work on network attacks have demonstrated that ML-based network intrusion detection systems NIDS can be evaded with adversarial perturbations. However, these attacks rely on complex optimizations that have large computational overheads, making them impractical in many real-world settings. ...
AEGIS: White-Box Attack Path Generation Using LLMs and Training Effectiveness Evaluation for Large-Scale Cyber Defence Exercises
Creating attack paths for cyber defence exercises requires substantial expert effort. Existing automation requires vulnerability graphs or exploit sets curated in advance, limiting where it can be applied. We present AEGIS, a system that generates attack paths using LLMs, white-box access, and...
Rectifying Adversarial Examples Using Their Vulnerabilities
Deep neural network-based classifiers are prone to errors when processing adversarial examples AEs. AEs are minimally perturbed input data undetectable to humans posing significant risks to security-dependent applications. Hence, extensive research has been undertaken to develop defense mechanism...
WuppieFuzz: Coverage-Guided, Stateful REST API Fuzzing
Many business processes currently depend on web services, often using REST APIs for communication. REST APIs expose web service functionality through endpoints, allowing easy client interaction over the Internet. To reduce the security risk resulting from exposed endpoints, thorough testing is...
One Leak Away: How Pretrained Model Exposure Amplifies Jailbreak Risks in Finetuned LLMs
Finetuning pretrained large language models LLMs has become the standard paradigm for developing downstream applications. However, its security implications remain unclear, particularly regarding whether finetuned LLMs inherit jailbreak vulnerabilities from their pretrained sources. We investigat...
Physical ID-Transfer Attacks against Multi-Object Tracking Via Adversarial Trajectory
Multi-Object Tracking MOT is a critical task in computer vision, with applications ranging from surveillance systems to autonomous driving. However, threats to MOT algorithms have yet been widely studied. In particular, incorrect association between the tracked objects and their assigned IDs can...
QueryIPI: Query-Agnostic Indirect Prompt Injection on Coding Agents
Modern coding agents integrated into IDEs combine powerful tools and system-level actions, exposing a high-stakes attack surface. Existing Indirect Prompt Injection IPI studies focus mainly on query-specific behaviors, leading to unstable attacks with lower success rates. We identify a more sever...
Decoding Deception: Understanding Automatic Speech Recognition Vulnerabilities in Evasion and Poisoning Attacks
Recent studies have demonstrated the vulnerability of Automatic Speech Recognition systems to adversarial examples, which can deceive these systems into misinterpreting input speech commands. While previous research has primarily focused on white-box attacks with constrained optimizations, and...
Foe for Fraud: Transferable Adversarial Attacks in Credit Card Fraud Detection
Credit card fraud detection CCFD is a critical application of Machine Learning ML in the financial sector, where accurately identifying fraudulent transactions is essential for mitigating financial losses. ML models have demonstrated their effectiveness in fraud detection task, in particular with...
Breaking the Illusion of Security Via Interpretation: Interpretable Vision Transformer Systems under Attack
Vision transformer ViT models, when coupled with interpretation models, are regarded as secure and challenging to deceive, making them well-suited for security-critical domains such as medical applications, autonomous vehicles, drones, and robotics. However, successful attacks on these systems ca...
QGuard:Question-Based Zero-Shot Guard for Multi-Modal LLM Safety
The recent advancements in Large Language ModelsLLMs have had a significant impact on a wide range of fields, from general domains to specialized areas. However, these advancements have also significantly increased the potential for malicious users to exploit harmful and jailbreak prompts for...
Pushing the Limits of Safety: a Technical Report on the ATLAS Challenge 2025
Multimodal Large Language Models MLLMs have enabled transformative advancements across diverse applications but remain susceptible to safety threats, especially jailbreak attacks that induce harmful outputs. To systematically evaluate and improve their safety, we organized the Adversarial Testing...
Assessing the Resilience of Automotive Intrusion Detection Systems to Adversarial Manipulation
The security of modern vehicles has become increasingly important, with the controller area network CAN bus serving as a critical communication backbone for various Electronic Control Units ECUs. The absence of robust security measures in CAN, coupled with the increasing connectivity of vehicles,...
Prediction Inconsistency Helps Achieve Generalizable Detection of Adversarial Examples
Adversarial detection protects models from adversarial attacks by refusing suspicious test samples. However, current detection methods often suffer from weak generalization: their effectiveness tends to degrade significantly when applied to adversarially trained models rather than naturally train...