10 matches found
Beyond the Wrapper: Identifying Artifact Reliance in Static Malware Classifiers Using TRUSTEE
Modern cybersecurity relies heavily on static machine-learning-based malware classifiers. However, transformations such as packing and other non-semantic modifications applied to executable files limit their reliability. Malware classifiers often learn these unnecessary artifacts rather than the...
JPRO: Automated Multimodal Jailbreaking Via Multi-Agent Collaboration Framework
The widespread application of large VLMs makes ensuring their secure deployment critical. While recent studies have demonstrated jailbreak attacks on VLMs, existing approaches are limited: they require either white-box access, restricting practicality, or rely on manually crafted patterns, leadin...
SecureLearn - an Attack-Agnostic Defense for Multiclass Machine Learning against Data Poisoning Attacks
Data poisoning attacks are a potential threat to machine learning (ML) models, aiming to manipulate training datasets to disrupt their performance. Existing defenses are mostly designed to mitigate specific poisoning attacks or are aligned with particular ML algorithms. Furthermore, most defenses a...
Mitigating Jailbreaks with Intent-Aware LLMs
Despite extensive safety-tuning, large language models (LLMs) remain vulnerable to jailbreak attacks via adversarially crafted instructions, reflecting a persistent trade-off between safety and task performance. In this work, we propose Intent-FT, a simple and lightweight fine-tuning approach that...
ProvX: Generating Counterfactual-Driven Attack Explanations for Provenance-Based Detection
Provenance graph-based intrusion detection systems are deployed on hosts to defend against increasingly severe Advanced Persistent Threats. Using Graph Neural Networks to detect these threats has become a research focus and has demonstrated exceptional performance. However, the widespread adoption...
Intriguing Frequency Interpretation of Adversarial Robustness for CNNs and ViTs
Adversarial examples have attracted significant attention over the years, yet understanding their frequency-based characteristics remains insufficient. In this paper, we investigate the intriguing properties of adversarial examples in the frequency domain for the image classification task, with t...
Private Memorization Editing: Turning Memorization into a Defense to Strengthen Data Privacy in Large Language Models
Large Language Models (LLMs) memorize, and thus, among huge amounts of uncontrolled data, may memorize Personally Identifiable Information (PII), which should not be stored and, consequently, not leaked. In this paper, we introduce Private Memorization Editing (PME), an approach for preventing private...
D2R: Dual Regularization Loss with Collaborative Adversarial Generation for Model Robustness
The robustness of Deep Neural Network models is crucial for defending models against adversarial attacks. Recent defense methods have employed collaborative learning frameworks to enhance model robustness. Two key limitations of existing methods are (i) insufficient guidance of the target model via...
Hijacking Large Language Models Via Adversarial In-Context Learning
In-context learning (ICL) has emerged as a powerful paradigm leveraging LLMs for specific downstream tasks by utilizing labeled examples as demonstrations (demos) in the preconditioned prompts. Despite its promising performance, crafted adversarial attacks pose a notable threat to the robustness of...
Evaluating the Vulnerability of ML-Based Ethereum Phishing Detectors to Single-Feature Adversarial Perturbations
This paper explores the vulnerability of machine learning models to simple single-feature adversarial attacks in the context of Ethereum fraudulent transaction detection. Through comprehensive experimentation, we investigate the impact of various adversarial attack strategies on model performance...