11 matches found
Information Theoretic Adversarial Training of Large Language Models
Large language models (LLMs) remain vulnerable to adversarial prompting despite advances in alignment and safety, often exhibiting harmful behaviors under novel attack strategies. While adversarial training can improve robustness, existing approaches are computationally expensive and difficult to...
Rectifying Adversarial Examples Using Their Vulnerabilities
Deep neural network-based classifiers are prone to errors when processing adversarial examples (AEs). AEs are minimally perturbed input data, undetectable to humans, that pose significant risks to security-dependent applications. Hence, extensive research has been undertaken to develop defense mechanism...
Certified but Fooled! Breaking Certified Defences with Ghost Certificates
Certified defenses promise provable robustness guarantees. We study the malicious exploitation of probabilistic certification frameworks to better understand the limits of guarantee provisions. Now, the objective is to not only mislead a classifier, but also manipulate the certification process t...
Evasive Ransomware Attacks Using Low-Level Behavioral Adversarial Examples
Protecting state-of-the-art AI-based cybersecurity defense systems from cyber attacks is crucial. Attackers create adversarial examples by adding small changes (i.e., perturbations) to the attack features to evade or fool the deep learning model. This paper introduces the concept of low-level...
Amplifying Machine Learning Attacks through Strategic Compositions
Machine learning (ML) models are proving to be vulnerable to a variety of attacks that allow the adversary to learn sensitive information, cause mispredictions, and more. While these attacks have been extensively studied, current research predominantly focuses on analyzing each attack type...
Towards Model Resistant to Transferable Adversarial Examples Via Trigger Activation
Whitepaper titled "Towards Model Resistant to Transferable Adversarial Examples Via Trigger Activation"...
Q-FAKER: Query-Free Hard Black-Box Attack Via Controlled Generation
Many adversarial attack approaches have been proposed to verify the vulnerability of language models. However, they require numerous queries and information about the target model. Even black-box attack methods require the target model's output information. They are not applicable in real-world...
Undetectable Backdoors in Machine-Learning Models
New paper: "Planting Undetectable Backdoors in Machine Learning Models". Abstract: Given the computational cost and technical expertise required to train machine learning models, users may delegate the task of learning to a service provider. We show how a malicious learner can plant an undetectab...
Availability Attacks against Neural Networks
New research on using specially crafted inputs to slow down machine-learning neural network systems: "Sponge Examples: Energy-Latency Attacks on Neural Networks" shows how to find adversarial examples that cause a DNN to burn more energy, take more time, or both. They affect a wide range of DNN...
Blind Spots in AI Just Might Help Protect Your Privacy
Researchers have found a potential silver lining in so-called adversarial examples, using them to shield sensitive data from snoops...
Confusing Self-Driving Cars by Altering Road Signs
Researchers found that they could confuse the road sign detection algorithms of self-driving cars by adding stickers to the signs on the road. They could, for example, cause a car to think that a stop sign is a 45 mph speed limit sign. The changes are subtle, though -- look at the photo from the...