6 matches found
When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs
As large language models become increasingly integrated into daily life, audio has emerged as a key interface for human-AI interaction. However, this convenience also introduces new vulnerabilities, making audio a potential attack surface for adversaries. Our research introduces WhisperInject, a...
Private Rate-Constrained Optimization with Applications to Fair Learning
Many problems in trustworthy ML can be formulated as minimization of the model error under constraints on the prediction rates of the model for suitably-chosen marginals, including most group fairness constraints demographic parity, equality of odds, etc.. In this work, we study such constrained...
Private Statistical Estimation Via Truncation
We introduce a novel framework for differentially private DP statistical estimation via data truncation, addressing a key challenge in DP estimation when the data support is unbounded. Traditional approaches rely on problem-specific sensitivity analysis, limiting their applicability. By leveragin...
Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent
As Large Language Models LLMs are widely used, understanding them systematically is key to improving their safety and realizing their full potential. Although many models are aligned using techniques such as reinforcement learning from human feedback RLHF, they are still vulnerable to jailbreakin...
Manipulating Machine-Learning Systems through the Order of the Training Data
Yet another adversarial ML attack: Most deep neural networks are trained by stochastic gradient descent. Now “stochastic” is a fancy Greek word for “random”; it means that the training data are fed into the model in random order. So what happens if the bad guys can cause the order to be not rando...
Machine learning classifiers trained via gradient descent are vulnerable to arbitrary misclassification attack
Overview Machine learning models trained using gradient descent can be forced to make arbitrary misclassifications by an attacker that can influence the items to be classified. The impact of a misclassification varies widely depending on the ML model's purpose and of what systems it is a part...