33 matches found
MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents Via User-Generated Content
Mobile graphical user interface GUI agents driven by vision-language models VLMs perceive the screen as rendered pixels and choose actions from what they see, so they cannot reliably separate trusted interface elements from user-generated content. We present MIRAGE Mobile Injection of Realistic...
STARE: Step-Wise Temporal Alignment and Red-Teaming Engine for Multi-Modal Toxicity Attack
Red-teaming Vision-Language Models is essential for identifying vulnerabilities where adversarial image-text inputs trigger toxic outputs. Existing approaches treat image generation as a black box, returning only terminal toxicity scores and leaving open the question of when and how toxic semanti...
Hidden Ads: Behavior Triggered Semantic Backdoors for Advertisement Injection in Vision Language Models
Vision-Language Models VLMs are increasingly deployed in consumer applications where users seek recommendations about products, dining, and services. We introduce Hidden Ads, a new class of backdoor attacks that exploit this recommendation-seeking behavior to inject unauthorized advertisements...
Shape and Substance: Dual-Layer Side-Channel Attacks on Local Vision-Language Models
On-device Vision-Language Models VLMs promise data privacy via local execution. However, we show that the architectural shift toward Dynamic High-Resolution preprocessing e.g., AnyRes introduces an inherent algorithmic side-channel. Unlike static models, dynamic preprocessing decomposes images in...
TreeTeaming: Autonomous Red-Teaming of Vision-Language Models Via Hierarchical Strategy Exploration
The rapid advancement of Vision-Language Models VLMs has brought their safety vulnerabilities into sharp focus. However, existing red teaming methods are fundamentally constrained by an inherent linear exploration paradigm, confining them to optimizing within a predefined strategy set and...
VisualLeakBench: Auditing the Fragility of Large Vision-Language Models against PII Leakage and Social Engineering
As Large Vision-Language Models LVLMs are increasingly deployed in agent-integrated workflows and other deployment-relevant settings, their robustness against semantic visual attacks remains under-evaluated -- alignment is typically tested on explicit harmful content rather than privacy-critical...
Jailbreaking LLMs and VLMs: Mechanisms, Evaluation, and Unified Defense
This paper provides a systematic survey of jailbreak attacks and defenses on Large Language Models LLMs and Vision-Language Models VLMs, emphasizing that jailbreak vulnerabilities stem from structural factors such as incomplete training data, linguistic ambiguity, and generative uncertainty. It...
UIXPOSE: Mobile Malware Detection Via Intention-Behaviour Discrepancy Analysis
We introduce UIXPOSE, a source-code-agnostic framework that operates on both compiled and open-source apps. This framework applies Intention Behaviour Alignment IBA to mobile malware analysis, aligning UI-inferred intent with runtime semantics. Previous work either infers intent statically, e.g.,...
Multi-Faceted Attack: Exposing Cross-Model Vulnerabilities in Defense-Equipped Vision-Language Models
The growing misuse of Vision-Language Models VLMs has led providers to deploy multiple safeguards, including alignment tuning, system prompts, and content moderation. However, the real-world robustness of these defenses against adversarial attacks remains underexplored. We introduce Multi-Faceted...
Jailbreaking Large Vision Language Models in Intelligent Transportation Systems
Large Vision Language Models LVLMs demonstrate strong capabilities in multimodal reasoning and many real-world applications, such as visual question answering. However, LVLMs are highly vulnerable to jailbreaking attacks. This paper systematically analyzes the vulnerabilities of LVLMs integrated ...
MTAttack: Multi-Target Backdoor Attacks against Large Vision-Language Models
Recent advances in Large Visual Language Models LVLMs have demonstrated impressive performance across various vision-language tasks by leveraging large-scale image-text pretraining and instruction tuning. However, the security vulnerabilities of LVLMs have become increasingly concerning,...
Evaluation of Vision-LLMs in Surveillance Video
The widespread use of cameras in our society has created an overwhelming amount of video data, far exceeding the capacity for human monitoring. This presents a critical challenge for public safety and security, as the timely detection of anomalous or criminal events is crucial for effective...
MoEcho: Exploiting Side-Channel Attacks to Compromise User Privacy in Mixture-Of-Experts LLMs
The transformer architecture has become a cornerstone of modern AI, fueling remarkable progress across applications in natural language processing, computer vision, and multimodal learning. As these models continue to scale explosively for performance, implementation efficiency remains a critical...
Enhancing Targeted Adversarial Attacks on Large Vision-Language Models through Intermediate Projector Guidance
Targeted adversarial attacks are essential for proactively identifying security flaws in Vision-Language Models before real-world deployment. However, current methods perturb images to maximize global similarity with the target text or reference image at the encoder level, collapsing rich visual...
Proactive Disentangled Modeling of Trigger-Object Pairings for Backdoor Defense
Deep neural networks DNNs and generative AI GenAI are increasingly vulnerable to backdoor attacks, where adversaries embed triggers into inputs to cause models to misclassify or misinterpret target labels. Beyond traditional single-trigger scenarios, attackers may inject multiple triggers across...
Invisible Injections: Exploiting Vision-Language Models through Steganographic Prompt Embedding
Vision-language models VLMs have revolutionized multimodal AI applications but introduce novel security vulnerabilities that remain largely unexplored. We present the first comprehensive study of steganographic prompt injection attacks against VLMs, where malicious instructions are invisibly...
In-Context Learning of Vision Language Models for Detection of Physical and Digital Attacks against Face Recognition Systems
Recent advances in biometric systems have significantly improved the detection and prevention of fraudulent activities. However, as detection methods improve, attack techniques become increasingly sophisticated. Attacks on face recognition systems can be broadly divided into physical and digital...
On the Feasibility of Poisoning Text-To-Image AI Models Via Adversarial Mislabeling
Today's text-to-image generative models are trained on millions of images sourced from the Internet, each paired with a detailed caption produced by Vision-Language Models VLMs. This part of the training pipeline is critical for supplying the models with large volumes of high-quality image-captio...
E-FreeM2: Efficient Training-Free Multi-Scale and Cross-Modal News Verification Via MLLMs
The rapid spread of misinformation in mobile and wireless networks presents critical security challenges. This study introduces a training-free, retrieval-based multimodal fact verification system that leverages pretrained vision-language models and large language models for credibility assessmen...
NAP-Tuning: Neural Augmented Prompt Tuning for Adversarially Robust Vision-Language Models
Vision-Language Models VLMs such as CLIP have demonstrated remarkable capabilities in understanding relationships between visual and textual data through joint embedding spaces. Despite their effectiveness, these models remain vulnerable to adversarial attacks, particularly in the image modality,...