8 matches found
Backdooring Masked Diffusion Language Models
Masked diffusion language models MDLMs are emerging as a compelling new paradigm for text generation, but their training-time security remains largely unexplored. Existing backdoor attacks on Gaussian diffusion models or autoregressive language models do not directly apply to MDLMs because MDLMs...
Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking
Jailbreak techniques for large language models LLMs evolve faster than benchmarks, making robustness estimates stale and difficult to compare across papers due to drift in datasets, harnesses, and judging protocols. We introduce JAILBREAK FOUNDRY JBF, a system that addresses this gap via a...
Key Insights on SHADOW-AETHER-015 and Earth Preta from the 2025 MITRE ATT&CK Evaluation with Trend Vision One™
This blog discusses notable modern TTPs observed from SHADOW-AETHER-015 and Earth Preta, from Trend Research™ monitoring and Trend Vision One™ intelligence. These findings support the performance of TrendAI™ in the 2025 MITRE ATT&CK Evaluations...
AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models
Large Language Models LLMs remain vulnerable to jailbreaking attacks where adversarial prompts elicit harmful outputs, yet most evaluations focus on single-turn interactions while real-world attacks unfold through adaptive multi-turn conversations. We present AutoAdv, a training-free framework fo...
CHAI: Command Hijacking against Embodied AI
Embodied Artificial Intelligence AI promises to handle edge cases in robotic vehicle systems where data is scarce by using common-sense reasoning grounded in perception and action to generalize beyond training distributions and adapt to novel real-world situations. These capabilities, however, al...
Mitigating Jailbreaks with Intent-Aware LLMs
Despite extensive safety-tuning, large language models LLMs remain vulnerable to jailbreak attacks via adversarially crafted instructions, reflecting a persistent trade-off between safety and task performance. In this work, we propose Intent-FT, a simple and lightweight fine-tuning approach that...
LLM Agents Should Employ Security Principles
Large Language Model LLM agents show considerable promise for automating complex tasks using contextual reasoning; however, interactions involving multiple agents and the system's susceptibility to prompt injection and other forms of context manipulation introduce new vulnerabilities related to...
Evaluating Query Efficiency and Accuracy of Transfer Learning-Based Model Extraction Attack in Federated Learning
Federated Learning FL is a collaborative learning framework designed to protect client data, yet it remains highly vulnerable to Intellectual Property IP threats. Model extraction ME attacks pose a significant risk to Machine Learning as a Service MLaaS platforms, enabling attackers to replicate...