220 matches found
RedEdit: Agentic Red-Teaming of Image Safety Classifiers Via MCTS-Guided Photo-Editing
Image safety classifiers serve as a critical component of contemporary content moderation systems on the internet. However, their resilience against user-style malicious image editing remains underexplored. Such behaviors are highly prevalent in daily scenarios but difficult to fully reproduce. T...
MaskForge: Structure-Aware Adaptive Attacks for Jailbreaking Diffusion Large Language Models
Diffusion large language models dLLMs generate text by iteratively denoising partially masked sequences under bidirectional context, exposing a safety surface distinct from autoregressive LLMs. Because mask tokens are native inputs and tokens are committed by confidence rather than position,...
Evolving Skill-Structured Attack Memory Enhances LLM Jailbreaking
Jailbreak attacks on large language models LLMs aim to induce LLMs to produce content that they are expected to refuse. Automated black-box jailbreak generation is especially important for safety evaluation, where the attacker observes only model outputs and needs to automatically search for...
MRMMIA: Membership Inference Attacks on Memory in Chat Agents
Membership inference attacks MIAs test whether a target data record belongs to a system's private data, and have become a standard tool to measure privacy leakage in machine learning systems. Prior work has primarily focused on training corpora or retrieval databases. However, MIAs against agent...
VibeHacking
👾 Welcome to Vibe Hacking By BlackPC, Vine & Foxxino Inc...
Babel: Jailbreaking Safety Attention Via Obfuscation Distribution Optimized Sampling
Despite rigorous safety alignment, Large Language Models LLMs remain vulnerable to jailbreak attacks. Existing black-box methods often rely on heuristic templates or exhaustive trials, lacking mechanistic interpretability and query efficiency. In this study, we investigate an intrinsic...
Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw
Agentic language-model systems increasingly rely on mutable execution contexts, including files, memory, tools, skills, and auxiliary artifacts, creating security risks beyond explicit user prompts. This paper presents DeepTrap, an automated framework for discovering contextual vulnerabilities in...
Re-Triggering Safeguards within LLMs for Jailbreak Detection
This paper proposes a jailbreaking prompt detection method for large language models LLMs to defend against jailbreak attacks. Although recent LLMs are equipped with built-in safeguards, it remains possible to craft jailbreaking prompts that bypass them. We argue that such jailbreaking prompts ar...
Black_Box-Penetration-Testing
BlackBox-Penetration-Testing Black-box penetration test again...
Secret Stealing Attacks on Local LLM Fine-Tuning through Supply-Chain Model Code Backdoors
Local fine-tuning datasets routinely contain sensitive secrets such as API keys, personal identifiers, and financial records. Although ''local offline fine-tuning'' is often viewed as a privacy boundary, we reveal that compromised model code is sufficient to steal them. Current passive...
security-audit
security-audit A Claude Code skill + plugin marketplace for a...
AutoRISE: Agent-Driven Strategy Evolution for Red-Teaming Large Language Models
Automated red-teaming methods for large language models typically optimize attack prompts within a fixed, human-designed strategy, leaving the attack strategy itself unchanged. We instead optimize the strategy. We propose AutoRISE, a method that searches over executable attack programs rather tha...
Towards Optimal Agentic Architectures for Offensive Security Tasks
Agentic security systems increasingly audit live targets with tool-using LLMs, but prior systems fix a single coordination topology, leaving unclear when additional agents help and when they only add cost. We treat topology choice as an empirical systems question. We introduce a controlled...
Can Drift-Adaptive Malware Detectors Be Made Robust? Attacks and Defenses under White-Box and Black-Box Threats
Concept drift and adversarial evasion are two major challenges for deploying machine learning-based malware detectors. While both have been studied separately, their combination, the adversarial robustness of drift-adaptive detectors, remains unexplored. We address this problem with AdvDA, a rece...
Towards Unveiling Vulnerabilities of Large Reasoning Models in Machine Unlearning
Large language models LLMs possess strong semantic understanding, driving significant progress in data mining applications. This is further enhanced by large reasoning models LRMs, which provide explicit multi-step reasoning traces. On the other hand, the growing need for the right to be forgotte...
AutoEG: Exploiting Known Third-Party Vulnerabilities in Black-Box Web Applications
Large-scale web applications are widely deployed with complex third-party components, inheriting security risks arising from component vulnerabilities. Security assessment is therefore required to determine whether such known vulnerabilities remain practically exploitable in real applications...
Targeted Adversarial Traffic Generation : Black-Box Approach to Evade Intrusion Detection Systems in IoT Networks
The integration of machine learning ML algorithms into Internet of Things IoT applications has introduced significant advantages alongside vulnerabilities to adversarial attacks, especially within IoT-based intrusion detection systems IDS. While theoretical adversarial attacks have been extensive...
New .NET AOT Malware Hides Code as a Black Box to Evade Detection
Researchers at Howler Cell have discovered a new .NET AOT malware campaign that uses a clever scoring system…...
The Role of Learning in Attacking Intrusion Detection Systems
Recent work on network attacks have demonstrated that ML-based network intrusion detection systems NIDS can be evaded with adversarial perturbations. However, these attacks rely on complex optimizations that have large computational overheads, making them impractical in many real-world settings. ...
RPP: A Certified Poisoned-Sample Detection Framework for Backdoor Attacks under Dataset Imbalance
Deep neural networks are highly susceptible to backdoor attacks, yet most defense methods to date rely on balanced data, overlooking the pervasive class imbalance in real-world scenarios that can amplify backdoor threats. This paper presents the first in-depth investigation of how the dataset...