647 matches found
SoK: Measuring What Matters for Closed-Loop Security Agents
Cybersecurity is a relentless arms race, with AI driven offensive systems evolving faster than traditional defenses can adapt. Research and tooling remain fragmented across isolated defensive functions, creating blind spots that adversaries exploit. Autonomous agents capable of integrating, explo...
SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents
Search agents connect LLMs to the Internet, enabling access to broader and more up-to-date information. However, unreliable search results may also pose safety threats to end users, establishing a new threat surface. In this work, we conduct two in-the-wild experiments to demonstrate both the...
Binary Diff Summarization Using Large Language Models
Security of software supply chains is necessary to ensure that software updates do not contain maliciously injected code or introduce vulnerabilities that may compromise the integrity of critical infrastructure. Verifying the integrity of software updates involves binary differential analysis...
SecureAgentBench: Benchmarking Secure Code Generation under Realistic Vulnerability Scenarios
Large language model LLM powered code agents are rapidly transforming software engineering by automating tasks such as testing, debugging, and repairing, yet the security risks of their generated code have become a critical concern. Existing benchmarks have offered valuable insights but remain...
FakeSound2: a Benchmark for Explainable and Generalizable Deepfake Sound Detection
The rapid development of generative audio raises ethical and security concerns stemming from forged data, making deepfake sound detection an important safeguard against the malicious use of such technologies. Although prior studies have explored this task, existing methods largely focus on binary...
CVE-2025-47906 vulnerabilities
Vulnerabilities for packages: blobfuse2, kube-vip-cloud-provider, vexctl, shfmt, rancher-machine, pvc-autoresizer, kube-vip, custom-pod-autoscaler-operator, vault-k8s, git-lfs, nats, gitlab-runner, modelmesh-runtime-adapter, linkerd2-proxy-init, lvm-driver, vault-benchmark, checksec,...
GHSA-GWRF-JF3H-W649 vulnerabilities
Vulnerabilities for packages: blobfuse2, kube-vip-cloud-provider, vexctl, shfmt, rancher-machine, pvc-autoresizer, kube-vip, custom-pod-autoscaler-operator, vault-k8s, git-lfs, nats, gitlab-runner, modelmesh-runtime-adapter, linkerd2-proxy-init, lvm-driver, vault-benchmark, checksec,...
Time-of-Check Time-of-Use Attacks Against LLMs
This is a nice piece of research: "Mind the Gap: Time-of-Check to Time-of-Use Vulnerabilities in LLM-Enabled Agents".: Abstract: Large Language Model LLM-enabled agents are rapidly emerging across a wide range of applications, but their deployment introduces vulnerabilities with security...
Exploiting Timing Side-Channels in Quantum Circuits Simulation Via ML-Based Methods
As quantum computing advances, quantum circuit simulators serve as critical tools to bridge the current gap caused by limited quantum hardware availability. These simulators are typically deployed on cloud platforms, where users submit proprietary circuit designs for simulation. In this work, we...
Shell or Nothing: Real-World Benchmarks and Memory-Activated Agents for Automated Penetration Testing
Penetration testing is critical for identifying and mitigating security vulnerabilities, yet traditional approaches remain expensive, time-consuming, and dependent on expert human labor. Recent work has explored AI-driven pentesting agents, but their evaluation relies on oversimplified...
PatchSeeker: Mapping NVD Records to Their Vulnerability-Fixing Commits with LLM Generated Commits and Embeddings
Software vulnerabilities pose serious risks to modern software ecosystems. While the National Vulnerability Database NVD is the authoritative source for cataloging these vulnerabilities, it often lacks explicit links to the corresponding Vulnerability-Fixing Commits VFCs. VFCs encode precise code...
AgentSentinel: an End-To-End and Real-Time Security Defense Framework for Computer-Use Agents
Large Language Models LLMs have been increasingly integrated into computer-use agents, which can autonomously operate tools on a user's computer to accomplish complex tasks. However, due to the inherently unstable and unpredictable nature of LLM outputs, they may issue unintended tool commands or...
Decoding Latent Attack Surfaces in LLMs: Prompt Injection Via HTML in Web Summarization
Large Language Models LLMs are increasingly integrated into web-based systems for content summarization, yet their susceptibility to prompt injection attacks remains a pressing concern. In this study, we explore how non-visible HTML elements such as , aria-label, and alt attributes can be exploit...
Behind the Mask: Benchmarking Camouflaged Jailbreaks in Large Language Models
Large Language Models LLMs are increasingly vulnerable to a sophisticated form of adversarial prompting known as camouflaged jailbreaking. This method embeds malicious intent within seemingly benign language to evade existing safety mechanisms. Unlike overt attacks, these subtle prompts exploit...
An Empirical Study of Vulnerabilities in Python Packages and Their Detection
In the rapidly evolving software development landscape, Python stands out for its simplicity, versatility, and extensive ecosystem. Python packages, as units of organization, reusability, and distribution, have become a pressing concern, highlighted by the considerable number of vulnerability...
VULSOVER: Vulnerability Detection Via LLM-Driven Constraint Solving
Traditional vulnerability detection methods rely heavily on predefined rule matching, which often fails to capture vulnerabilities accurately. With the rise of large language models LLMs, leveraging their ability to understand code semantics has emerged as a promising direction for achieving more...
Agentic Discovery and Validation of Android App Vulnerabilities
Existing Android vulnerability detection tools overwhelm teams with thousands of low-signal warnings yet uncover few true positives. Analysts spend days triaging these results, creating a bottleneck in the security pipeline. Meanwhile, genuinely exploitable vulnerabilities often slip through,...
PromptSleuth: Detecting Prompt Injection Via Semantic Intent Invariance
Large Language Models LLMs are increasingly integrated into real-world applications, from virtual assistants to autonomous agents. However, their flexibility also introduces new attack vectors-particularly Prompt Injection PI, where adversaries manipulate model behavior through crafted inputs. As...
Mind the Gap: Time-Of-Check to Time-Of-Use Vulnerabilities in LLM-Enabled Agents
Large Language Model LLM-enabled agents are rapidly emerging across a wide range of applications, but their deployment introduces vulnerabilities with security implications. While prior work has examined prompt-based attacks e.g., prompt injection and data-oriented threats e.g., data exfiltration...
Adversarial Attacks on VQA-NLE: Exposing and Alleviating Inconsistencies in Visual Question Answering Explanations
Natural language explanations in visual question answering VQA-NLE aim to make black-box models more transparent by elucidating their decision-making processes. However, we find that existing VQA-NLE systems can produce inconsistent explanations and reach conclusions without genuinely understandi...