67 matches found
SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents
Autonomous LLM agents increasingly operate in stateful environments where they access tools, files, memory, and external services. While such capabilities enable complex real-world workflows, they also introduce security risks that are difficult to capture with existing evaluations. Current agent...
Practical Countermeasure against Attacks Exploiting Detection Efficiency Mismatch in Quantum Key Distribution
We demonstrate a practical countermeasure against a well-known class of attacks on quantum key distribution (QKD) systems that exploit detection efficiency mismatch, where the receiver's detectors do not exhibit identical responses to incoming photons across all degrees of freedom. This class of...
BYOT-CPS: A Hybrid Cyber-Physical Systems Testbed for IoT Security Assessment and Platform Evaluation
Internet of Things (IoT) security research continues to face a methodological gap between scalable virtual experimentation and realistic device behaviour. While pure simulation and emulation platforms provide control, repeatability, and scale, they do not fully reproduce firmware-specific behaviour...
Exploit for Deserialization of Untrusted Data in Facebook React
CVE-2025-55182 Security Lab "React2Shell" This repository c...
Ablating Safety: Mechanisms for Removing Alignment in Language Models for Security Applications
Safety-aligned language models often refuse cybersecurity requests whose wording resembles misuse, even when the task is authorized and defensive. This makes security evaluation ambiguous: a failed answer may reflect missing capability or refusal-policy intervention. Ablating Safety studies...
OpenAI’s GPT-5.5 is as Good as Mythos at Finding Security Vulnerabilities
The UK's AI Security Institute evaluated GPT-5.5's ability to find security vulnerabilities, and found that it is comparable to Claude Mythos. Note that the OpenAI model is generally available. Here is the Institute's evaluation of Mythos. And here is an analysis of a smaller, cheaper model. It...
Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw
Agentic language-model systems increasingly rely on mutable execution contexts, including files, memory, tools, skills, and auxiliary artifacts, creating security risks beyond explicit user prompts. This paper presents DeepTrap, an automated framework for discovering contextual vulnerabilities in...
GoAT-X: A Graph of Auditing Thoughts for Securing Token Transactions in Cross-Chain Contracts
Cross-chain bridges, the critical infrastructure of the multi-chain ecosystem, have become a primary target for attackers, resulting in over $2.8 billion in losses due to subtle implementation flaws. Existing defenses, such as bytecode-level static analysis, are ill-equipped to handle the semanti...
AVISE: Framework for Evaluating the Security of AI Systems
As artificial intelligence (AI) systems are increasingly deployed across critical domains, their security vulnerabilities pose growing risks of high-profile exploits and consequential system failures. Yet systematic approaches to evaluating AI security remain underdeveloped. In this paper, we...
CVE-2026-33622 A PinchTab Security Policy Bypass in /wait Allows Arbitrary JavaScript Execution
PinchTab is a standalone HTTP server that gives AI agents direct control over a Chrome browser. PinchTab v0.8.3 through v0.8.5 allow arbitrary JavaScript execution through POST /wait and POST /tabs/id/wait when the request uses fn mode, even if security.allowEvaluate is disabled. POST /evaluate...
Quantifying Memory Cells Vulnerability for DRAM Security
Dynamic Random Access Memory (DRAM) is pervasive in computer systems. Cell vulnerabilities caused by unintended phenomena (forced retention failure, latency alteration, rowhammer and rowpress) lead to unintended bit flips in memory. These phenomena have been explored as attacks to violate data...
TOSSS: A CVE-Based Software Security Benchmark for Large Language Models
With their increasing capabilities, Large Language Models (LLMs) are now used across many industries. They have become useful tools for software engineers and support a wide range of development tasks. As LLMs are increasingly used in software development workflows, a critical question arises: are...
Security Considerations for Multi-Agent Systems
Multi-agent artificial intelligence systems or MAS are systems of autonomous agents that exercise delegated tool authority, share persistent memory, and coordinate via inter-agent communication. MAS introduces qualitatively distinct security vulnerabilities from those documented for singular AI...
EVMbench: Evaluating AI Agents on Smart Contract Security
Smart contracts on public blockchains now manage large amounts of value, and vulnerabilities in these systems can lead to substantial losses. As AI agents become more capable at reading, writing, and running code, it is natural to ask how well they can already navigate this landscape, both in way...
AdapTools: Adaptive Tool-Based Indirect Prompt Injection Attacks on Agentic LLMs
The integration of external data services (e.g., Model Context Protocol, MCP) has made large language model-based agents increasingly powerful for complex task execution. However, this advancement introduces critical security vulnerabilities, particularly indirect prompt injection (IPI) attacks...
MalTool: Malicious Tool Attacks on LLM Agents
In a malicious tool attack, an attacker uploads a malicious tool to a distribution platform; once a user installs the tool and the LLM agent selects it during task execution, the tool can compromise the user's security and privacy. Prior work primarily focuses on manipulating tool names and...
EUVD-2026-0130
This CVE ID was rejected because it was reserved but not used for a vulnerability disclosure...
AutoBaxBuilder: Bootstrapping Code Security Benchmarking
As LLMs see wide adoption in software engineering, the reliable assessment of the correctness and security of LLM-generated code is crucial. Notably, prior work has demonstrated that security is often overlooked, exposing that LLMs are prone to generating code with security vulnerabilities. These...
Key Length-Oriented Classification of Lightweight Cryptographic Algorithms for IoT Security
The successful deployment of Internet of Things (IoT) applications relies heavily on their robust security, and lightweight cryptography is considered an emerging solution in this context. While existing surveys have been examining lightweight cryptographic techniques from the perspective of...
Beyond Fixed and Dynamic Prompts: Embedded Jailbreak Templates for Advancing LLM Security
As the use of large language models (LLMs) continues to expand, ensuring their safety and robustness has become a critical challenge. In particular, jailbreak attacks that bypass built-in safety mechanisms are increasingly recognized as a tangible threat across industries, driving the need for...