62 matches found
LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments
The rapid proliferation of LLM-based autonomous agents in real operating system environments introduces a new category of safety risk beyond content safety: behavior jailbreak, where an adversary induces an agent to execute dangerous OS-level operations with irreversible consequences. Existing...
ACIArena: Toward Unified Evaluation for Agent Cascading Injection
Collaboration and information sharing empower Multi-Agent Systems MAS but also introduce a critical security risk known as Agent Cascading Injection ACI. In such attacks, a compromised agent exploits inter-agent trust to propagate malicious instructions, causing cascading failures across the...
A Systematic Security Evaluation of OpenClaw and Its Variants
Tool-augmented AI agents substantially extend the practical capabilities of large language models, but they also introduce security risks that cannot be identified through model-only evaluation. In this paper, we present a systematic security assessment of six representative OpenClaw-series agent...
Towards Leveraging LLMs to Generate Abstract Penetration Test Cases from Software Architecture
Software architecture models capture early design decisions that strongly influence system quality attributes, including security. However, architecture-level security assessment and feedback are often absent in practice, allowing security weaknesses to propagate into later phases of the software...
CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions
The evaluation of Large Language Models LLMs for code generation relies heavily on the quality and robustness of test cases. However, existing benchmarks often lack coverage for subtle corner cases, allowing incorrect solutions to pass. To bridge this gap, we propose CodeHacker, an automated agen...
CVE-2026-23077
In the Linux kernel, the following vulnerability has been resolved: mm/vma: fix anonvma UAF on mremap faulted, unfaulted merge Patch series "mm/vma: fix anonvma UAF on mremap faulted, unfaulted merge", v2. Commit 879bca0a2c4f "mm/vma: fix incorrectly disallowed anonymous VMA merges" introduced th...
AutoDFBench 1.0: A Benchmarking Framework for Digital Forensic Tool Testing and Generated Code Evaluation
The National Institute of Standards and Technology NIST Computer Forensic Tool Testing CFTT programme has become the de facto standard for providing digital forensic tool testing and validation. However to date, no comprehensive framework exists to automate benchmarking across the diverse forensi...
BackportBench: A Multilingual Benchmark for Automated Backporting of Patches
Many modern software projects evolve rapidly to incorporate new features and security patches. It is important for users to update their dependencies to safer versions, but many still use older, vulnerable package versions because upgrading can be difficult and may break their existing codebase...
SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents
Search agents connect LLMs to the Internet, enabling access to broader and more up-to-date information. However, unreliable search results may also pose safety threats to end users, establishing a new threat surface. In this work, we conduct two in-the-wild experiments to demonstrate both the...
XSS-CTFs
XSS-CTFs Contains hands-on XSS test cases from beginner...
STAF: Leveraging LLMs for Automated Attack Tree-Based Security Test Generation
In modern automotive development, security testing is critical for safeguarding systems against increasingly advanced threats. Attack trees are widely used to systematically represent potential attack vectors, but generating comprehensive test cases from these trees remains a labor-intensive,...
WAFTest
This repository is an offensive tool for testing web application firewalls WAFs. It contains a collection of test cases and scripts to evaluate the effectiveness of WAFs against various types of attacks. The tool includes test cases for common web application vulnerabilities such as: Command...
SUSE CVE-2025-38708
In the Linux kernel, the following vulnerability has been resolved: drbd: add missing krefget in handlewriteconflicts With two-primaries enabled, DRBD tries to detect "concurrent" writes and handle write conflicts, so that even if you write to the same sector simultaneously on both nodes, they en...
Exploit for Out-of-bounds Write in F5 Nginx
Disclosures Zero-day and N-day security vulnerability notes, analysis, and proof-of-concepts URL: https://github.com/badd1e/Disclosures List CVE-2009-2629: nginx http module Buffer Underflow Remote Code Execution Vulnerability Patch analysis, testcase, notes CVE-2013-0007: Microsoft XML Core...
Specification and Evaluation of Multi-Agent LLM Systems -- Prototype and Cybersecurity Applications
Recent advancements in LLMs indicate potential for novel applications, e.g., through reasoning capabilities in the latest OpenAI and DeepSeek models. For applying these models in specific domains beyond text generation, LLM-based multi-agent approaches can be utilized that solve complex tasks by...
VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents
Computer-Use Agents CUAs with full system access enable powerful task automation but pose significant security and privacy risks due to their ability to manipulate files, access user data, and execute arbitrary commands. While prior work has focused on browser-based agents and HTML-level attacks,...
PT-2026-4771
Name of the Vulnerable Software and Affected Versions eslint versions prior to 9.26.0 Description A stack overflow issue exists in eslint when serializing objects containing circular references within the eslint/lib/shared/serialization.js file. The issue is triggered through the RuleTester.run...
Ollama < 0.1.34 Improper Input Validation
The version of Ollama installed on the remote host is prior to 0.1.34. It is, therefore, affected by an improper input validation vulnerability. Ollama before 0.1.34 does not validate the format of the digest sha256 with 64 hex digits when getting the model path, and thus mishandles the...
Improper Input Validation
github.com/ollama/ollama is vulnerable to Improper Input Validation. The vulnerability is due to improper validation of the digest format sha256 with 64 hex digits when getting the model path, which results in the mishandling of the TestGetBlobsPath test cases with fewer than 64 hex digits, more...
CVE-2024-37032
Ollama before 0.1.34 does not validate the format of the digest sha256 with 64 hex digits when getting the model path, and thus mishandles the TestGetBlobsPath test cases such as fewer than 64 hex digits, more than 64 hex digits, or an initial ../ substring...