646 matches found
Fedora 44 : cpp-httplib (2026-03599f0b32)
The remote Fedora 44 host has a package installed that is affected by a vulnerability as referenced in the FEDORA-2026-03599f0b32 advisory. Update to 0.38.0 rhbz2447261 - Filename sanitization for path traversal prevention Added sanitizefilename to prevent path traversal attacks via malicious...
Design Principles for the Construction of a Benchmark Evaluating Security Operation Capabilities of Multi-Agent AI Systems
As Large Language Models LLMs and multi-agent AI systems are demonstrating increasing potential in cybersecurity operations, organizations, policymakers, model providers, and researchers in the AI and cybersecurity communities are interested in quantifying the capabilities of such AI systems to...
Red-MIRROR: Agentic LLM-Based Autonomous Penetration Testing with Reflective Verification and Knowledge-Augmented Interaction
Web applications remain the dominant attack surface in cybersecurity, where vulnerabilities such as SQL injection, XSS, and business logic flaws continue to cause significant data breaches. While penetration testing is effective for identifying these weaknesses, traditional manual approaches are...
Unveiling the Resilience of LLM-Enhanced Search Engines against Black-Hat SEO Manipulation
The emergence of Large Language Model-enhanced Search Engines LLMSEs has revolutionized information retrieval by integrating web-scale search capabilities with AI-powered summarization. While these systems demonstrate improved efficiency over traditional search engines, their security implication...
Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models
Large language models LLMs increasingly rely on explicit chain-of-thought CoT reasoning to solve complex tasks, yet the safety of the reasoning process itself remains largely unaddressed. Existing work on LLM safety focuses on content safety--detecting harmful, biased, or factually incorrect...
OrgForge-IT: A Verifiable Synthetic Benchmark for LLM-Based Insider Threat Detection
Synthetic insider threat benchmarks face a consistency problem: corpora generated without an external factual constraint cannot rule out cross-artifact contradictions. The CERT dataset -- the field's canonical benchmark -- is also static, lacks cross-surface correlation scenarios, and predates th...
Benchmarking Post-Quantum Cryptography on Resource-Constrained IoT Devices: ML-KEM and ML-DSA on ARM Cortex-M0+
The migration to post-quantum cryptography is urgent for Internet of Things devices with 10-20 year lifespans, yet no systematic benchmarks exist for the finalised NIST standards on the most constrained 32-bit processor class. This paper presents the first isolated algorithm-level benchmarks of...
Toward Scalable Automated Repository-Level Datasets for Software Vulnerability Detection
Software vulnerabilities continue to grow in volume and remain difficult to detect in practice. Although learning-based vulnerability detection has progressed, existing benchmarks are largely function-centric and fail to capture realistic, executable, interprocedural settings. Recent repo-level...
TOSSS: A CVE-Based Software Security Benchmark for Large Language Models
With their increasing capabilities, Large Language Models LLMs are now used across many industries. They have become useful tools for software engineers and support a wide range of development tasks. As LLMs are increasingly used in software development workflows, a critical question arises: are...
ATLAS: AI-Assisted Threat-To-Assertion Learning for System-On-Chip Security Verification
This work presents ATLAS, an LLM-driven framework that bridges standardized threat modeling and property-based formal verification for System-on-Chip SoC security. Starting from vulnerability knowledge bases such as Common Weakness Enumeration CWE, ATLAS identifies SoC-specific assets, maps...
A Systematic Study of LLM-Based Architectures for Automated Patching
Large language models LLMs have shown promise for automated patching, but their effectiveness depends strongly on how they are integrated into patching systems. While prior work explores prompting strategies and individual agent designs, the field lacks a systematic comparison of patching...
cipher-xbow-benchmark
Cipher XBOW Benchmark Results Black-box assessment results fr...
Formal Analysis and Supply Chain Security for Agentic AI Skills
The rapid proliferation of agentic AI skill ecosystems -- exemplified by OpenClaw 228,000 GitHub stars and Anthropic Agent Skills 75,600 stars -- has introduced a critical supply chain attack surface. The ClawHavoc campaign January-February 2026 infiltrated over 1,200 malicious skills into the...
APFuzz: Towards Automatic Greybox Protocol Fuzzing
Greybox protocol fuzzing is a random testing approach for stateful protocol implementations, where the input is protocol messages generated from mutations of seeds, and the search in the input space is driven by the feedback on coverage of both code and state. State model and message model are th...
Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks
LLM agents are evolving rapidly, powered by code execution, tools, and the recently introduced agent skills feature. Skills allow users to extend LLM applications with specialized third-party code, knowledge, and instructions. Although this can extend agent capabilities to new domains, it creates...
SUSE CVE-2026-23229
In the Linux kernel, the following vulnerability has been resolved: crypto: virtio - Add spinlock protection with virtqueue notification When VM boots with one virtio-crypto PCI device and builtin backend, run openssl benchmark command with multiple processes, such as openssl speed -evp aes-128-c...
MultiVer: Zero-Shot Multi-Agent Vulnerability Detection
We present MultiVer, a zero-shot multi-agent system for vulnerability detection that achieves state-of-the-art recall without fine-tuning. A four-agent ensemble security, correctness, performance, style with union voting achieves 82.7% recall on PyVul, exceeding fine-tuned GPT-3.5 81.3% by 1.4...
What Makes a Good LLM Agent for Real-World Penetration Testing?
LLM-based agents show promise for automating penetration testing, yet reported performance varies widely across systems and benchmarks. We analyze 28 LLM-based penetration testing systems and evaluate five representative implementations across three benchmarks of increasing complexity. Our analys...
CVE-2026-23229
In the Linux kernel, the following vulnerability has been resolved: crypto: virtio - Add spinlock protection with virtqueue notification When VM boots with one virtio-crypto PCI device and builtin backend, run openssl benchmark command with multiple processes, such as openssl speed -evp aes-128-c...
UBUNTU-CVE-2026-23229
In the Linux kernel, the following vulnerability has been resolved: crypto: virtio - Add spinlock protection with virtqueue notification When VM boots with one virtio-crypto PCI device and builtin backend, run openssl benchmark command with multiple processes, such as openssl speed -evp aes-128-c...