658 matches found
CVE-2026-5970
A vulnerability was detected in FoundationAgents MetaGPT up to 0.8.1. This affects the function checksolution of the component HumanEvalBenchmark/MBPPBenchmark. Performing a manipulation results in code injection. The attack may be initiated remotely. The exploit is now public and may be used. Th...
CVE-2026-5970 FoundationAgents MetaGPT HumanEvalBenchmark/MBPPBenchmark check_solution code injection
A vulnerability was detected in FoundationAgents MetaGPT up to 0.8.1. This affects the function checksolution of the component HumanEvalBenchmark/MBPPBenchmark. Performing a manipulation results in code injection. The attack may be initiated remotely. The exploit is now public and may be used. Th...
PT-2026-31669
A vulnerability was detected in FoundationAgents MetaGPT up to 0.8.1. This affects the function check solution of the component HumanEvalBenchmark/MBPPBenchmark. Performing a manipulation results in code injection. The attack may be initiated remotely. The exploit is now public and may be used. T...
accessiqlue (=2025.12.21154255), agent-builder (>=0.0.2 <=0.1.7) +320 more potentially affected by CVE-2026-40087 via langchain-core (>=1.0.0a8 <=1.2.24)
langchain-core PYPI version =1.0.0a8, =0.0.2, =0.1.0, =0.1.0, =0.1.1 - ai-benchmark-analyzer =2025.12.21193050 - ai-claim-essence =2025.12.20202921 - ai-design-insights =2025.12.21145447 - ai-mysql-translator =2025.12.21101721 - ai-reliability-analyzer =2025.12.21171415 - ai-risk-extractor...
Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing
The rapid advancement of Large Language Models LLMs has created new opportunities for Automated Penetration Testing AutoPT, spawning numerous frameworks aimed at achieving end-to-end autonomous attacks. However, despite the proliferation of related studies, existing research generally lacks...
Combating Data Laundering in LLM Training
Data rights owners can detect unauthorized data use in large language model LLM training by querying with proprietary samples. Often, superior performance e.g., higher confidence or lower loss on a sample relative to the untrained data implies it was part of the training corpus, as LLMs tend to...
[SECURITY] Fedora 42 Update: rust-resctl-bench-2.2.5-12.fc42
resctl-bench is a collection of whole-system benchmarks to evaluate resource control and hardware behaviors using realistic simulated workloads. Comprehensive resource control involves the whole system. Furthermore, testing resource control end-to-end requires scenarios involving realistic...
[SECURITY] Fedora 44 Update: rust-resctl-bench-2.2.5-12.fc44
resctl-bench is a collection of whole-system benchmarks to evaluate resource control and hardware behaviors using realistic simulated workloads. Comprehensive resource control involves the whole system. Furthermore, testing resource control end-to-end requires scenarios involving realistic...
Fedora 43 : cpp-httplib (2026-e76feaf213)
The remote Fedora 43 host has a package installed that is affected by a vulnerability as referenced in the FEDORA-2026-e76feaf213 advisory. Update to 0.38.0 rhbz2447261 - Filename sanitization for path traversal prevention Added sanitizefilename to prevent path traversal attacks via malicious...
Fedora 44 : cpp-httplib (2026-03599f0b32)
The remote Fedora 44 host has a package installed that is affected by a vulnerability as referenced in the FEDORA-2026-03599f0b32 advisory. Update to 0.38.0 rhbz2447261 - Filename sanitization for path traversal prevention Added sanitizefilename to prevent path traversal attacks via malicious...
Design Principles for the Construction of a Benchmark Evaluating Security Operation Capabilities of Multi-Agent AI Systems
As Large Language Models LLMs and multi-agent AI systems are demonstrating increasing potential in cybersecurity operations, organizations, policymakers, model providers, and researchers in the AI and cybersecurity communities are interested in quantifying the capabilities of such AI systems to...
Red-MIRROR: Agentic LLM-Based Autonomous Penetration Testing with Reflective Verification and Knowledge-Augmented Interaction
Web applications remain the dominant attack surface in cybersecurity, where vulnerabilities such as SQL injection, XSS, and business logic flaws continue to cause significant data breaches. While penetration testing is effective for identifying these weaknesses, traditional manual approaches are...
Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models
Large language models LLMs increasingly rely on explicit chain-of-thought CoT reasoning to solve complex tasks, yet the safety of the reasoning process itself remains largely unaddressed. Existing work on LLM safety focuses on content safety--detecting harmful, biased, or factually incorrect...
Unveiling the Resilience of LLM-Enhanced Search Engines against Black-Hat SEO Manipulation
The emergence of Large Language Model-enhanced Search Engines LLMSEs has revolutionized information retrieval by integrating web-scale search capabilities with AI-powered summarization. While these systems demonstrate improved efficiency over traditional search engines, their security implication...
OrgForge-IT: A Verifiable Synthetic Benchmark for LLM-Based Insider Threat Detection
Synthetic insider threat benchmarks face a consistency problem: corpora generated without an external factual constraint cannot rule out cross-artifact contradictions. The CERT dataset -- the field's canonical benchmark -- is also static, lacks cross-surface correlation scenarios, and predates th...
Benchmarking Post-Quantum Cryptography on Resource-Constrained IoT Devices: ML-KEM and ML-DSA on ARM Cortex-M0+
The migration to post-quantum cryptography is urgent for Internet of Things devices with 10-20 year lifespans, yet no systematic benchmarks exist for the finalised NIST standards on the most constrained 32-bit processor class. This paper presents the first isolated algorithm-level benchmarks of...
Toward Scalable Automated Repository-Level Datasets for Software Vulnerability Detection
Software vulnerabilities continue to grow in volume and remain difficult to detect in practice. Although learning-based vulnerability detection has progressed, existing benchmarks are largely function-centric and fail to capture realistic, executable, interprocedural settings. Recent repo-level...
TOSSS: A CVE-Based Software Security Benchmark for Large Language Models
With their increasing capabilities, Large Language Models LLMs are now used across many industries. They have become useful tools for software engineers and support a wide range of development tasks. As LLMs are increasingly used in software development workflows, a critical question arises: are...
A Systematic Study of LLM-Based Architectures for Automated Patching
Large language models LLMs have shown promise for automated patching, but their effectiveness depends strongly on how they are integrated into patching systems. While prior work explores prompting strategies and individual agent designs, the field lacks a systematic comparison of patching...
ATLAS: AI-Assisted Threat-To-Assertion Learning for System-On-Chip Security Verification
This work presents ATLAS, an LLM-driven framework that bridges standardized threat modeling and property-based formal verification for System-on-Chip SoC security. Starting from vulnerability knowledge bases such as Common Weakness Enumeration CWE, ATLAS identifies SoC-specific assets, maps...