646 matches found
Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps
We introduce the Cyber Defense Benchmark, a benchmark for measuring how well large language model LLM agents perform the core SOC analyst task of threat hunting: given a database of raw Windows event logs with no guided questions or hints, identify the exact timestamps of malicious events. The...
Owner-Harm: A Missing Threat Model for AI Agent Safety
Existing AI agent safety benchmarks focus on generic criminal harm cybercrime, harassment, weapon synthesis, leaving a systematic blind spot for a distinct and commercially consequential threat category: agents harming their own deployers. Real-world incidents illustrate the gap: Slack AI...
InduGuard_vul_poc
ICS Vulnerability PoC Library — SCAA Benchmark Support Proof-...
SDLLMFuzz: Dynamic-Static LLM-Assisted Greybox Fuzzing for Structured Input Programs
Fuzzing has become a widely adopted technique for vulnerability discovery, yet it remains ineffective for structured-input programs due to strict syntactic constraints and limited semantic awareness. Traditional greybox fuzzers rely on mutation-based strategies and coarse-grained coverage feedbac...
n-days-poc-benchmark-and-dataset
ICS N-Day Vulnerability PoC Benchmark Suite A structured coll...
RealVuln: Benchmarking Rule-Based, General-Purpose LLM, and Security-Specialized Scanners on Real-World Code
How do security scanners perform on real-world code? We present RealVuln, the first open-source benchmark comparing Rule-Based SAST, General-Purpose LLMs, and Security-Specialized scanners on 26 intentionally vulnerable Python repositories educational and Capture-The-Flag applications with 796...
CVE-2026-32288 vulnerabilities
Vulnerabilities for packages: rke2-runtime-fips, localstack, docker-cli-fips, vault-csi-provider, gosu, podman-fips, external-secrets-operator, libnvidia-container, helm-fips, kubernetes-fips, gitlab-operator-fips, terraform, aws-flb-firehose, telegraf, crane-fips, flux-fips, argo-workflows-fips,...
GHSA-X4JJ-H2V8-HQQV vulnerabilities
Vulnerabilities for packages: rke2-runtime-fips, localstack, docker-cli-fips, vault-csi-provider, gosu, podman-fips, external-secrets-operator, libnvidia-container, helm-fips, kubernetes-fips, gitlab-operator-fips, terraform, aws-flb-firehose, telegraf, crane-fips, flux-fips, argo-workflows-fips,...
GHSA-G977-H85W-H2XJ MetaGPT has an Injection issue
A vulnerability was detected in FoundationAgents MetaGPT up to 0.8.1. This affects the function checksolution of the component HumanEvalBenchmark/MBPPBenchmark. Performing a manipulation results in code injection. The attack may be initiated remotely. The exploit is now public and may be used. Th...
CVE-2026-5970
A vulnerability was detected in FoundationAgents MetaGPT up to 0.8.1. This affects the function checksolution of the component HumanEvalBenchmark/MBPPBenchmark. Performing a manipulation results in code injection. The attack may be initiated remotely. The exploit is now public and may be used. Th...
CVE-2026-5970 FoundationAgents MetaGPT HumanEvalBenchmark/MBPPBenchmark check_solution code injection
A vulnerability was detected in FoundationAgents MetaGPT up to 0.8.1. This affects the function checksolution of the component HumanEvalBenchmark/MBPPBenchmark. Performing a manipulation results in code injection. The attack may be initiated remotely. The exploit is now public and may be used. Th...
CVE-2026-5970
A vulnerability was detected in FoundationAgents MetaGPT up to 0.8.1. This affects the function checksolution of the component HumanEvalBenchmark/MBPPBenchmark. Performing a manipulation results in code injection. The attack may be initiated remotely. The exploit is now public and may be used. Th...
CVE-2026-5970 FoundationAgents MetaGPT HumanEvalBenchmark/MBPPBenchmark check_solution code injection
A vulnerability was detected in FoundationAgents MetaGPT up to 0.8.1. This affects the function checksolution of the component HumanEvalBenchmark/MBPPBenchmark. Performing a manipulation results in code injection. The attack may be initiated remotely. The exploit is now public and may be used. Th...
PT-2026-31669
A vulnerability was detected in FoundationAgents MetaGPT up to 0.8.1. This affects the function check solution of the component HumanEvalBenchmark/MBPPBenchmark. Performing a manipulation results in code injection. The attack may be initiated remotely. The exploit is now public and may be used. T...
accessiqlue (=2025.12.21154255), agent-builder (>=0.0.2 <=0.1.7) +321 more potentially affected by CVE-2026-40087 via langchain-core (>=1.0.0a8 <=1.2.24)
langchain-core PYPI version =1.0.0a8, =0.0.2, =0.1.0, =0.1.0, =0.1.1 - ai-benchmark-analyzer =2025.12.21193050 - ai-claim-essence =2025.12.20202921 - ai-design-insights =2025.12.21145447 - ai-mysql-translator =2025.12.21101721 - ai-reliability-analyzer =2025.12.21171415 - ai-risk-extractor...
Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing
The rapid advancement of Large Language Models LLMs has created new opportunities for Automated Penetration Testing AutoPT, spawning numerous frameworks aimed at achieving end-to-end autonomous attacks. However, despite the proliferation of related studies, existing research generally lacks...
Combating Data Laundering in LLM Training
Data rights owners can detect unauthorized data use in large language model LLM training by querying with proprietary samples. Often, superior performance e.g., higher confidence or lower loss on a sample relative to the untrained data implies it was part of the training corpus, as LLMs tend to...
[SECURITY] Fedora 42 Update: rust-resctl-bench-2.2.5-12.fc42
resctl-bench is a collection of whole-system benchmarks to evaluate resource control and hardware behaviors using realistic simulated workloads. Comprehensive resource control involves the whole system. Furthermore, testing resource control end-to-end requires scenarios involving realistic...
[SECURITY] Fedora 44 Update: rust-resctl-bench-2.2.5-12.fc44
resctl-bench is a collection of whole-system benchmarks to evaluate resource control and hardware behaviors using realistic simulated workloads. Comprehensive resource control involves the whole system. Furthermore, testing resource control end-to-end requires scenarios involving realistic...
Fedora 43 : cpp-httplib (2026-e76feaf213)
The remote Fedora 43 host has a package installed that is affected by a vulnerability as referenced in the FEDORA-2026-e76feaf213 advisory. Update to 0.38.0 rhbz2447261 - Filename sanitization for path traversal prevention Added sanitizefilename to prevent path traversal attacks via malicious...