
78 matches found

Packet Storm News
added 2026/05/12 12:00 a.m. · 5 views

From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World

AI pentesting agents are increasingly credible as offensive security systems, but current benchmarks still provide limited guidance on which will perform best in real-world targets. Existing evaluation protocols assess and optimize for predefined goals such as capture-the-flag, remote code...

6.1 AI score
Exploits: 0
Packet Storm News
added 2026/05/05 12:00 a.m. · 7 views

MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents

Coding agents often pass per-prompt safety review yet ship exploitable code when their tasks are decomposed into routine engineering tickets. The challenge is structural: existing safety alignment evaluates overt requests in isolation, leaving models blind to malicious end-states that emerge from...

5.9 AI score
Exploits: 0
Spring Security Advisories
added 2026/05/04 12:00 a.m. · 9 views

Spring Office Hours Podcast: S5E14 - Spec Driven Development with Simon Martinelli

Join Dan Vega and DaShaun Carter for the latest updates from the Spring Ecosystem. In this episode, Dan and DaShaun are joined by Java Champion, Vaadin Champion, and Oracle ACE Pro Simon Martinelli to talk about Spec-Driven Development. With AI reshaping how we write code, Simon makes the case th...

5.9 AI score
Exploits: 0
Packet Storm News
added 2026/04/22 12:00 a.m. · 3 views

A Ground-Truth-Based Evaluation of Vulnerability Detection across Multiple Ecosystems

Automated vulnerability detection tools are widely used to identify security vulnerabilities in software dependencies. However, the evaluation of such tools remains challenging due to the heterogeneous structure of vulnerability data sources, inconsistent identifier schemes, and ambiguities in...

5.3 AI score
Exploits: 0
Packet Storm News
added 2026/04/21 12:00 a.m. · 16 views

Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps

We introduce the Cyber Defense Benchmark, a benchmark for measuring how well large language model (LLM) agents perform the core SOC analyst task of threat hunting: given a database of raw Windows event logs with no guided questions or hints, identify the exact timestamps of malicious events. The...

5.8 AI score
Exploits: 0
Packet Storm News
added 2026/04/15 12:00 a.m. · 2 views

RealVuln: Benchmarking Rule-Based, General-Purpose LLM, and Security-Specialized Scanners on Real-World Code

How do security scanners perform on real-world code? We present RealVuln, the first open-source benchmark comparing Rule-Based SAST, General-Purpose LLMs, and Security-Specialized scanners on 26 intentionally vulnerable Python repositories (educational and Capture-The-Flag applications) with 796...

5.8 AI score
Exploits: 0
Packet Storm News
added 2026/04/13 12:00 a.m. · 2 views

RedShell: A Generative AI-Based Approach to Ethical Hacking

The application of Machine Learning techniques in code generation is now a common practice for most developers. Tools such as ChatGPT from OpenAI leverage the natural language processing capabilities of Large Language Models to generate machine code from natural language descriptions. In the...

5.9 AI score
Exploits: 0
Packet Storm News
added 2026/04/13 12:00 a.m. · 4 views

SIR-Bench: Evaluating Investigation Depth in Security Incident Response Agents

We present SIR-Bench, a benchmark of 794 test cases for evaluating autonomous security incident response agents that distinguishes genuine forensic investigation from alert parroting. Derived from 129 anonymized incident patterns with expert-validated ground truth, SIR-Bench measures not only...

5.8 AI score
Exploits: 0
Packet Storm News
added 2026/03/26 12:00 a.m. · 4 views

The System Prompt Is the Attack Surface: How LLM Agent Configuration Shapes Security and Creates Exploitable Vulnerabilities

System prompt configuration can make the difference between near-total phishing blindness and near-perfect detection in LLM email agents. We present PhishNChips, a study of 11 models under 10 prompt strategies, showing that prompt-model interaction is a first-order security variable: a single...

5.9 AI score
Exploits: 0
Packet Storm News
added 2026/03/26 12:00 a.m. · 0 views

A Large-Scale Empirical Study on the Generalizability of Disclosed Java Library Vulnerability Exploits

Open-source software supply chain security relies heavily on assessing affected versions of library vulnerabilities. While prior studies have leveraged exploits for verifying vulnerability affected versions, they point out a key limitation that exploits are version-specific and cannot be directly...

6.2 AI score
Exploits: 0
Packet Storm News
added 2026/03/23 12:00 a.m. · 4 views

OrgForge-IT: A Verifiable Synthetic Benchmark for LLM-Based Insider Threat Detection

Synthetic insider threat benchmarks face a consistency problem: corpora generated without an external factual constraint cannot rule out cross-artifact contradictions. The CERT dataset -- the field's canonical benchmark -- is also static, lacks cross-surface correlation scenarios, and predates th...

5.8 AI score
Exploits: 0
Microsoft Secure
added 2026/02/19 5:00 p.m. · 3 views

New e-book: Establishing a proactive defense with Microsoft Security Exposure Management

Effective exposure management begins by illuminating and hardening risks across the entire attack surface. Some of the most meaningful shifts in security happen quietly—when teams take a clear look at their exposure landscape and acknowledge the gap between where they stand today and where they...

6 AI score
Exploits: 0
Packet Storm News
added 2026/02/05 12:00 a.m. · 4 views

Persistent Human Feedback, LLMs, and Static Analyzers for Secure Code Generation and Vulnerability Detection

Existing literature heavily relies on static analysis tools to evaluate LLMs for secure code generation and vulnerability detection. We reviewed 1,080 LLM-generated code samples, built a human-validated ground-truth, and compared the outputs of two widely used static security tools, CodeQL and...

5.5 AI score
Exploits: 0
Packet Storm News
added 2026/01/30 12:00 a.m. · 4 views

PIDSMaker: Building and Evaluating Provenance-Based Intrusion Detection Systems

Recent provenance-based intrusion detection systems (PIDSs) have demonstrated strong potential for detecting advanced persistent threats (APTs) by applying machine learning to system provenance graphs. However, evaluating and comparing PIDSs remains difficult: prior work uses inconsistent preprocessi...

5.6 AI score
Exploits: 0
Packet Storm News
added 2025/12/18 12:00 a.m. · 5 views

AutoDFBench 1.0: A Benchmarking Framework for Digital Forensic Tool Testing and Generated Code Evaluation

The National Institute of Standards and Technology (NIST) Computer Forensic Tool Testing (CFTT) programme has become the de facto standard for providing digital forensic tool testing and validation. However, to date, no comprehensive framework exists to automate benchmarking across the diverse forensi...

7.3 AI score
Exploits: 0
Packet Storm News
added 2025/12/16 12:00 a.m. · 28 views

PentestEval: Benchmarking LLM-Based Penetration Testing with Modular and Stage-Level Design

Penetration testing is essential for assessing and strengthening system security against real-world threats, yet traditional workflows remain highly manual, expertise-intensive, and difficult to scale. Although recent advances in Large Language Models (LLMs) offer promising opportunities for...

6.6 AI score
Exploits: 0
EUVD
added 2025/10/22 3:40 p.m. · 4 views

EUVD-2025-35304

Nautobot Single Source of Truth (SSoT) is an app for Nautobot. Prior to version 3.10.0, an unauthenticated attacker could access this page to view the Service Now public instance name (e.g., companyname.service-now.com). This is considered low-value information. This does not expose the Secret, the...

5.3 CVSS · 6.5 AI score · 0.00268 EPSS
Exploits: 0 · References: 5
Cvelist
added 2025/10/22 3:40 p.m. · 9 views

CVE-2025-62607 Nautobot Single Source of Truth (SSoT) has an unauthenticated ServiceNow configuration URL

Nautobot Single Source of Truth (SSoT) is an app for Nautobot. Prior to version 3.10.0, an unauthenticated attacker could access this page to view the Service Now public instance name (e.g., companyname.service-now.com). This is considered low-value information. This does not expose the Secret, the...

5.3 CVSS · 0.00268 EPSS
Exploits: 0 · References: 3
Cvelist
added 2025/10/21 12:00 a.m. · 9 views

CVE-2025-60511

Moodle OpenAI Chat Block plugin 3.0.1 (2025021700) suffers from an Insecure Direct Object Reference (IDOR) vulnerability due to insufficient validation of the blockId parameter in /blocks/openaichat/api/completion.php. An authenticated student can impersonate another user's block e.g., administrator...

0.00232 EPSS
Exploits: 0 · References: 4
OSSF Malicious Packages
added 2025/08/14 6:52 p.m. · 2 views

Malicious code in test-mlw2-gyron-terts-mayed-truth (npm)

The package test-mlw2-gyron-terts-mayed-truth was found to contain malicious code...

7 AI score
Exploits: 0