16 matches found

Packet Storm News
added 2026/06/04 12:00 a.m., 10 views

ExploitGym AI Exploit Benchmark Tool

ExploitGym is a large-scale, realistic benchmark built from real-world vulnerabilities designed to evaluate AI agents' ability to develop exploits...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/06/03 12:00 a.m., 17 views

CyberGym-E2E: Scalable Real-World Benchmark for AI Agents' End-To-End Cybersecurity Capabilities

AI has the potential to transform cybersecurity by enabling systems that can autonomously detect, analyze, and remediate software vulnerabilities. However, existing cybersecurity evaluations of AI systems are limited in scale or scope, and fail to capture the end-to-end lifecycle of real-world...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/05/27 12:00 a.m., 10 views

Towards Demystifying and Repairing LLM-In-The-Loop Vulnerabilities

Large Language Models (LLMs) have been actively integrated into modern software systems as critical components. LLM-in-the-loop vulnerabilities, where vulnerabilities are introduced by LLMs and their dependent downstream components, such as frameworks, introduce new risks. Although some benchmark...

AI score: 5.9
Exploits: 0

Packet Storm News
added 2026/02/18 12:00 a.m., 9 views

Automating Agent Hijacking Via Structural Template Injection

Agent hijacking, highlighted by OWASP as a critical threat to the Large Language Model (LLM) ecosystem, enables adversaries to manipulate execution by injecting malicious instructions into retrieved content. Most existing attacks rely on manually crafted, semantics-driven prompt manipulation, which...

AI score: 5.9
Exploits: 0

Packet Storm News
added 2026/02/13 12:00 a.m., 11 views

Execution-State-Aware LLM Reasoning for Automated Proof-Of-Vulnerability Generation

Proof-of-Vulnerability (PoV) generation is a critical task in software security, serving as a cornerstone for vulnerability validation, false positive reduction, and patch verification. While directed fuzzing effectively drives path exploration, satisfying complex semantic constraints remains a...

AI score: 5.9
Exploits: 0

Packet Storm News
added 2025/12/03 12:00 a.m., 10 views

Breaking Isolation: A New Perspective on Hypervisor Exploitation Via Cross-Domain Attacks

Hypervisors are under threat by critical memory safety vulnerabilities, with pointer corruption being one of the most prevalent and severe forms. Existing exploitation frameworks depend on identifying highly-constrained structures in the host machine and accurately determining their runtime...

AI score: 7.4
Exploits: 0

GithubExploit
added 2025/11/29 10:35 a.m., 158 views

Practical-Vulnerability-Exploitation

Practical-Vulnerability-Exploitation Hands-on exploi...

AI score: 7.1
Exploits: 0

Packet Storm News
added 2025/10/21 12:00 a.m., 3 views

Prompting the Priorities: A First Look at Evaluating LLMs for Vulnerability Triage and Prioritization

Security analysts face increasing pressure to triage large and complex vulnerability backlogs. Large Language Models (LLMs) offer a potential aid by automating parts of the interpretation process. We evaluate four models (ChatGPT, Claude, Gemini, and DeepSeek) across twelve prompting techniques to...

AI score: 6.8
Exploits: 0

Packet Storm News
added 2025/10/16 12:00 a.m., 6 views

LLM Agents for Automated Web Vulnerability Reproduction: Are We There Yet?

Large language model (LLM) agents have demonstrated remarkable capabilities in software engineering and cybersecurity tasks, including code generation, vulnerability discovery, and automated testing. One critical but underexplored application is automated web vulnerability reproduction, which...

AI score: 7.6
Exploits: 0

Packet Storm News
added 2025/10/11 12:00 a.m., 4 views

A Systematic Study on Generating Web Vulnerability Proof-Of-Concepts Using Large Language Models

Recent advances in Large Language Models (LLMs) have brought remarkable progress in code understanding and reasoning, creating new opportunities and raising new concerns for software security. Among many downstream tasks, generating Proof-of-Concept (PoC) exploits plays a central role in vulnerabilit...

AI score: 7
Exploits: 0

Packet Storm News
added 2025/09/26 12:00 a.m., 39 views

SecureAgentBench: Benchmarking Secure Code Generation under Realistic Vulnerability Scenarios

Large language model (LLM) powered code agents are rapidly transforming software engineering by automating tasks such as testing, debugging, and repairing, yet the security risks of their generated code have become a critical concern. Existing benchmarks have offered valuable insights but remain...

AI score: 7.4
Exploits: 0

Packet Storm News
added 2025/09/21 12:00 a.m., 17 views

LLaVul: a Multimodal LLM for Interpretable Vulnerability Reasoning about Source Code

Increasing complexity in software systems places a growing demand on reasoning tools that unlock vulnerabilities manifest in source code. Many current approaches focus on vulnerability analysis as a classifying task, oversimplifying the nuanced and context-dependent real-world scenarios. Even...

AI score: 7
Exploits: 0

GithubExploit
added 2025/08/27 9:42 a.m., 150 views

PatchProve

PatchProve A PoC-Driven Benchmark for Evaluating Large Lang...

AI score: 7.2
Exploits: 0

Packet Storm News
added 2025/05/25 12:00 a.m., 4 views

VADER: a Human-Evaluated Benchmark for Vulnerability Assessment, Detection, Explanation, and Remediation

Ensuring that large language models (LLMs) can effectively assess, detect, explain, and remediate software vulnerabilities is critical for building robust and secure software systems. We introduce VADER, a human-evaluated benchmark designed explicitly to assess LLM performance across four key...

AI score: 7.3
Exploits: 0

SonarSource Blog
added 2021/11/29 12:00 a.m., 12 views

Code Security Advent Calendar 2021

We are happy to announce our sixth consecutive Code Security Advent Calendar! Born at RIPS in 2016, each calendar comprises 24 little code puzzles containing hidden security vulnerabilities that wait to be spotted. This is our way to share good vibes with the community while learning and having f...

AI score: 8
Exploits: 0

Kitploit
added 2019/08/27 1:18 p.m., 203 views

EVABS - Extremely Vulnerable Android Labs

An open source Android application that is intentionally vulnerable so as to act as a learning platform for Android application security beginners. The effort is to introduce beginners with very limited or zero knowledge to some of the major and commonly found real-world based Android application...

AI score: 7.4
Exploits: 0
References: 7