4 matches found

Packet Storm News
added 2026/04/21 12:00 a.m., 5 views

Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture the Flag Challenges

Large Language Model (LLM) agents are increasingly proposed for autonomous cybersecurity tasks, but their capabilities in realistic offensive settings remain poorly understood. We present DeepRed, an open-source benchmark for evaluating LLM-based agents on realistic Capture The Flag (CTF) challenges ...

AI score: 6
Exploits: 0
Packet Storm News
added 2026/04/13 12:00 a.m., 2 views

AnyPoC: Universal Proof-Of-Concept Test Generation for Scalable LLM-Based Bug Detection

While recent LLM-based agents can identify many candidate bugs in source code, their reports remain static hypotheses that require manual validation, limiting the practicality of automated bug detection. We frame this challenge as a test generation task: given a candidate report, synthesizing an...

AI score: 6
Exploits: 0
Packet Storm News
added 2025/11/24 12:00 a.m., 3 views

Cross-LLM Generalization of Behavioral Backdoor Detection in AI Agent Supply Chains

As AI agents become integral to enterprise workflows, their reliance on shared tool libraries and pre-trained components creates significant supply chain vulnerabilities. While previous work has demonstrated behavioral backdoor detection within individual LLM architectures, the critical question ...

AI score: 7.3
Exploits: 0
Kitploit
added 2018/02/04 1:30 p.m., 16 views

IDAsec - IDA plugin for reverse-engineering and dynamic interactions with the Binsec platform

IDA plugin for reverse-engineering and dynamic interactions with the Binsec platform. Features: decoding an instruction in DBA IR; loading execution traces generated by Pinsec; triggering analyses on Binsec and retrieving results. Dependencies: protobuf; ZMQ; capstone for trace disassembly; graphviz to dr...

AI score: 7.5
Exploits: 0
References: 1