Lucene search

172 matches found

Packet Storm News
added 2026/06/11 12:0 a.m., 4 views

MAStrike: Shapley-Guided Collusive Red-Teaming on Multi-Agent Systems

Hierarchical multi-agent systems (MAS) are rapidly being deployed in high-stakes workflows across domains such as finance and software engineering. In these systems, safety and security are inherently distributed across role-specialized agents, significantly expanding the attack surface, particular...

5.5 AI score
Exploits: 0

Packet Storm News
added 2026/06/10 12:0 a.m., 6 views

InjectV: Modeling Fault Injection Attacks in RISC-V Simulation Environment

Fault Injection Attacks (FIAs) are a significant threat to hardware security, capable of compromising systems by inducing malicious faults in computation or storage. Evaluating resilience against such attacks is challenging due to the high cost, complexity, and limited availability of physical faul...

5.5 AI score
Exploits: 0

vulnersOsv
added 2026/06/08 11:1 p.m., 4 views

ai.ancf.lmos-router:benchmarks (>=0.2.0 <=0.28.0), ai.ancf.lmos-router:lmos-router-hybrid (>=0.2.0 <=0.28.0) +12787 more potentially affected by CVE-2026-45536 via io.netty:netty-transport-native-epoll (>=4.0.21.Final <=4.1.134.Final)

io.netty:netty-transport-native-epoll MAVEN version =4.0.21.Final, =0.2.0, =0.2.0, =0.2.0, =0.2.0, =0.2.0, =0.2.0, =0.1.1, =0.1.1, =0.1.1, =0.0.4, =0.6.0 - ai.ancf.lmos:lmos-router-hybrid =0.1.0 - ai.ancf.lmos:lmos-router-hybrid-spring-boot-starter =0.1.0 - ai.ancf.lmos:lmos-router-llm =0.1.0 -...

5.4 AI score, 0.00193 EPSS
Exploits: 0

Packet Storm News
added 2026/06/07 12:0 a.m., 5 views

Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops

Agent benchmarks score submissions with outcome verifiers that are typically hand-written and brittle, leaving them open to reward hacking. We audit 1,968 tasks across five terminal-agent benchmarks and find 323 (16%) hackable by frontier models given only the task description. This corrupts both...

5.5 AI score
Exploits: 0

Packet Storm News
added 2026/06/05 12:0 a.m., 6 views

MOLOT System Card: Malicious Operational Logic Observation Transformer

MOLOT (Malicious Operational Logic Observation Transformer) is a static malicious-code detection system designed for a SAST setup where package metadata, maintainer history, and dynamic execution traces may be unavailable or unreliable. The system represents source code as behavior sequences derived...

5.8 AI score
Exploits: 0

Packet Storm News
added 2026/06/01 12:0 a.m., 3 views

MaskForge: Structure-Aware Adaptive Attacks for Jailbreaking Diffusion Large Language Models

Diffusion large language models (dLLMs) generate text by iteratively denoising partially masked sequences under bidirectional context, exposing a safety surface distinct from autoregressive LLMs. Because mask tokens are native inputs and tokens are committed by confidence rather than position,...

5.8 AI score
Exploits: 0

Packet Storm News
added 2026/06/01 12:0 a.m., 5 views

Gate AI: LLM Security Benchmark Evaluation Methodology and Results

Published evaluations of prompt-injection and jailbreak detectors for Large Language Models often suffer from two systematic weaknesses: per-dataset threshold tuning and undisclosed operating points. We describe an evaluation harness that addresses both. The detector under evaluation is scored...

5.8 AI score
Exploits: 0

Packet Storm News
added 2026/06/01 12:0 a.m., 6 views

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

Autonomous LLM agents increasingly operate in stateful environments where they access tools, files, memory, and external services. While such capabilities enable complex real-world workflows, they also introduce security risks that are difficult to capture with existing evaluations. Current agent...

5.9 AI score
Exploits: 0

Schneier on Security
added 2026/05/20 2:21 p.m., 6 views

On AI Security

Good report: Executive Summary: Let's say you wanted to make sure that your AI is secure. Can you just maximize the security and privacy benchmark and call it a day? Nope, because benchmarks don't actually work for measuring AI capabilities even when they are NOT emergent systemic properties like...

5.9 AI score
Exploits: 0

Packet Storm News
added 2026/05/20 12:0 a.m., 6 views

Backchaining Loss of Control Mitigations from Mission-Specific Benchmarks in National Security

Affordances and permissions are promising and timely safety levers for mitigating Loss of Control (LoC) threats in high-stakes deployment contexts, such as national security. Deployers in defense and intelligence could rely on several approaches to identify which affordances and permissions should ...

5.8 AI score
Exploits: 0

Packet Storm News
added 2026/05/17 12:0 a.m., 4 views

MemRepair: Hierarchical Memory for Agentic Repository-Level Vulnerability Repair

Modern software ecosystems face a rapidly growing number of disclosed vulnerabilities, increasing the need for automated repair techniques that can operate reliably at repository scale. Although Large Language Model (LLM)-based agents have recently shown promise for automated vulnerability repair...

5.9 AI score
Exploits: 0

Packet Storm News
added 2026/05/15 12:0 a.m., 6 views

Context-Aware Entity-Relation Extraction for Threat Intelligence Knowledge Graphs

Cybersecurity Knowledge Graphs (CKGs) unify diverse Cyber Threat Intelligence (CTI) sources into structured, queryable formats, offering scalable solutions for automating proactive and real-time security responses. Their increasing adoption has significantly enhanced the workflow and decision-making...

5.7 AI score
Exploits: 0

Packet Storm News
added 2026/05/12 12:0 a.m., 10 views

Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack

Agent benchmarks have become the de facto measure of frontier AI competence, guiding model selection, investment, and deployment. However, reward hacking, where agents maximize a score without performing the intended task, emerges spontaneously in frontier models without overfitting. We argue tha...

5.8 AI score
Exploits: 0

vulnersOsv
added 2026/05/07 12:22 a.m., 9 views

ai.agentican:agentican-framework-core (>=0.1.0-alpha.2 <=0.1.0-alpha.4), ai.agentican:agentican-quarkus-deployment (>=0.1.0-alpha.1 <=0.1.0-alpha.4) +23724 more potentially affected by CVE-2026-42585 via io.netty:netty-codec-http (>=4.0.0.Alpha1 <=4.1.132.Final)

io.netty:netty-codec-http MAVEN version =4.0.0.Alpha1, =0.1.0-alpha.2, =0.1.0-alpha.1, =0.1.0-alpha.1, =0.1.0-alpha.1, =0.1.0-alpha.1, =0.1.0-alpha.1, =0.1.0-alpha.1, =0.1.0-alpha.1, =0.1.0-alpha.1, =0.1.0-alpha.3, =0.1.0-alpha.2, =0.1.0, =0.1.0, =0.2.0, =0.2.0, =0.28.0 and more Source cves:...

7.5 CVSS, 6.8 AI score, 0.00239 EPSS
Exploits: 1

Packet Storm News
added 2026/05/06 12:0 a.m., 14 views

GLiNER Guard: Unified Encoder Family for Production LLM Safety and Privacy

Production LLM systems require both safety moderation and PII detection under strict latency and cost constraints. This creates a trade-off: autoregressive moderators are accurate but expensive, while lightweight encoders are faster but less capable. We present GLiNER Guard (GLiGuard), a unified...

5.8 AI score
Exploits: 0

Packet Storm News
added 2026/04/30 12:0 a.m., 3 views

XekRung Technical Report

We present XekRung, a frontier large language model for cybersecurity, designed to provide comprehensive security capabilities. To achieve this, we develop diverse data synthesis pipelines tailored to the cybersecurity domain, enabling the scalable construction of high-quality training data and...

5.8 AI score
Exploits: 0

Packet Storm News
added 2026/04/25 12:0 a.m., 3 views

From Stateless Queries to Autonomous Actions: A Layered Security Framework for Agentic AI Systems

Agentic AI systems face security challenges that stateless large language models do not. They plan across extended horizons, maintain persistent memory, invoke external tools, and coordinate with peer agents. Existing security analyses organize threats by attack type prompt injection, jailbreakin...

5.3 AI score
Exploits: 0

Packet Storm News
added 2026/04/21 12:0 a.m., 5 views

Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture the Flag Challenges

Large Language Model (LLM) agents are increasingly proposed for autonomous cybersecurity tasks, but their capabilities in realistic offensive settings remain poorly understood. We present DeepRed, an open-source benchmark for evaluating LLM-based agents on realistic Capture The Flag (CTF) challenges ...

6 AI score
Exploits: 0

Packet Storm News
added 2026/04/20 12:0 a.m., 2 views

ARES: Adaptive Red-Teaming and End-To-End Repair of Policy-Reward System

Reinforcement Learning from Human Feedback (RLHF) is central to aligning Large Language Models (LLMs), yet it introduces a critical vulnerability: an imperfect Reward Model (RM) can become a single point of failure when it fails to penalize unsafe behaviors. While existing red-teaming approaches...

5.8 AI score
Exploits: 0

Packet Storm News
added 2026/04/07 12:0 a.m., 1 view

Swiss-Bench 003: Evaluating LLM Reliability and Adversarial Security for Swiss Regulatory Contexts

The deployment of large language models (LLMs) in Swiss financial and regulatory contexts demands empirical evidence of both production reliability and adversarial security, dimensions not jointly operationalized in existing Swiss-focused evaluation frameworks. This paper introduces Swiss-Bench 003...

5.9 AI score
Exploits: 0