31 matches found

Packet Storm News
added 3 days ago | 1 view

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

Autonomous LLM agents increasingly operate in stateful environments where they access tools, files, memory, and external services. While such capabilities enable complex real-world workflows, they also introduce security risks that are difficult to capture with existing evaluations. Current agent...

AI score: 5.9
Exploits: 0

Packet Storm News
added 2026/05/26 12:00 a.m. | 5 views

MRMMIA: Membership Inference Attacks on Memory in Chat Agents

Membership inference attacks (MIAs) test whether a target data record belongs to a system's private data, and have become a standard tool to measure privacy leakage in machine learning systems. Prior work has primarily focused on training corpora or retrieval databases. However, MIAs against agent...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/04/22 12:00 a.m. | 2 views

AVISE: Framework for Evaluating the Security of AI Systems

As artificial intelligence (AI) systems are increasingly deployed across critical domains, their security vulnerabilities pose growing risks of high-profile exploits and consequential system failures. Yet systematic approaches to evaluating AI security remain underdeveloped. In this paper, we...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/04/14 12:00 a.m. | 2 views

LogicEval: A Systematic Framework for Evaluating Automated Repair Techniques for Logical Vulnerabilities in Real-World Software

Logical vulnerabilities in software stem from flaws in program logic rather than memory safety, which can lead to critical security failures. Although existing automated program repair techniques primarily focus on repairing memory corruption vulnerabilities, they struggle with logical...

AI score: 5.9
Exploits: 0

Packet Storm News
added 2026/04/07 12:00 a.m. | 1 view

WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks

Web agents automate browser tasks, ranging from simple form completion to complex workflows like ordering groceries. While current benchmarks evaluate general-purpose performance (e.g., WebArena) or safety against malicious actions (e.g., SafeArena), no existing framework assesses an agent's ability to...

AI score: 6
Exploits: 0

Packet Storm News
added 2026/03/30 12:00 a.m. | 0 views

Design Principles for the Construction of a Benchmark Evaluating Security Operation Capabilities of Multi-Agent AI Systems

As Large Language Models (LLMs) and multi-agent AI systems are demonstrating increasing potential in cybersecurity operations, organizations, policymakers, model providers, and researchers in the AI and cybersecurity communities are interested in quantifying the capabilities of such AI systems to...

AI score: 5.8
Exploits: 0

Packet Storm News
added 2026/03/30 12:00 a.m. | 1 view

Why Aggregate Accuracy Is Inadequate for Evaluating Fairness in Law Enforcement Facial Recognition Systems

Facial recognition systems are increasingly deployed in law enforcement and security contexts, where algorithmic decisions can carry significant societal consequences. Despite high reported accuracy, growing evidence demonstrates that such systems often exhibit uneven performance across demograph...

AI score: 5.9
Exploits: 0

Packet Storm News
added 2026/03/16 12:00 a.m. | 2 views

From Storage to Steering: Memory Control Flow Attacks on LLM Agents

Modern agentic systems allow Large Language Model (LLM) agents to tackle complex tasks through extensive tool usage, forming structured control flows of tool selection and execution. Existing security analyses often treat these control flows as ephemeral, one-off sessions, overlooking the persisten...

AI score: 5.9
Exploits: 0

The Hacker News
added 2026/03/04 11:30 a.m. | 2 views

New RFP Template for AI Usage Control and AI Governance 

As AI becomes the central engine for enterprise productivity, security leaders are finally getting the green light — and the budget — to secure it. But there’s a quiet crisis unfolding in the boardroom: many organizations know they need "AI Governance," but they have no idea what they are actuall...

AI score: 6.1
Exploits: 0

Packet Storm News
added 2026/02/24 12:00 a.m. | 3 views

AdapTools: Adaptive Tool-Based Indirect Prompt Injection Attacks on Agentic LLMs

The integration of external data services (e.g., Model Context Protocol, MCP) has made large language model-based agents increasingly powerful for complex task execution. However, this advancement introduces critical security vulnerabilities, particularly indirect prompt injection (IPI) attacks...

AI score: 6
Exploits: 0

Packet Storm News
added 2026/01/30 12:00 a.m. | 2 views

The Semantic Trap: Do Fine-Tuned LLMs Learn Vulnerability Root Cause or Just Functional Pattern?

LLMs demonstrate promising performance in software vulnerability detection after fine-tuning. However, it remains unclear whether these gains reflect a genuine understanding of vulnerability root causes or merely an exploitation of functional patterns. In this paper, we identify a critical failur...

AI score: 5.6
Exploits: 0

Packet Storm News
added 2025/12/02 12:00 a.m. | 2 views

Learning the Wrong Lessons: Syntactic-Domain Spurious Correlations in Language Models

Whitepaper from researchers at MIT, Northeastern University, and Meta. For an LLM to correctly respond to an instruction it must understand both the semantics and the domain (i.e., subject area) of a given task-instruction pair. However, syntax can also convey implicit information. Recent work shows...

AI score: 6.8
Exploits: 0

Packet Storm News
added 2025/11/14 12:00 a.m. | 3 views

SoK: Security Evaluation of Wi-Fi CSI Biometrics: Attacks, Metrics, and Systemic Weaknesses

Wi-Fi Channel State Information (CSI) has been repeatedly proposed as a biometric modality, often with reports of high accuracy and operational feasibility. However, the field lacks a consolidated understanding of its security properties, adversarial resilience, and methodological consistency. This...

AI score: 7
Exploits: 0

Packet Storm News
added 2025/09/20 12:00 a.m. | 2 views

Evaluating LLM Generated Detection Rules in Cybersecurity

LLMs are increasingly pervasive in the security environment, with limited measures of their effectiveness, which limits trust and usefulness to security practitioners. Here, we present an open-source evaluation framework and benchmark metrics for evaluating LLM-generated cybersecurity rules. The...

AI score: 6.8
Exploits: 0

Packet Storm News
added 2025/09/19 12:00 a.m. | 3 views

MalEval Android Malware Evaluation Framework

This repository contains the source code of MalEval, an evaluation framework for Android malware behavior auditing, focusing on explaining and substantiating malicious behaviors. The framework provides expert-verified reports, curated metadata, and model outputs to enable reproducible evaluation ...

AI score: 7
Exploits: 0

Packet Storm News
added 2025/09/17 12:00 a.m. | 2 views

A Survey and Evaluation Framework for Secure DNS Resolution

Since security was not among the original design goals of the Domain Name System (herein called Vanilla DNS), many secure DNS schemes have been proposed to enhance the security and privacy of the DNS resolution process. Some proposed schemes aim to replace the existing DNS infrastructure entirely,...

AI score: 6.6
Exploits: 0

Packet Storm News
added 2025/07/17 12:00 a.m. | 1 view

MAD-Spear: a Conformity-Driven Prompt Injection Attack on Multi-Agent Debate Systems

Multi-agent debate (MAD) systems leverage collaborative interactions among large language model (LLM) agents to improve reasoning capabilities. While recent studies have focused on increasing the accuracy and scalability of MAD systems, their security vulnerabilities have received limited attention...

AI score: 7.3
Exploits: 0

Packet Storm News
added 2025/07/14 12:00 a.m. | 2 views

Vulnerability Mitigation System (VMS): LLM Agent and Evaluation Framework for Autonomous Penetration Testing

As the frequency of cyber threats increases, conventional penetration testing is failing to capture the entirety of today's complex environments. To solve this problem, we propose the Vulnerability Mitigation System (VMS), a novel agent based on a Large Language Model (LLM) capable of performing...

AI score: 7
Exploits: 0

Packet Storm News
added 2025/07/05 12:00 a.m. | 2 views

Rethinking and Exploring String-Based Malware Family Classification in the Era of LLMs and RAG

Malware Family Classification (MFC) aims to identify the fine-grained family (e.g., GuLoader or BitRAT) to which a potential malware sample belongs, in contrast to malware detection or sample classification that predicts only a Yes/No. Accurate family identification can greatly facilitate automated...

AI score: 6.8
Exploits: 0

Packet Storm News
added 2025/06/23 12:00 a.m. | 2 views

DUMB and DUMBer: Is Adversarial Training Worth It in the Real World?

Adversarial examples are small and often imperceptible perturbations crafted to fool machine learning models. These attacks seriously threaten the reliability of deep neural networks, especially in security-sensitive domains. Evasion attacks, a form of adversarial attack where input is modified a...

AI score: 6.9
Exploits: 0