17 matches found
Can Open-Source LLM Agents Replace Static Application Security Testing Tools? an Empirical Assessment
This paper explores the value of agentic AI tools for cybersecurity purposes. We evaluate the efficacy of a general-purpose GenAI Large Language Model- GenAI- based agent when powered by three different Ollama-hosted general-purpose open source models. We assess each agent's performance using...
Benchmarking-Agent-Architectures
Benchmarking Agent Architectures for LLM-Based Exploit Gener...
Jailbreaking LLMs and VLMs: Mechanisms, Evaluation, and Unified Defense
This paper provides a systematic survey of jailbreak attacks and defenses on Large Language Models LLMs and Vision-Language Models VLMs, emphasizing that jailbreak vulnerabilities stem from structural factors such as incomplete training data, linguistic ambiguity, and generative uncertainty. It...
AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence
Large Language Models LLMs have demonstrated strong capabilities in natural language reasoning, yet their application to Cyber Threat Intelligence CTI remains limited. CTI analysis involves distilling large volumes of unstructured reports into actionable knowledge, a process where LLMs could...
A Survey of Foundation Models for IoT: Taxonomy and Criteria-Based Analysis
Foundation models have gained growing interest in the IoT domain due to their reduced reliance on labeled data and strong generalizability across tasks, which address key limitations of traditional machine learning approaches. However, most existing foundation model based methods are developed fo...
From LLMs to MLLMs to Agents: a Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem
Large language models LLMs are rapidly evolving from single-modal systems to multimodal LLMs and intelligent agents, significantly expanding their capabilities while introducing increasingly severe security risks. This paper presents a systematic survey of the growing complexity of jailbreak...
SoK: Data Reconstruction Attacks against Machine Learning Models: Definition, Metrics, and Benchmark
Data reconstruction attacks, which aim to recover the training dataset of a target model with limited access, have gained increasing attention in recent years. However, there is currently no consensus on a formal definition of data reconstruction attacks or appropriate evaluation metrics for...
Seven Security Challenges That Must Be Solved in Cross-Domain Multi-Agent LLM Systems
Large language models LLMs are rapidly evolving into autonomous agents that cooperate across organizational boundaries, enabling joint disaster response, supply-chain optimization, and other tasks that demand decentralized expertise without surrendering data ownership. Yet, cross-domain...
Toward a Human-Centered Evaluation Framework for Trustworthy LLM-Powered GUI Agents
The rise of Large Language Models LLMs has revolutionized Graphical User Interface GUI automation through LLM-powered GUI agents, yet their ability to process sensitive data with limited human oversight raises significant privacy and security risks. This position paper identifies three key risks ...
Urania: Differentially Private Insights into AI Use
We introduce $Urania$, a novel framework for generating insights about LLM chatbot interactions with rigorous differential privacy DP guarantees. The framework employs a private clustering mechanism and innovative keyword extraction methods, including frequency-based, TF-IDF-based, and LLM-guided...
A Review of Various Datasets for Machine Learning Algorithm-Based Intrusion Detection System: Advances and Challenges
IDS aims to protect computer networks from security threats by detecting, notifying, and taking appropriate action to prevent illegal access and protect confidential information. As the globe becomes increasingly dependent on technology and automated processes, ensuring secured systems,...
System Prompt Extraction Attacks and Defenses in Large Language Models
The system prompt in Large Language Models LLMs plays a pivotal role in guiding model behavior and response generation. Often containing private configuration details, user roles, and operational instructions, the system prompt has become an emerging attack target. Recent studies have shown that...
A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?
Recently, AI-driven interactions with computing devices have advanced from basic prototype tools to sophisticated, LLM-based systems that emulate human-like operations in graphical user interfaces. We are now witnessing the emergence of \emphComputer-Using Agents CUAs, capable of autonomously...
ACSE-Eval: Can LLMs Threat Model Real-World Cloud Infrastructure?
While Large Language Models have shown promise in cybersecurity applications, their effectiveness in identifying security threats within cloud deployments remains unexplored. This paper introduces AWS Cloud Security Engineering Eval, a novel dataset for evaluating LLMs cloud security threat...
A Survey of Learning-Based Intrusion Detection Systems for In-Vehicle Network
Connected and Autonomous Vehicles CAVs enhance mobility but face cybersecurity threats, particularly through the insecure Controller Area Network CAN bus. Cyberattacks can have devastating consequences in connected vehicles, including the loss of control over critical systems, necessitating robus...
Evaluating Explanation Quality in X-IDS Using Feature Alignment Metrics
Explainable artificial intelligence XAI methods have become increasingly important in the context of explainable intrusion detection systems X-IDSs for improving the interpretability and trustworthiness of X-IDSs. However, existing evaluation approaches for XAI focus on model-specific properties...
Towards a Standardized Methodology and Dataset for Evaluating LLM-Based Digital Forensic Timeline Analysis
Large language models LLMs have seen widespread adoption in many domains including digital forensics. While prior research has largely centered on case studies and examples demonstrating how LLMs can assist forensic investigations, deeper explorations remain limited, i.e., a standardized approach...