91 matches found
On the Evaluation of Spiking Neural Network Configurations for Network Intrusion Detection
Network intrusion detection is a core component of modern cybersecurity infrastructure, yet the deep learning models that dominate the field are computationally demanding, motivating interest in lightweight alternatives suited to edge and neuromorphic deployment. Spiking Neural Networks SNNs are...
How to Compare the Security of Code Written by Humans to LLM-Generated Code
Large language models LLMs are rapidly transforming how software is created and maintained. Comparing LLM-generated code against human-written standards is essential to determine whether these new tools uphold or erode the security baselines established by professional developers. Yet, we lack a...
Unlocking Apple's Private Cloud Compute: An Analysis of Privacy-Preserving Artificial Intelligence
Many existing Artificial Intelligence AI solutions on mobile devices rely on an extensive collection of sensitive data, raising privacy concerns and often requiring storage for both context and model improvement. Apple's Private Cloud Compute PCC aims to address this by emphasizing mobile device...
LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges
The integration of Large Language Models LLMs into Electronic Design Automation EDA and hardware security is rapidly reshaping the semiconductor industry. While LLMs offer unprecedented capabilities in generating Register Transfer Level RTL code, automating testbenches, and bridging the semantic...
MT-JailBench: A Modular Benchmark for Understanding Multi-Turn Jailbreak Attacks
Multi-turn jailbreaks exploit the ability of large language models to accumulate and act on conversational context. Instead of stating a harmful request directly, an attacker can gradually steer the conversation toward an unsafe answer. Recent methods demonstrate this risk, but they are usually...
Ghost in the Agent: Redefining Information Flow Tracking for LLM Agents
Autonomous Large Language Model LLM agents are increasingly deployed to conduct complex tasks by interacting with external tools, APIs, and memory stores. However, processing untrusted external data exposes these agents to severe security threats, such as indirect prompt injection and unauthorize...
Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture the Flag Challenges
Large Language Model LLM agents are increasingly proposed for autonomous cybersecurity tasks, but their capabilities in realistic offensive settings remain poorly understood. We present DeepRed, an open-source benchmark for evaluating LLM-based agents on realistic Capture The Flag CTF challenges ...
HarmChip: Evaluating Hardware Security Centric LLM Safety Via Jailbreak Benchmarking
The integration of large language models LLMs into electronic design automation EDA workflows has introduced powerful capabilities for RTL generation, verification, and design optimization, but also raises critical security concerns. Malicious LLM outputs in this domain pose hardware-level threat...
n-days-poc-benchmark-and-dataset
ICS N-Day Vulnerability PoC Benchmark Suite A structured coll...
From transparency to action: What the latest Microsoft email security benchmark reveals
In our last benchmarking post, Clarity in complexity: New insights for transparent email security ,1 we shared why transparency matters more than ever in email security and how clear, consistent benchmarking helps security teams cut through noise and make confident decisions. Today, we’re...
PQC-LEO: An Evaluation Framework for Post-Quantum Cryptographic Algorithms
Advances in quantum computing threaten digital communication security by undermining the foundations of current public-key cryptography through Shor's quantum algorithm. This has driven the development of Post-Quantum Cryptography PQC, a new set of algorithms resistant to quantum attacks. While...
Quantifying Frontier LLM Capabilities for Container Sandbox Escape
Large language models LLMs increasingly act as autonomous agents, using tools to execute code, read and write files, and access networks, creating novel security risks. To mitigate these risks, agents are commonly deployed and evaluated in isolated "sandbox" environments, often implemented using...
Why Intelligent Contract Solutions Are Replacing Traditional CLM Systems
Intelligent contract solutions replace traditional CLM by adding AI analysis, benchmarking, and risk insights that speed reviews, reduce delays, and improve decisions...
Introducing AI Cyber Model Arena: A Real-World Benchmark for AI Agents in Cybersecurity
Wiz Research’s AI Cyber Model Arena benchmarks offensive AI security on 257 real-world challenges zero-days, CVEs, API/web, and cloud across AWS/Azure/GCP/K8s demonstrating what AI models and agents can really do...
Next-Generation Cyberattack Detection with Large Language Models: Anomaly Analysis across Heterogeneous Logs
This project explores large language models LLMs for anomaly detection across heterogeneous log sources. Traditional intrusion detection systems suffer from high false positive rates, semantic blindness, and data scarcity, as logs are inherently sensitive, making clean datasets rare. We address...
AI-Driven Cybersecurity Threats: A Survey of Emerging Risks and Defensive Strategies
Artificial Intelligence's dual-use nature is revolutionizing the cybersecurity landscape, introducing new threats across four main categories: deepfakes and synthetic media, adversarial AI attacks, automated malware, and AI-powered social engineering. This paper aims to analyze emerging risks,...
AutoDFBench 1.0: A Benchmarking Framework for Digital Forensic Tool Testing and Generated Code Evaluation
The National Institute of Standards and Technology NIST Computer Forensic Tool Testing CFTT programme has become the de facto standard for providing digital forensic tool testing and validation. However to date, no comprehensive framework exists to automate benchmarking across the diverse forensi...
Clarity in complexity: New insights for transparent email security
As email threats grow more sophisticated and layered security architectures become more common, organizations need clear, data-driven insights to evaluate how their security solutions perform together. Benchmarking plays a critical role in helping security leaders understand not just individual...
Clarity in complexity: New insights for transparent email security
As email threats grow more sophisticated and layered security architectures become more common, organizations need clear, data-driven insights to evaluate how their security solutions perform together. Benchmarking plays a critical role in helping security leaders understand not just individual...
Constructing and Benchmarking: A Labeled Email Dataset for Text-Based Phishing and Spam Detection Framework
Phishing and spam emails remain a major cybersecurity threat, with attackers increasingly leveraging Large Language Models LLMs to craft highly deceptive content. This study presents a comprehensive email dataset containing phishing, spam, and legitimate messages, explicitly distinguishing betwee...