8 matches found
Towards Cybersecurity SuperIntelligence (CSI): What'S the Best Harness for Cybersecurity?
What is the best harness for cybersecurity AI? Cybersecurity systems are converging on a single execution scaffold per agent, an iterative shell loop driven by a Large Language Model LLM. However, scaffolds are not interchangeable, rarely interoperable, and no single scaffold dominates across all...
How AI Hallucinations Are Creating Real Security Risks
AI hallucinations are introducing serious security risks into critical infrastructure decision-making by exploiting human trust through highly confident yet incorrect outputs. When an AI model lacks certainty, it doesn’t have a mechanism to recognize that. Instead, it generates the most probable...
Evaluating Jailbreaking Vulnerabilities in LLMs Deployed As Assistants for Smart Grid Operations: A Benchmark against NERC Standards
The deployment of Large Language Models LLMs as assistants in electric grid operations promises to streamline compliance and decision-making but exposes new vulnerabilities to prompt-based adversarial attacks. This paper evaluates the risk of jailbreaking LLMs, i.e., circumventing safety alignmen...
Owner-Harm: A Missing Threat Model for AI Agent Safety
Existing AI agent safety benchmarks focus on generic criminal harm cybercrime, harassment, weapon synthesis, leaving a systematic blind spot for a distinct and commercially consequential threat category: agents harming their own deployers. Real-world incidents illustrate the gap: Slack AI...
Beyond Detection: A Comprehensive Benchmark and Study on Representation Learning for Fine-Grained Webshell Family Classification
Malicious WebShells pose a significant and evolving threat by compromising critical digital infrastructures and endangering public services in sectors such as healthcare and finance. While the research community has made significant progress in WebShell detection i.e., distinguishing malicious...
Clustering Malware at Scale: A First Full-Benchmark Study
Recent years have shown that malware attacks still happen with high frequency. Malware experts seek to categorize and classify incoming samples to confirm their trustworthiness or prove their maliciousness. One of the ways in which groups of malware samples can be identified is through malware...
JsDeObsBench: Measuring and Benchmarking LLMs for JavaScript Deobfuscation
Deobfuscating JavaScript JS code poses a significant challenge in web security, particularly as obfuscation techniques are frequently used to conceal malicious activities within scripts. While Large Language Models LLMs have recently shown promise in automating the deobfuscation process,...
An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding
Binary code analysis plays a pivotal role in the field of software security and is widely used in tasks such as software maintenance, malware detection, software vulnerability discovery, patch analysis, etc. However, unlike source code, reverse engineers face significant challenges in understandi...