659 matches found
ContraFix: Agentic Vulnerability Repair Via Differential Runtime Evidence and Skill Reuse
Large language model LLM agents are increasingly used for automated vulnerability repair AVR, where repository-level reasoning enables them to inspect context and produce source-code patches. However, recent empirical results show that these agents still struggle with real-world vulnerabilities...
ADR: An Agentic Detection System for Enterprise Agentic AI Security
We present the Agentic AI Detection and Response ADR system, the first large-scale, production-proven enterprise framework for securing AI agents operating through the Model Context Protocol MCP. We identify three persistent challenges in this domain: 1 limited observability -- existing Endpoint...
How AI Hallucinations Are Creating Real Security Risks
AI hallucinations are introducing serious security risks into critical infrastructure decision-making by exploiting human trust through highly confident yet incorrect outputs. When an AI model lacks certainty, it doesn’t have a mechanism to recognize that. Instead, it generates the most probable...
Do Coding Agents Understand Least-Privilege Authorization?
As coding agents gain access to shells, repositories, and user files, least-privilege authorization becomes a prerequisite for safe deployment: an agent should receive enough authority to complete the task, without unnecessary authority that exposes sensitive surfaces.To study whether current...
DCVD: Dual-Channel Cross-Modal Fusion for Joint Vulnerability Detection and Localization
Software vulnerability detection plays a critical role in ensuring system security, where real-world auditing requires not only determining whether a function is vulnerable but also pinpointing the specific lines responsible. However, existing approaches either rely on a single information source...
Defense at AI speed: Microsoft’s new multi-model agentic security system tops leading industry benchmark
In this article 1. AI-powered vulnerability discovery at hyper-scale 2. Codename: MDASH—Microsoft Security’s new multi-model agentic scanning harness 3. Using codename MDASH for security research 4. The 5.12.2026 Patch Tuesday cohort 5. Two deep dives 1. CVE-2026-33827—Remote unauthenticated UAF ...
b2aiprep (>=0.19.0 <=3.3.2), capstone-text-mining (>=0.0.6 <=0.1.2) +10 more potentially affected by CVE-2026-31223 via snorkel (>=0.10.0 <=0.9.9)
snorkel PYPI version =0.10.0, =0.19.0, =0.0.6, =1.0.2, =0.8.0, =0.1.1, =0.1.2, =0.1.0, =0.6.1, =0.0.0, =1.3.1a1 - t2r2 =0.0.1 - ws-benchmark =1.1.2rc0 Source cves: CVE-2026-31223 Source advisory: SNYK:PYTHON-SNORKEL-16758051...
b2aiprep (>=0.19.0 <=3.3.2), capstone-text-mining (>=0.0.6 <=0.1.2) +10 more potentially affected by CVE-2026-31222 via snorkel (>=0.10.0 <=0.9.9)
snorkel PYPI version =0.10.0, =0.19.0, =0.0.6, =1.0.2, =0.8.0, =0.1.1, =0.1.2, =0.1.0, =0.6.1, =0.0.0, =1.3.1a1 - t2r2 =0.0.1 - ws-benchmark =1.1.2rc0 Source cves: CVE-2026-31222 Source advisory: SNYK:PYTHON-SNORKEL-16758049...
b2aiprep (>=0.19.0 <=3.3.2), capstone-text-mining (>=0.0.6 <=0.1.2) +10 more potentially affected by CVE-2026-31224 via snorkel (>=0.10.0 <=0.9.9)
snorkel PYPI version =0.10.0, =0.19.0, =0.0.6, =1.0.2, =0.8.0, =0.1.1, =0.1.2, =0.1.0, =0.6.1, =0.0.0, =1.3.1a1 - t2r2 =0.0.1 - ws-benchmark =1.1.2rc0 Source cves: CVE-2026-31224 Source advisory: SNYK:PYTHON-SNORKEL-16758048...
Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack
Agent benchmarks have become the de facto measure of frontier AI competence, guiding model selection, investment, and deployment. However, reward hacking, where agents maximize a score without performing the intended task, emerges spontaneously in frontier models without overfitting. We argue tha...
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces
Reusable skills are becoming a common interface for extending large language model agents, packaging procedural guidance with access to files, tools, memory, and execution environments. However, this modularity introduces attack surfaces that are largely missed by existing safety evaluations: eve...
LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments
The rapid proliferation of LLM-based autonomous agents in real operating system environments introduces a new category of safety risk beyond content safety: behavior jailbreak, where an adversary induces an agent to execute dangerous OS-level operations with irreversible consequences. Existing...
MonitoringBench: Semi-Automated Red-Teaming for Agent Monitoring
We introduce a red-teaming methodology that exposes harder-to-catch attacks for coding-agent monitors, suggesting that current practices may under-elicit attacks and overstate monitor performance. We identify three challenges with current red-teaming. First, mode collapse in attack generation,...
ai.timefold.solver:timefold-solver-quarkus-benchmark-deployment (>=0.8.38 <=0.9.38), ai.timefold.solver:timefold-solver-quarkus-benchmark-integration-test (>=0.8.38 <=0.9.38) +3086 more potentially affected by CVE-2026-6860 via io.vertx:vertx-core (>=4.3.4 <=4.3.8)
io.vertx:vertx-core MAVEN version =4.3.4, =0.8.38, =0.8.38, =0.8.38, =0.8.38, =0.8.38, =0.8.38, =0.8.38, =0.8.38, =0.8.38, =0.8.38, =0.8.38, =22.9.0, =22.9.0, =22.9.0, =22.9.0, =22.9.5 and more Source cves: CVE-2026-6860 Source advisory: OSV:GHSA-3G76-F9XQ-8VP6https://vulners.com/osv/OSV...
Benchmarking Large Language Models for IoC Recovery under Adversarial Code Obfuscation and Encryption
Software obfuscation and encryption present persistent challenges for program comprehension and security analysis, particularly when adversaries conceal Indicators of Compromise IoCs such as IP addresses within source code. While Large Language Models LLMs have recently demonstrated remarkable...
LCC-LLM: Leveraging Code-Centric Large Language Models for Malware Attribution
LLMs are increasingly explored for malware analysis; however, current LLM-based malware attribution remains limited by unsupported indicators and insufficient code-level grounding for identifying malicious and vulnerable code segments. To address these limitations, this research introduces LCC-LL...
Pen-Strategist: A Reasoning Framework for Penetration Testing Strategy Formation and Analysis
Cyber threats are rapidly increasing, expanding their impact from large-scale enterprises to government services and individual users, making robust security systems increasingly essential. However, a significant shortage of skilled cybersecurity professionals exacerbates this challenge. While...
ARGUS: Defending LLM Agents against Context-Aware Prompt Injection
The rise of Large Language Model LLM agents, augmented with tool use, skills, and external knowledge, has introduced new security risks. Among them, prompt injection attacks, where adversaries embed malicious instructions into the agent workflow, have emerged as the primary threat. However,...
ai.timefold.solver:timefold-solver-quarkus-benchmark-integration-test (>=0.9.38 <=1.20.1), ai.timefold.solver:timefold-solver-quarkus-devui-integration-test (>=0.9.38 <=1.20.1) +1589 more potentially affected by CVE-2026-39852 via io.quarkus:quarkus-vertx-http (>=3.0.0.Alpha1 <=3.20.6)
io.quarkus:quarkus-vertx-http MAVEN version =3.0.0.Alpha1, =0.9.38, =0.9.38, =0.9.38, =0.9.38, =0.9.38, =0.9.38, =0.0.1, =0.0.1, =0.0.1, =0.0.4, =0.0.4, =0.0.4, =0.0.4, =0.0.2, =0.0.1, =0.0.5 and more Source cves: CVE-2026-39852 Source advisory: SNYK:JAVA-IOQUARKUS-16420254...
VulKey: Automated Vulnerability Repair Guided by Domain-Specific Repair Patterns
The increasing prevalence of software vulnerabilities highlights the need for effective Automatic Vulnerability Repair AVR tools. While LLM-based approaches are promising, they struggle to incorporate structured security knowledge from sources like CWE and NVD. Current methods either use this...