434 matches found
Turn specs into evals for any agent with ASSERT
Today, we’re releasing Adaptive Spec-driven Scoring for Evaluation and Regression Testing ASSERT, an open-source framework for turning natural-language behavior specifications into executable evaluations. Every team building an AI system starts with a clear intention for the behaviors they want t...
Smart_Contract_Researcher_POC
Smart Contract Security Research Portfolio hailthelord...
CRESS: Quantifying Vulnerabilities of Attack Scenarios in Hardware Reverse Engineering
The safety, security, and reliability of microelectronic systems depend on a trustworthy, secured supply chain and design flow. Globally distributed supply chains or unintentional design weaknesses leave the door open for attacks on the hardware level. These scenarios encompass counterfeiting,...
A New Framework for Cybersecurity Refusals in AI Agents
Agentic scaffolds have dramatically improved LLM performance on complex, long-horizon tasks, yielding both broad benefits and amplified risks in domains like cybersecurity. Existing benchmarks for AI agents in cybersecurity focus mainly on measuring proficiency--how effectively agents can complet...
NICE: A Framework for Declarative and Machine-Checkable Vulnerability Reproduction
Reproducing software vulnerabilities is fundamental to security researchers, open-source maintainers, and educators. Yet, vulnerabilities remain hard to reproduce today, and even when they can be reproduced, recreating a software environment where the vulnerability can be exploited becomes harder...
SAMD: A Tool for Identifying False Data Injection Scenarios in AI/ML-Enabled Medical Devices
The growing integration of artificial intelligence AI and machine learning ML in medical systems requires effective measures to address emerging security risks. One such risk is that of adversaries introducing false data through vulnerable system components during inference, causing misdiagnosis...
CyberMaskQA: A Privacy-Aware Benchmark for Evaluating Large Language Models in Cybersecurity Question Answering
Large language models LLMs are increasingly applied to cybersecurity question answering QA for critical tasks such as incident response and vulnerability analysis. However, real-world operational contexts, including system logs and network configurations, inherently contain sensitive identifiers,...
Astra Linux - уязвимость в linux-5.10, linux, linux-5.15
In the Linux kernel, the following vulnerability has been resolved: can: j1939: fix errant WARNONONCE in j1939SESSIONdeactivate The statement “j1939SESSIONdeactivate should be called with a session ref-count of at least 2” is incorrect. In some concurrent scenarios, j1939SESSIONdeactivate can be...
CVE-2026-21789 HCL Connections is vulnerable to broken access control
HCL Connections contains a broken access control vulnerability that may allow unauthorized user to update data in certain scenarios...
EUVD-2026-30798
HCL Connections contains a broken access control vulnerability that may allow unauthorized user to update data in certain scenarios...
Exploiting LLM Agent Supply Chains Via Payload-Less Skills
Autonomous agents powered by Large Language Models LLMs acquire external functionalities through third-party skills available in open marketplaces. Adopting these integrations broadens the potential attack surface, prompting a need for systematic security evaluation. Current auditing mechanisms a...
OpenSSH: OpenSSH: Security bypass via mishandling of authorized_keys principals option
A flaw was found in OpenSSH. This vulnerability arises from the incorrect handling of the authorizedkeys principals option in uncommon scenarios. Specifically, when a principals list is used with a Certificate Authority that includes comma characters, OpenSSH may misinterpret the input. This coul...
Guaranteed Jailbreaking Defense Via Disrupt-And-Rectify Smoothing
This paper proposes a guaranteed defense method for large language models LLMs to safeguard against jailbreaking attacks. Drawing inspiration from the denoised-smoothing approach in the adversarial defense domain, we propose a novel smoothing-based defense method, termed Disrupt-and-Rectify...
Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning
We define Oracle Poisoning, an attack class in which an adversary corrupts a structured knowledge graph that AI agents query at runtime via tool-use protocols, causing incorrect conclusions through correct reasoning. Unlike prompt injection, Oracle Poisoning manipulates the data agents reason ove...
Profiling for Pennies: Unveiling the Privacy Iceberg of LLM Agents
Large Language Models LLMs have revolutionized how information are collected, aggregated, and reasoned. However, this enables a novel and accessible vector of privacy intrusion: the automated and in-depth personal profiling; this engenders a chilling effect of "peepers everywhere". Existing...
PT-2026-36667
CVE-2026-30412 SentinelCloud, AI-Driven Autonomous DevOps Engineer One closed loop. Five agents. Seven scenarios. Zero hallucinated kubectl. Live demo https://t.co/ocEWNzLf9Z...
autopoc
AutoPoC Automated proof-of-concept deployments on OpenShift...
CVE-2026-5832 atototo api-lab-mcp HTTP http-server.ts test_http_endpoint server-side request forgery
A weakness has been identified in atototo api-lab-mcp up to 0.2.1. This affects the function analyzeapispec/generatetestscenarios/testhttpendpoint of the file src/mcp/http-server.ts of the component HTTP Interface. This manipulation of the argument source/url causes server-side request forgery. T...
Towards Resilient Intrusion Detection in CubeSats: Challenges, TinyML Solutions, and Future Directions
CubeSats have revolutionized access to space by providing affordable and accessible platforms for research and education. However, their reliance on Commercial Off-The-Shelf COTS components and open-source software has introduced significant cybersecurity vulnerabilities. Ensuring the cybersecuri...
Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw
OpenClaw, the most widely deployed personal AI agent in early 2026, operates with full local system access and integrates with sensitive services such as Gmail, Stripe, and the filesystem. While these broad privileges enable high levels of automation and powerful personalization, they also expose...