5 matches found
PT-2026-41417
Claude Mythos Preview case studies also, read your transcripts! https://t.co/drNlAH5mLE "Mythos demonstrates its bug reproduction and exploitation capabilities on CVE-2024-051912, an in-the-wild exploited bug that has no public report nor a working PoC whatsoever in the public domain. This bug ha...
Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack
Agent benchmarks have become the de facto measure of frontier AI competence, guiding model selection, investment, and deployment. However, reward hacking, where agents maximize a score without performing the intended task, emerges spontaneously in frontier models without overfitting. We argue tha...
Multi-Faceted Attack: Exposing Cross-Model Vulnerabilities in Defense-Equipped Vision-Language Models
The growing misuse of Vision-Language Models VLMs has led providers to deploy multiple safeguards, including alignment tuning, system prompts, and content moderation. However, the real-world robustness of these defenses against adversarial attacks remains underexplored. We introduce Multi-Faceted...
Subverting AIOps Systems Through Poisoned Input Data
In this input integrity attack against an AI system, researchers were able to fool AIOps tools: AIOps refers to the use of LLM-based agents to gather and analyze application telemetry, including system logs, performance metrics, traces, and alerts, to detect problems and then suggest or carry out...
When AIs Start Hacking
If you dont have enough to worry about already, consider a world where AIs are hackers. Hacking is as old as humanity. We are creative problem solvers. We exploit loopholes, manipulate systems, and strive for more influence, power, and wealth. To date, hacking has exclusively been a human activit...