7 matches found
A New Framework for Cybersecurity Refusals in AI Agents
Agentic scaffolds have dramatically improved LLM performance on complex, long-horizon tasks, yielding both broad benefits and amplified risks in domains like cybersecurity. Existing benchmarks for AI agents in cybersecurity focus mainly on measuring proficiency--how effectively agents can complet...
OpenAI’s GPT-5.5 is as Good as Mythos at Finding Security Vulnerabilities
The UK's AI Security Institute evaluated GPT-5.5's ability to find security vulnerabilities, and found that it is comparable to Claude Mythos. Note that the OpenAI model is generally available. Here is the Institute's evaluation of Mythos. And here is an analysis of a smaller, cheaper model. It...
Jailbreaking Frontier Foundation Models through Intention Deception
Large vision-language models exhibit remarkable capability but remain highly susceptible to jailbreaking. Existing safety training approaches aim to have the model learn a refusal boundary between safe and unsafe, based on the user's intent. It has been found that this binary training regime ofte...
OpenAI Launches GPT-5.4-Cyber to Boost Defensive Cybersecurity
OpenAI unveils GPT-5.4-Cyber, a cybersecurity-focused model built to help defenders analyze malware and fix software bugs. The company is also expanding its Trusted Access for Cyber TAC program to thousands of verified experts...
AI Arms and Influence: Frontier Models Exhibit Sophisticated Reasoning in Simulated Nuclear Crises
Today's leading AI models engage in sophisticated behaviour when placed in strategic competition. They spontaneously attempt deception, signaling intentions they do not intend to follow; they demonstrate rich theory of mind, reasoning about adversary beliefs and anticipating their actions; and th...
GPT-5 at CTFs: Case Studies from Top-Tier Cybersecurity Events
OpenAI and DeepMind's AIs recently got gold at the IMO math olympiad and ICPC programming competition. We show frontier AI is similarly good at hacking by letting GPT-5 compete in elite CTF cybersecurity competitions. In one of this year's hardest events, it outperformed 93% of humans finishing...
AI Pulse: OpenAI’s Wild Bot Behavior After GPT-5
The AI Pulse series breaks down traffic trends and what they mean for apps, APIs, and businesses. In this post, read how OpenAI’s bots are changing after GPT-5...