82 matches found
R+R: Reassessing Java Security API Misuse in Current LLMs: A Replication on JCA and JSSE APIs with External Security Knowledge
The misuse of Java security APIs is a serious security problem in software development. Research in 2024 has shown that this problem is widespread in LLM-generated code. However, it remains unclear whether this phenomenon persists in current models and how external security knowledge affects it...
An Empirical Evaluation of LLM-Generated Code Security across Prompting Methods
The growing use of Large Language Models LLMs for automated code generation has enhanced software development efficiency, but often at the cost of security. Generated code frequently overlooks critical concerns, leaving it vulnerable to issues such as weak encryption and improper input validation...
Threat Modelling Using Domain-Adapted Language Models: Empirical Evaluation and Insights
Large Language ModelsLLMs are increasingly explored for cybersecurity applications such as vulnerability detection. In the domain of threat modelling, prior work has primarily evaluated a number of general-purpose Large Language Models under limited prompting settings. In this study, we extend th...
On Fixing Insecure AI-Generated Code through Model Fine-Tuning and Prompting Strategies
The security of AI-generated code remains a major obstacle to its widespread adoption. Although code generation models achieve strong performance on functional benchmarks, their outputs frequently contain bugs and security weaknesses that undermine their trustworthiness. Prior work has explored a...
Transient Turn Injection: Exposing Stateless Multi-Turn Vulnerabilities in Large Language Models
Large language models LLMs are increasingly integrated into sensitive workflows, raising the stakes for adversarial robustness and safety. This paper introduces Transient Turn InjectionTTI, a new multi-turn attack technique that systematically exploits stateless moderation by distributing...
EUVD-2026-21456
OpenClaw before 2026.3.22 contains an identity spoofing vulnerability in ACP permission resolution that trusts conflicting tool identity hints from rawInput and metadata. Attackers can spoof tool identities through rawInput parameters to suppress dangerous-tool prompting and bypass security...
CVE-2026-35655 OpenClaw < 2026.3.22 - Identity Spoofing via rawInput Tool in ACP Permission Resolution
OpenClaw before 2026.3.22 contains an identity spoofing vulnerability in ACP permission resolution that trusts conflicting tool identity hints from rawInput and metadata. Attackers can spoof tool identities through rawInput parameters to suppress dangerous-tool prompting and bypass security...
CVE-2026-35655
OpenClaw before 2026.3.22 contains an identity spoofing vulnerability in ACP permission resolution that trusts conflicting tool identity hints from rawInput and metadata. Attackers can spoof tool identities through rawInput parameters to suppress dangerous-tool prompting and bypass security...
PT-2026-31562
Open AI models detect vulnerabilities by analyzing code, configs, and system behavior step-by-step—like spotting the FreeBSD zero-day CVE-2024-47467 in the Mythos showcase. Even a 3B open model nailed it, matching bigger ones. The chart shows rankings flip across tasks: OWASP easy stuff vs. gnarl...
SkillAttack: Automated Red Teaming of Agent Skills through Attack Path Refinement
LLM-based agent systems increasingly rely on agent skills sourced from open registries to extend their capabilities, yet the openness of such ecosystems makes skills difficult to thoroughly vet. Existing attacks rely on injecting malicious instructions into skills, making them easily detectable b...
GHSA-74WF-H43J-VVMJ OpenClaw's Conflicting Tool Identity Hints Bypass Dangerous-Tool Prompting
Summary ACP permission resolution trusted conflicting tool identity hints from rawInput and metadata, which could suppress dangerous-tool prompting. Affected Packages / Versions - Package: openclaw npm - Affected: = 2026.3.22 - Latest released tag checked: v2026.3.23-2...
Towards Leveraging LLMs to Generate Abstract Penetration Test Cases from Software Architecture
Software architecture models capture early design decisions that strongly influence system quality attributes, including security. However, architecture-level security assessment and feedback are often absent in practice, allowing security weaknesses to propagate into later phases of the software...
AI in Cybersecurity Education -- Scalable Agentic CTF Design Principles and Educational Outcomes
Large language models are rapidly changing how learners acquire and demonstrate cybersecurity skills. However, when human--AI collaboration is allowed, educators still lack validated competition designs and evaluation practices that remain fair and evidence-based. This paper presents a...
ChainFuzzer: Greybox Fuzzing for Workflow-Level Multi-Tool Vulnerabilities in LLM Agents
Tool-augmented LLM agents increasingly rely on multi-step, multi-tool workflows to complete real tasks. This design expands the attack surface, because data produced by one tool can be persisted and later reused as input to another tool, enabling exploitable source-to-sink dataflows that only...
Systematic Scaling Analysis of Jailbreak Attacks in Large Language Models
Large language models remain vulnerable to jailbreak attacks, yet we still lack a systematic understanding of how jailbreak success scales with attacker effort across methods, model families, and harm types. We initiate a scaling-law framework for jailbreaks by treating each attack as a...
On Moltbook
The MIT Technology Review has a good article on Moltbook, the supposed AI-only social network: Many people have pointed out that a lot of the viral comments were in fact posted by people posing as bots. But even the bot-written posts are ultimately the result of people pulling the strings, more...
Improper Neutralization of Input Used for LLM Prompting
Overview @n8n/n8n-nodes-langchain is a Affected versions of this package are vulnerable to Improper Neutralization of Input Used for LLM Prompting via the Guardrail node. An attacker can modify workflow input to circumvent intended restrictions by crafting specific input values. Workaround This...
Understanding Human-AI Collaboration in Cybersecurity Competitions
Capture-the-Flag CTF competitions are increasingly becoming a testbed for evaluating AI capabilities at solving security tasks, due to the controlled environments and objective success criteria. Existing evaluations have focused on how successful AI is at solving CTF challenges in isolation from...
Mind the Gap: Evaluating LLMs for High-Level Malicious Package Detection Vs. Fine-Grained Indicator Identification
The prevalence of malicious packages in open-source repositories, such as PyPI, poses a critical threat to the software supply chain. While Large Language Models LLMs have emerged as a promising tool for automated security tasks, their effectiveness in detecting malicious packages and indicators...
Benchmarking Large Language Models for Zero-Shot and Few-Shot Phishing URL Detection
The Uniform Resource Locator URL, introduced in a connectivity-first era to define access and locate resources, remains historically limited, lacking future-proof mechanisms for security, trust, or resilience against fraud and abuse, despite the introduction of reactive protections like HTTPS...