5 matches found
A Systematic Study of LLM-Based Architectures for Automated Patching
Large language models LLMs have shown promise for automated patching, but their effectiveness depends strongly on how they are integrated into patching systems. While prior work explores prompting strategies and individual agent designs, the field lacks a systematic comparison of patching...
Is Vibe Coding Safe? Benchmarking Vulnerability of Agent-Generated Code in Real-World Tasks
Vibe coding is a new programming paradigm in which human engineers instruct large language model LLM agents to complete complex coding tasks with little supervision. Although it is increasingly adopted, are vibe coding outputs really safe to deploy in production? To answer this question, we propo...
RedCodeAgent: Automatic Red-Teaming Agent against Diverse Code Agents
Code agents have gained widespread adoption due to their strong code generation capabilities and integration with code interpreters, enabling dynamic execution, debugging, and interactive programming capabilities. While these advancements have streamlined complex workflows, they have also...
Breaking the Code: Security Assessment of AI Code Agents through Systematic Jailbreaking Attacks
Code-capable large language model LLM agents are increasingly embedded into software engineering workflows where they can read, write, and execute code, raising the stakes of safety-bypass "jailbreak" attacks beyond text-only settings. Prior evaluations emphasize refusal or harmful-text detection...
SecureAgentBench: Benchmarking Secure Code Generation under Realistic Vulnerability Scenarios
Large language model LLM powered code agents are rapidly transforming software engineering by automating tasks such as testing, debugging, and repairing, yet the security risks of their generated code have become a critical concern. Existing benchmarks have offered valuable insights but remain...