10 matches found
Learn from Your Mistakes: Tree-Like Self-Play for Secure Code LLMs
While Large Language Models LLMs excel in code generation, they remain prone to replicating subtle yet critical vulnerabilities endemic to their training data. Current alignment techniques, such as Supervised Fine-Tuning SFT and Reinforcement Learning RL, typically apply coarse-grained optimizati...
DeepGuard Secure Code Generation
Large Language Models LLMs for code generation can replicate insecure patterns from their training data. To mitigate this, a common strategy for security hardening is to fine-tune models using supervision derived from the final transformer layer. However, this design may suffer from a final-layer...
SecPI: Secure Code Generation with Reasoning Models Via Security Reasoning Internalization
Reasoning language models RLMs are increasingly used in programming. Yet, even state-of-the-art RLMs frequently introduce critical security vulnerabilities in generated code. Prior training-based approaches for secure code generation face a critical limitation that prevents their direct applicati...
SecCodePRM: A Process Reward Model for Code Security
Large Language Models are rapidly becoming core components of modern software development workflows, yet ensuring code security remains challenging. Existing vulnerability detection pipelines either rely on static analyzers or use LLM/GNN-based detectors trained with coarse program-level...
Can Developers Rely on LLMs for Secure IaC Development?
We investigated the capabilities of GPT-4o and Gemini 2.0 Flash for secure Infrastructure as Code IaC development. For security smell detection, on the Stack Overflow dataset, which primarily contains small, simplified code snippets, the models detected at least 71% of security smells when prompt...
RESCUE: Retrieval Augmented Secure Code Generation
Despite recent advances, Large Language Models LLMs still generate vulnerable code. Retrieval-Augmented Generation RAG has the potential to enhance LLMs for secure code generation by incorporating external security knowledge. However, the conventional RAG design struggles with the noise of raw...
SecureAgentBench: Benchmarking Secure Code Generation under Realistic Vulnerability Scenarios
Large language model LLM powered code agents are rapidly transforming software engineering by automating tasks such as testing, debugging, and repairing, yet the security risks of their generated code have become a critical concern. Existing benchmarks have offered valuable insights but remain...
A Systematic Evaluation of Parameter-Efficient Fine-Tuning Methods for the Security of Code LLMs
Code-generating Large Language Models LLMs significantly accelerate software development. However, their frequent generation of insecure code presents serious risks. We present a comprehensive evaluation of seven parameter-efficient fine-tuning PEFT techniques, demonstrating substantial gains in...
A.S.E: a Repository-Level Benchmark for Evaluating Security in AI-Generated Code
The increasing adoption of large language models LLMs in software engineering necessitates rigorous security evaluation of their generated code. However, existing benchmarks are inadequate, as they focus on isolated code snippets, employ unstable evaluation methods that lack reproducibility, and...
SecRepoBench: Benchmarking LLMs for Secure Code Generation in Real-World Repositories
This paper introduces SecRepoBench, a benchmark to evaluate LLMs on secure code generation in real-world repositories. SecRepoBench has 318 code generation tasks in 27 C/C++ repositories, covering 15 CWEs. We evaluate 19 state-of-the-art LLMs using our benchmark and find that the models struggle...