2 matches found
Learn from Your Mistakes: Tree-Like Self-Play for Secure Code LLMs
While Large Language Models LLMs excel in code generation, they remain prone to replicating subtle yet critical vulnerabilities endemic to their training data. Current alignment techniques, such as Supervised Fine-Tuning SFT and Reinforcement Learning RL, typically apply coarse-grained optimizati...
VULPO: Context-Aware Vulnerability Detection Via On-Policy LLM Optimization
The widespread reliance on open-source software dramatically increases the risk of vulnerability exploitation, underscoring the need for effective and scalable vulnerability detection VD. Existing VD techniques, whether traditional machine learning-based or LLM-based approaches like prompt...