Why LLMs Fail: A Failure Analysis and Partial Success Measurement for Automated Security Patch Generation
Large Language Models (LLMs) show promise for Automated Program Repair (APR), yet their effectiveness on security vulnerabilities remains poorly characterized. This study analyzes 319 LLM-generated security patches across 64 Java vulnerabilities from the Vul4J benchmark. Using tri-axis evaluation...