Dissecting the Black Box: Circuit-Level Analysis of LLM Vulnerability Detection
Large language models LLMs can detect software vulnerabilities, but how do they actually identify vulnerable code? We address this question using mechanistic interpretability; analyzing the internal computations of a neural network to understand its reasoning process.Using Circuit Tracer on...