Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs
Unlearning in large language models (LLMs) is intended to remove the influence of specific data, yet current evaluations rely heavily on token-level metrics such as accuracy and perplexity. We show that these metrics can be misleading: models often appear to forget, but their original behavior can ...