Who’s Harry Potter? Approximate unlearning in LLMs
With about one GPU-hour of fine-tuning, Eldan and Russinovich (2023) made Llama-2-7B largely forget the Harry Potter books while leaving its scores on common benchmarks almost unaffected. An interactive walkthrough of the mechanism: edge surgery on the model’s knowledge graph. Runs…