Minimal single-script implementation of knowledge distillation in LLMs: GPT-2 (124M) as the student and GPT-2 Medium (355M) as the teacher, trained with reverse Kullback-Leibler (KL) divergence on a small chunk of OpenWebText.
Code: https://github.com/Jaykef/ai-algorithms/blob/main/llm_knowledge_distillation.ipynb
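Rough sketch of the idea (not the exact notebook code): the student is pushed toward the teacher's next-token distribution by minimizing reverse KL, i.e. KL(student || teacher). Model names come from Hugging Face transformers; the temperature, learning rate, and helper names below are illustrative assumptions.

```python
# Sketch of reverse-KL distillation between GPT-2 (student) and GPT-2 Medium (teacher).
# Hyperparameters and the toy batch are placeholders, not the notebook's settings.
import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
student = GPT2LMHeadModel.from_pretrained("gpt2").to(device)                 # 124M params
teacher = GPT2LMHeadModel.from_pretrained("gpt2-medium").to(device).eval()   # ~355M params

def reverse_kl_loss(student_logits, teacher_logits, temperature=1.0):
    # Reverse KL: KL(q_student || p_teacher) = sum_v q(v) * (log q(v) - log p(v)),
    # averaged over all token positions in the batch.
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    t_logp = F.log_softmax(teacher_logits / temperature, dim=-1)
    q = s_logp.exp()
    return (q * (s_logp - t_logp)).sum(dim=-1).mean() * temperature**2

optimizer = torch.optim.AdamW(student.parameters(), lr=3e-4)

def train_step(input_ids):
    # Teacher logits are computed without gradients; only the student is updated.
    with torch.no_grad():
        teacher_logits = teacher(input_ids).logits
    student_logits = student(input_ids).logits
    loss = reverse_kl_loss(student_logits, teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: OpenWebText chunks would be tokenized and batched the same way.
batch = tokenizer(
    ["Knowledge distillation transfers the teacher's distribution to the student."],
    return_tensors="pt",
).input_ids.to(device)
print(train_step(batch))
```

Unlike forward KL, reverse KL is mode-seeking: the student concentrates on the teacher's high-probability tokens rather than spreading mass over the whole distribution.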