Today I am pleased to open-source the code for implementing the recipes from “Knowledge distillation: A good teacher is patient and consistent” (function matching) and reproducing their results on three benchmark datasets: Pet37, Flowers102, and Food101.
Importance: Knowledge distillation matters because of its practical usefulness. With the recipes from “function matching”, we can now perform knowledge distillation in a principled way and obtain student models that actually match the performance of their teacher models. This lets us compress bigger models into (much) smaller ones, reducing storage costs and improving inference speed.
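To make the idea concrete, here is a minimal NumPy sketch of the core distillation objective (not the repository's actual code): the student minimizes the KL divergence between the teacher's softmax outputs and its own, computed on the same input view. The function names and the temperature parameter are illustrative assumptions, not taken from the paper's implementation.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax with a max-shift for numerical stability."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student), averaged over the batch.

    Hypothetical sketch: the student is trained to match the teacher's
    output distribution; no ground-truth labels are needed.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))
```

The loss is zero exactly when the student reproduces the teacher's distribution, which is why long training schedules that let the student converge all the way matter in this setup.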
Some features of the repository I wanted to highlight:
- The code is provided as Kaggle Kernel Notebooks to allow the use of free TPU v3-8 hardware. This is important because the training schedules are comparatively long.
- There’s a notebook on distributed hyperparameter tuning, a step that is often omitted from public releases of an implementation.
- For reproducibility and convenience, I have provided pre-trained models and TFRecords for all the datasets I used.
Here’s a link to the repository.
I’d like to sincerely thank Lucas Beyer (first author of the paper) for providing crucial feedback on the earlier implementations, the ML-GDE program for GCP support, and TRC for providing TPU access. For any questions, either create an issue directly in the repository or email me.
Thank you for reading!