Sharing transformers_distillation, a library for efficient model distillation built on Hugging Face Transformers, with a ready-to-use DistillationTrainer and examples for causal-LM, MLM, and seq2seq models.

Transformers Distillation Library

transformers_distillation is a Python library built on top of Hugging Face Transformers that enables efficient model distillation. It provides tools to train smaller, faster student models while retaining much of the performance of large teacher models. The library works with causal language models, masked language models, and sequence-to-sequence models.

Key Features

  • Easy Teacher-Student Setup: Quickly load teacher and student models using load_teacher and load_student.

  • Distillation Training: Train your student model with DistillationTrainer, a drop-in replacement for the Hugging Face Trainer.

  • Task Detection: Automatically detects the task type (causal, masked, or seq2seq) via detect_task_type (a usage sketch follows this list).

  • Flexible Configuration: Supports standard training arguments and integrates seamlessly with the Transformers ecosystem.

  • Example Notebooks: Ready-to-use examples for CausalLM, MLM, and Seq2Seq distillation.
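
Task detection is useful when one distillation script has to cover all three model families. Below is a minimal sketch; note that the import path and exact signature of detect_task_type are assumptions on my part (the feature is advertised above, but check the repository for the real interface):

from transformers import AutoModelForCausalLM
from transformers_distillation import detect_task_type  # import path assumed

# Assumed usage: pass a loaded model and get back a task label
model = AutoModelForCausalLM.from_pretrained("gpt2")
task = detect_task_type(model)  # expected to return something like "causal"
print(task)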

Installation

pip install --no-deps git+https://github.com/Dhiraj309/transformers_distillation.git

Because --no-deps skips dependency resolution, make sure transformers (and datasets, if you follow the example below) are already installed in your environment.

Basic Usage

from transformers_distillation.models import load_teacher, load_student
from transformers_distillation import DistillationTrainer
from transformers import AutoTokenizer
from datasets import load_dataset

# Load teacher and student models
teacher = load_teacher("bert-base-uncased")
student = load_student("prajjwal1/bert-small")  # any smaller checkpoint can serve as the student

# Prepare the dataset (Hugging Face datasets); as with the standard Trainer,
# the text is tokenized up front
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
dataset = load_dataset("glue", "sst2")
dataset = dataset.map(
    lambda batch: tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=128),
    batched=True,
)

# Initialize the trainer (standard training arguments can be passed, as with the Trainer)
trainer = DistillationTrainer(
    teacher_model=teacher,
    student_model=student,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)

# Train the student model
trainer.train()
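
Once training finishes, the distilled student can be kept like any other checkpoint. Here is a short sketch; it assumes the object returned by load_student is a regular Transformers PreTrainedModel (the ecosystem-compatibility point below suggests this, but the repository examples are the authoritative reference).

# Save the distilled student with the standard Transformers API
# (assumes `student` is a regular PreTrainedModel)
student.save_pretrained("./distilled-student")
tokenizer.save_pretrained("./distilled-student")

# Reload it later like any other checkpoint
from transformers import AutoModel
reloaded = AutoModel.from_pretrained("./distilled-student")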

Why Use It

  • Reduce model size and inference latency.

  • Retain high accuracy through knowledge distillation (the standard soft-target loss is sketched after this list).

  • Fully compatible with the Hugging Face Transformers ecosystem.

  • Ideal for deploying models to resource-constrained environments.
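
For context on the accuracy point above: knowledge distillation trains the student to match the teacher's softened output distribution in addition to the usual hard-label loss (Hinton et al., 2015). The sketch below shows that standard soft-target loss in plain PyTorch; it illustrates the idea, not the library's internal implementation.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL divergence between the softened teacher and student
    # distributions, scaled by T^2 to keep gradient magnitudes comparable
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: ordinary cross-entropy against the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard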
