Sharing transformers_distillation, a library for efficient model distillation built on Hugging Face Transformers, with a ready-to-use DistillationTrainer and examples for causal-LM, MLM, and seq2seq models.

Transformers Distillation Library

transformers_distillation is a Python library built on top of Hugging Face Transformers that enables efficient model distillation. It provides tools to train smaller, faster student models while retaining much of the performance of large teacher models. The library works with causal language models, masked language models, and sequence-to-sequence models.

Key Features

  • Easy Teacher-Student Setup: Quickly load teacher and student models using load_teacher and load_student.

  • Distillation Training: Train your student model with DistillationTrainer, a drop-in replacement for the Hugging Face Trainer.

  • Task Detection: Automatically detects the task type (causal, masked, or seq2seq) via detect_task_type (a usage sketch follows this list).

  • Flexible Configuration: Supports standard training arguments and integrates seamlessly with the Transformers ecosystem.

  • Example Notebooks: Ready-to-use examples for CausalLM, MLM, and Seq2Seq distillation.
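
Task detection is useful when one distillation script has to cover all three model families. Below is a minimal sketch; note that the import path and exact signature of detect_task_type are assumptions on my part (the feature is advertised above, but check the repository for the real interface):

from transformers import AutoModelForCausalLM
from transformers_distillation import detect_task_type  # import path assumed

# Assumed usage: pass a loaded model and get back a task label
model = AutoModelForCausalLM.from_pretrained("gpt2")
task = detect_task_type(model)  # expected to return something like "causal"
print(task)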

Installation

pip install --no-deps git+https://github.com/Dhiraj309/transformers_distillation.git

Because --no-deps skips dependency resolution, make sure transformers (and datasets, if you follow the example below) are already installed in your environment.

Basic Usage

from transformers_distillation.models import load_teacher, load_student
from transformers_distillation import DistillationTrainer
from transformers import AutoTokenizer
from datasets import load_dataset

# Load teacher and student models
teacher = load_teacher("bert-base-uncased")
student = load_student("prajjwal1/bert-small")  # any smaller checkpoint can serve as the student

# Prepare the dataset (Hugging Face datasets); as with the standard Trainer,
# the text is tokenized up front
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
dataset = load_dataset("glue", "sst2")
dataset = dataset.map(
    lambda batch: tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=128),
    batched=True,
)

# Initialize the trainer (standard training arguments can be passed, as with the Trainer)
trainer = DistillationTrainer(
    teacher_model=teacher,
    student_model=student,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)

# Train the student model
trainer.train()
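
Once training finishes, the distilled student can be kept like any other checkpoint. Here is a short sketch; it assumes the object returned by load_student is a regular Transformers PreTrainedModel (the ecosystem-compatibility point below suggests this, but the repository examples are the authoritative reference).

# Save the distilled student with the standard Transformers API
# (assumes `student` is a regular PreTrainedModel)
student.save_pretrained("./distilled-student")
tokenizer.save_pretrained("./distilled-student")

# Reload it later like any other checkpoint
from transformers import AutoModel
reloaded = AutoModel.from_pretrained("./distilled-student")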

Why Use It

  • Reduce model size and inference latency.

  • Retain high accuracy through knowledge distillation (the standard soft-target loss is sketched after this list).

  • Fully compatible with the Hugging Face Transformers ecosystem.

  • Ideal for deploying models to resource-constrained environments.
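
For context on the accuracy point above: knowledge distillation trains the student to match the teacher's softened output distribution in addition to the usual hard-label loss (Hinton et al., 2015). The sketch below shows that standard soft-target loss in plain PyTorch; it illustrates the idea, not the library's internal implementation.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL divergence between the softened teacher and student
    # distributions, scaled by T^2 to keep gradient magnitudes comparable
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: ordinary cross-entropy against the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard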
