
Deep Learning Emotion Classification - Code Explanation

This document provides a detailed line-by-line explanation of the main.ipynb notebook, which implements a multi-label emotion classification system using the DeBERTa transformer model with K-Fold cross-validation.


Section 1: Imports & Setup

Lines 18-36: Import Statements

import numpy as np
import pandas as pd
  • numpy: Used for numerical operations, array manipulation, and random seed setting
  • pandas: Used for data loading and manipulation (CSV files, DataFrames)
import torch
import torch.nn as nn
  • torch: PyTorch deep learning framework for tensor operations and model training
  • torch.nn: Neural network modules including loss functions
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score
  • StratifiedKFold: Creates k-fold splits while maintaining class distribution in each fold
  • f1_score: Calculates F1 metric for evaluation (harmonic mean of precision and recall)
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    get_linear_schedule_with_warmup,
    AutoConfig
)
  • AutoTokenizer: Automatically loads the appropriate tokenizer for the specified model
  • AutoModelForSequenceClassification: Pre-trained transformer model for classification tasks
  • get_linear_schedule_with_warmup: Learning rate scheduler with warmup and linear decay
  • AutoConfig: Model configuration loader
from torch.optim import AdamW
  • AdamW: Adam optimizer with decoupled weight decay (better than standard Adam for transformers)
from torch.cuda.amp import autocast, GradScaler
  • autocast: Enables automatic mixed precision (AMP) to speed up training
  • GradScaler: Scales gradients for mixed precision training to prevent underflow
import gc
import warnings
import os
  • gc: Garbage collection to free up memory
  • warnings: To suppress warning messages
  • os: For file system operations and environment variables
warnings.filterwarnings("ignore")
  • Suppresses all warning messages for cleaner output

Section 2: Configuration

Lines 52-68: Configuration Class

class Config:
    SEED = 42
  • Sets random seed for reproducibility across all random operations
    LABELS = ["anger", "fear", "joy", "sadness", "surprise"]
  • Defines the 5 emotion labels for multi-label classification
    MODEL_NAME = "microsoft/deberta-v3-base"
  • Specifies the pre-trained model (DeBERTa v3 base: 86M backbone parameters plus a ~98M-parameter embedding layer; strong performance on text classification tasks)
    MAX_LEN = 128
  • Maximum sequence length for tokenization (sequences longer than this are truncated)
    BATCH_SIZE = 16
  • Number of samples processed together in one forward/backward pass
    EPOCHS = 4
  • Number of complete passes through the training dataset
    LR = 1.5e-5
  • Learning rate (1.5 × 10⁻⁵) - small value typical for fine-tuning transformers
    WEIGHT_DECAY = 0.01
  • L2 regularization strength to prevent overfitting
    WARMUP_RATIO = 0.1
  • Fraction of training steps used for learning rate warmup (10% of total steps)
    N_FOLDS = 5
  • Number of folds for K-Fold cross-validation
    TRAIN_CSV = "/kaggle/input/2025-sep-dl-gen-ai-project/train.csv"
    TEST_CSV = "/kaggle/input/2025-sep-dl-gen-ai-project/test.csv"
  • Paths to training and test datasets (Kaggle environment paths)
    SUBMISSION_PATH = "submission.csv"
  • Output file for predictions
CONFIG = Config()
  • Creates a global instance of the configuration class

Section 3: Seed & Device Setup

Lines 84-93: Reproducibility and Device Selection

def set_seed(seed=CONFIG.SEED):
    np.random.seed(seed)
  • Sets numpy's random seed for reproducible random number generation
    torch.manual_seed(seed)
  • Sets PyTorch's random seed for CPU operations
    torch.cuda.manual_seed_all(seed)
  • Sets PyTorch's random seed for all GPU devices
    os.environ['PYTHONHASHSEED'] = str(seed)
  • Sets the PYTHONHASHSEED environment variable so Python's hash randomization is seeded (this mainly affects subprocesses; the current interpreter's hash seed is fixed at startup)
set_seed()
  • Calls the seed setting function
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
  • Checks if GPU is available; uses GPU if available, otherwise falls back to CPU
  • Prints the device being used for training

Section 4: Utility Functions

Lines 109-115: ensure_text_column Function

def ensure_text_column(df: pd.DataFrame) -> pd.DataFrame:
    if "text" in df.columns:
        return df
  • Checks if DataFrame already has a "text" column; if yes, returns unchanged
    for c in ["comment_text", "sentence", "content", "review"]:
        if c in df.columns:
            return df.rename(columns={c: "text"})
  • Searches for common alternative text column names
  • Renames the first matching column to "text" for standardization
    raise ValueError("No text column found. Add/rename your text column to 'text'.")
  • Raises an error if no text column is found
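
For illustration, a minimal sketch on a toy DataFrame, assuming the `ensure_text_column` function above is in scope (the column name `comment_text` here is just one of the aliases the function checks):

```python
import pandas as pd

toy = pd.DataFrame({"comment_text": ["I am thrilled!", "This is terrifying."]})
toy = ensure_text_column(toy)      # renames the alias column to "text"
print(toy.columns.tolist())        # ['text']
```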

Lines 117-126: tune_thresholds Function

def tune_thresholds(y_true: np.ndarray, y_prob: np.ndarray) -> np.ndarray:
    th = np.zeros(y_true.shape[1], dtype=np.float32)
  • Creates array to store optimal threshold for each label (initialized to 0)
  • Multi-label classification requires separate thresholds per label
    for j in range(y_true.shape[1]):
        best_t, best_f1 = 0.5, -1
  • Iterates through each label
  • Initializes best threshold to 0.5 (default) and best F1 to -1
        for t in np.linspace(0.1, 0.9, 17):
  • Tests 17 threshold values evenly spaced between 0.1 and 0.9
            f1 = f1_score(y_true[:, j], (y_prob[:, j] >= t).astype(int), zero_division=0)
  • Calculates F1 score for current label and threshold
  • Converts probabilities to binary predictions using threshold
            if f1 > best_f1:
                best_f1, best_t = f1, t
  • Updates best threshold if current F1 is better
        th[j] = best_t
    return th
  • Stores optimal threshold for each label and returns the array
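
A small synthetic example of how the tuner behaves, assuming the `tune_thresholds` function above (and its sklearn import) is in scope; the data here is random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(200, 5))
# probabilities loosely correlated with the labels; noise level differs per label
noise = rng.random((200, 5)) * np.linspace(0.4, 0.9, 5)
y_prob = np.clip(y_true * 0.5 + noise, 0, 1)

print(tune_thresholds(y_true, y_prob))   # one tuned threshold per label
```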

Lines 128-141: get_optimizer_params Function

def get_optimizer_params(model, lr, weight_decay):
    param_optimizer = list(model.named_parameters())
  • Gets all model parameters with their names
    no_decay = ["bias", "LayerNorm.bias", "LayerNorm.weight"]
  • Lists parameter-name patterns that should NOT have weight decay applied
  • Bias and LayerNorm parameters are typically trained without weight decay
    optimizer_parameters = [
        {
            "params": [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
            "weight_decay": weight_decay,
        },
  • First parameter group: all parameters EXCEPT bias and LayerNorm
  • These parameters will have weight decay applied
        {
            "params": [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
            "weight_decay": 0.0,
        },
    ]
  • Second parameter group: only bias and LayerNorm parameters
  • These parameters have weight decay set to 0.0
    return optimizer_parameters
  • Returns grouped parameters for differential weight decay
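
To see the grouping in isolation, here is a minimal sketch on a tiny stand-in module, assuming `get_optimizer_params` above is in scope (the `Tiny` class is hypothetical; its attribute is deliberately named `LayerNorm` to mimic Hugging Face parameter names):

```python
import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.dense = nn.Linear(8, 8)
        self.LayerNorm = nn.LayerNorm(8)   # parameter names: LayerNorm.weight / LayerNorm.bias

groups = get_optimizer_params(Tiny(), lr=1.5e-5, weight_decay=0.01)
print(len(groups[0]["params"]))  # 1 -> only dense.weight gets weight decay
print(len(groups[1]["params"]))  # 3 -> dense.bias, LayerNorm.weight, LayerNorm.bias
```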

Section 5: Dataset Class

Lines 157-180: EmotionDS Class

class EmotionDS(torch.utils.data.Dataset):
    def __init__(self, df, tokenizer, max_len, is_test=False):
  • Custom PyTorch Dataset class for emotion classification
  • is_test flag indicates whether this is test data (no labels)
        self.texts = df["text"].tolist()
  • Extracts text data as a Python list
        self.is_test = is_test
        if not is_test:
            self.labels = df[CONFIG.LABELS].values.astype(np.float32)
  • Stores test flag
  • If training data, extracts multi-label targets as float32 array
        self.tok = tokenizer
        self.max_len = max_len
  • Stores tokenizer and max length for later use
    def __len__(self):
        return len(self.texts)
  • Returns dataset size (required by PyTorch)
    def __getitem__(self, i):
        enc = self.tok(
            self.texts[i],
            truncation=True,
            padding="max_length",
            max_length=self.max_len,
            return_tensors="pt",
        )
  • Tokenizes the text at index i
  • truncation: Cuts text longer than max_len
  • padding: Pads shorter sequences to max_len
  • return_tensors="pt": Returns PyTorch tensors
        item = {k: v.squeeze(0) for k, v in enc.items()}
  • Removes the batch dimension (1, seq_len) → (seq_len)
  • Returns dict with keys: input_ids, attention_mask, token_type_ids (if applicable)
        if not self.is_test:
            item["labels"] = torch.tensor(self.labels[i])
        return item
  • Adds labels to the item dict if training data
  • Returns the complete item
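
A quick way to sanity-check what one dataset item looks like. This sketch assumes the `EmotionDS` class and `CONFIG` defined above are in scope and that the tokenizer can be downloaded; the two-row DataFrame is made up:

```python
import pandas as pd
from transformers import AutoTokenizer

toy = pd.DataFrame({
    "text": ["I can't believe this happened!", "What a lovely day."],
    "anger": [1, 0], "fear": [0, 0], "joy": [0, 1], "sadness": [0, 0], "surprise": [1, 0],
})
tokenizer = AutoTokenizer.from_pretrained(CONFIG.MODEL_NAME)
item = EmotionDS(toy, tokenizer, CONFIG.MAX_LEN)[0]
print({k: tuple(v.shape) for k, v in item.items()})
# e.g. {'input_ids': (128,), 'token_type_ids': (128,), 'attention_mask': (128,), 'labels': (5,)}
```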

Section 6: Training & Validation Helper Functions

Lines 196-213: train_one_epoch Function

def train_one_epoch(model, loader, optimizer, scheduler, scaler, criterion):
    model.train()
  • Sets model to training mode (enables dropout and other training-only behaviors)
    losses = []
    for batch in loader:
  • Initializes list to track losses
  • Iterates through batches
        batch = {k: v.to(device, non_blocking=True) for k, v in batch.items()}
  • Moves batch data to GPU (or CPU)
  • non_blocking=True: Async transfer for faster processing
        optimizer.zero_grad(set_to_none=True)
  • Clears gradients from previous step
  • set_to_none=True: More memory efficient than setting to zero
        with autocast(enabled=True):
            out = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
            loss = criterion(out.logits, batch["labels"])
  • autocast: Uses mixed precision (float16) for faster computation
  • Forward pass through model
  • Calculates loss between predictions (logits) and true labels
        scaler.scale(loss).backward()
  • Scales loss to prevent gradient underflow in mixed precision
  • Computes gradients via backpropagation
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
  • Unscales gradients before clipping
  • Clips gradients to maximum norm of 1.0 to prevent exploding gradients
        scaler.step(optimizer)
        scaler.update()
  • Applies the optimizer step (automatically skipped if the unscaled gradients contain inf/NaN)
  • Updates the scaler's scale factor for the next iteration
        scheduler.step()
  • Updates learning rate according to schedule
        losses.append(loss.item())
    return np.mean(losses)
  • Stores loss value
  • Returns average loss for the epoch
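
The ordering of the AMP calls (scale → backward → unscale → clip → step → update) is the part that is easy to get wrong, so here is a self-contained sketch of one such step on a toy linear model (illustrative only, not the notebook's model; AMP is simply disabled when no GPU is present):

```python
import torch
import torch.nn as nn
from torch.cuda.amp import autocast, GradScaler

use_amp = torch.cuda.is_available()
dev = torch.device("cuda" if use_amp else "cpu")
toy_model = nn.Linear(16, 5).to(dev)
opt = torch.optim.AdamW(toy_model.parameters(), lr=1e-3)
scaler = GradScaler(enabled=use_amp)
criterion = nn.BCEWithLogitsLoss()

x = torch.randn(8, 16, device=dev)
y = torch.randint(0, 2, (8, 5), device=dev).float()

opt.zero_grad(set_to_none=True)
with autocast(enabled=use_amp):
    loss = criterion(toy_model(x), y)
scaler.scale(loss).backward()                                # scale loss, then backprop
scaler.unscale_(opt)                                         # unscale before clipping
torch.nn.utils.clip_grad_norm_(toy_model.parameters(), 1.0)
scaler.step(opt)                                             # optimizer step (skipped on inf/NaN grads)
scaler.update()                                              # adjust the scale factor
```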

Lines 215-230: validate Function

def validate(model, loader, criterion):
    model.eval()
  • Sets model to evaluation mode (disables dropout and other training-only behaviors)
    losses = []
    preds = []
    targs = []
  • Initializes lists for losses, predictions, and targets
    with torch.no_grad():
  • Disables gradient computation (saves memory and speeds up inference)
        for batch in loader:
            batch = {k: v.to(device, non_blocking=True) for k, v in batch.items()}
            with autocast(enabled=True):
                out = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
                loss = criterion(out.logits, batch["labels"])
  • Moves batch to device
  • Forward pass with mixed precision
  • Calculates validation loss
            losses.append(loss.item())
            preds.append(torch.sigmoid(out.logits).float().cpu().numpy())
            targs.append(batch["labels"].cpu().numpy())
  • Stores loss
  • Applies sigmoid to convert logits to probabilities in (0, 1)
  • Moves predictions and targets to CPU as numpy arrays
    return np.mean(losses), np.vstack(preds), np.vstack(targs)
  • Returns average loss, stacked predictions, and stacked targets

Section 7: Main K-Fold Training Loop

Lines 246-324: run_training Function

def run_training():
    if not os.path.exists(CONFIG.TRAIN_CSV):
        print("Train CSV not found. Please check the path.")
        return None, None
  • Checks if training data exists
  • Returns None if not found (graceful failure)
    df = pd.read_csv(CONFIG.TRAIN_CSV)
    df = ensure_text_column(df)
  • Loads training data
  • Ensures text column exists
    skf = StratifiedKFold(n_splits=CONFIG.N_FOLDS, shuffle=True, random_state=CONFIG.SEED)
    y_str = df[CONFIG.LABELS].astype(str).agg("".join, axis=1)
  • Creates 5-fold stratified splitter
  • Converts multi-label to string representation for stratification
  • Example: [1,0,1,0,0] → "10100"
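
For clarity, a toy illustration of how the label columns are joined into the stratification key:

```python
import pandas as pd

labels = ["anger", "fear", "joy", "sadness", "surprise"]
toy = pd.DataFrame([[1, 0, 1, 0, 0], [0, 0, 0, 1, 0]], columns=labels)
print(toy[labels].astype(str).agg("".join, axis=1).tolist())   # ['10100', '00010']
```
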
    oof_preds = np.zeros((len(df), len(CONFIG.LABELS)))
  • Initializes out-of-fold predictions array (for all training samples)
    tokenizer = AutoTokenizer.from_pretrained(CONFIG.MODEL_NAME)
  • Loads DeBERTa tokenizer
    for fold, (train_idx, val_idx) in enumerate(skf.split(df, y_str)):
        print(f"\n{'='*20} FOLD {fold+1}/{CONFIG.N_FOLDS} {'='*20}")
  • Iterates through each fold
  • train_idx: indices for training, val_idx: indices for validation
        df_tr = df.iloc[train_idx].reset_index(drop=True)
        df_va = df.iloc[val_idx].reset_index(drop=True)
  • Splits data into training and validation sets for current fold
  • Resets index for clean indexing
        ds_tr = EmotionDS(df_tr, tokenizer, CONFIG.MAX_LEN)
        ds_va = EmotionDS(df_va, tokenizer, CONFIG.MAX_LEN)
  • Creates PyTorch datasets for training and validation
        dl_tr = torch.utils.data.DataLoader(ds_tr, batch_size=CONFIG.BATCH_SIZE, shuffle=True, num_workers=2, pin_memory=True)
        dl_va = torch.utils.data.DataLoader(ds_va, batch_size=CONFIG.BATCH_SIZE, shuffle=False, num_workers=2, pin_memory=True)
  • Creates data loaders
  • shuffle=True for training (randomizes batch order)
  • shuffle=False for validation (keeps consistent order)
  • num_workers=2: Uses 2 subprocesses for data loading
  • pin_memory=True: Speeds up CPU→GPU transfer
        model = AutoModelForSequenceClassification.from_pretrained(
            CONFIG.MODEL_NAME, 
            num_labels=len(CONFIG.LABELS),
            problem_type="multi_label_classification"
        )
        model.to(device)
  • Loads pre-trained DeBERTa model
  • Configures for 5-label multi-label classification
  • Moves model to GPU/CPU
        optimizer_params = get_optimizer_params(model, CONFIG.LR, CONFIG.WEIGHT_DECAY)
        optimizer = AdamW(optimizer_params, lr=CONFIG.LR)
  • Gets parameter groups with differential weight decay
  • Creates AdamW optimizer
        total_steps = len(dl_tr) * CONFIG.EPOCHS
        scheduler = get_linear_schedule_with_warmup(
            optimizer, 
            num_warmup_steps=int(total_steps * CONFIG.WARMUP_RATIO), 
            num_training_steps=total_steps
        )
  • Calculates total training steps
  • Creates learning rate scheduler:
    • Warmup: LR increases linearly for 10% of steps
    • Decay: LR decreases linearly to 0 for remaining 90%
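
As a rough illustration with a hypothetical fold of 2,800 training rows (the real dataset size may differ):

```python
steps_per_epoch = 2800 // 16            # 175 batches per epoch at BATCH_SIZE=16
total_steps = steps_per_epoch * 4       # 700 optimizer steps over EPOCHS=4
warmup_steps = int(total_steps * 0.1)   # 70 warmup steps at WARMUP_RATIO=0.1
```
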
        criterion = nn.BCEWithLogitsLoss()
        scaler = GradScaler(enabled=True)
  • BCEWithLogitsLoss: Binary cross-entropy loss for multi-label classification
  • Creates gradient scaler for mixed precision
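
A toy example of why this loss fits the multi-label setting: each of the 5 logits is treated as an independent binary problem, so a sample can be positive for several emotions at once.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, -1.0, 0.5, -2.0, 1.0]])   # raw outputs for one sample
targets = torch.tensor([[1.0, 0.0, 1.0, 0.0, 1.0]])    # anger, joy and surprise present
loss = nn.BCEWithLogitsLoss()(logits, targets)          # sigmoid + binary CE per label, averaged
print(loss.item())
```
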
        best_f1 = 0
        best_state = None
  • Initializes tracking for best model
        for ep in range(CONFIG.EPOCHS):
            train_loss = train_one_epoch(model, dl_tr, optimizer, scheduler, scaler, criterion)
            val_loss, val_preds, val_targs = validate(model, dl_va, criterion)
  • Trains for one epoch
  • Validates on validation set
            val_f1 = f1_score(val_targs, (val_preds >= 0.5).astype(int), average="macro", zero_division=0)
  • Calculates macro F1 score (average F1 across all labels)
  • Uses 0.5 threshold for predictions
            print(f"Ep {ep+1}: TrLoss={train_loss:.4f} | VaLoss={val_loss:.4f} | VaF1={val_f1:.4f}")
  • Prints epoch metrics
            if val_f1 > best_f1:
                best_f1 = val_f1
                best_state = model.state_dict()
  • Saves model state if validation F1 improves
        torch.save(best_state, f"model_fold_{fold}.pth")
  • Saves best model weights to disk
        model.load_state_dict(best_state)
        _, val_preds, _ = validate(model, dl_va, criterion)
        oof_preds[val_idx] = val_preds
  • Loads best weights
  • Gets predictions on validation set
  • Stores out-of-fold predictions
        del model, optimizer, scaler, scheduler
        torch.cuda.empty_cache()
        gc.collect()
  • Deletes objects to free memory
  • Clears GPU cache
  • Runs garbage collector
    return oof_preds, df[CONFIG.LABELS].values
  • Returns out-of-fold predictions and true labels
if os.path.exists(CONFIG.TRAIN_CSV):
    oof_preds, y_true = run_training()
else:
    print("Skipping training as data is not found (likely in a dry-run environment).")
  • Executes training if data exists
  • Otherwise skips gracefully

Section 8: Threshold Optimization

Lines 340-347: Threshold Tuning

if os.path.exists(CONFIG.TRAIN_CSV):
    best_thresholds = tune_thresholds(y_true, oof_preds)
  • Finds the optimal threshold for each emotion label using the out-of-fold (validation) predictions
    oof_tuned = (oof_preds >= best_thresholds).astype(int)
  • Converts probabilities to binary predictions using optimized thresholds
    final_f1 = f1_score(y_true, oof_tuned, average="macro", zero_division=0)
    print(f"\nFinal CV Macro F1: {final_f1:.4f}")
    print(f"Best Thresholds: {best_thresholds}")
  • Calculates cross-validated F1 score with optimized thresholds
  • Prints final performance and optimal thresholds
else:
    best_thresholds = np.array([0.5] * len(CONFIG.LABELS))
  • Falls back to 0.5 thresholds if training data not available
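
The comparison `oof_preds >= best_thresholds` relies on NumPy broadcasting: each label's threshold is applied to its own column. A toy example with made-up numbers:

```python
import numpy as np

probs = np.array([[0.62, 0.20, 0.55, 0.10, 0.48],
                  [0.30, 0.75, 0.40, 0.52, 0.90]])
thresholds = np.array([0.55, 0.50, 0.60, 0.45, 0.50])
print((probs >= thresholds).astype(int))
# [[1 0 0 0 0]
#  [0 1 0 1 1]]
```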

Section 9: Inference & Submission

Lines 363-420: predict_test Function

def predict_test(thresholds):
    if not os.path.exists(CONFIG.TEST_CSV):
        print("Test CSV not found.")
        return
  • Checks if test data exists
    df_test = pd.read_csv(CONFIG.TEST_CSV)
    df_test = ensure_text_column(df_test)
  • Loads test data and ensures text column
    tokenizer = AutoTokenizer.from_pretrained(CONFIG.MODEL_NAME)
    ds_test = EmotionDS(df_test, tokenizer, CONFIG.MAX_LEN, is_test=True)
    dl_test = torch.utils.data.DataLoader(ds_test, batch_size=CONFIG.BATCH_SIZE, shuffle=False, num_workers=2)
  • Creates tokenizer, dataset, and data loader for test data
  • is_test=True: No labels expected
    fold_preds = []
  • Initializes list to store predictions from each fold
    for fold in range(CONFIG.N_FOLDS):
        model_path = f"model_fold_{fold}.pth"
        if not os.path.exists(model_path):
            print(f"Model for fold {fold} not found, skipping.")
            continue
  • Iterates through all folds
  • Checks if model exists
        print(f"Predicting Fold {fold+1}...")
        model = AutoModelForSequenceClassification.from_pretrained(
            CONFIG.MODEL_NAME, 
            num_labels=len(CONFIG.LABELS),
            problem_type="multi_label_classification"
        )
        model.load_state_dict(torch.load(model_path))
        model.to(device)
        model.eval()
  • Loads model architecture
  • Loads trained weights
  • Sets to evaluation mode
        preds = []
        with torch.no_grad():
            for batch in dl_test:
                batch = {k: v.to(device, non_blocking=True) for k, v in batch.items()}
                with autocast(enabled=True):
                    out = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
                preds.append(torch.sigmoid(out.logits).float().cpu().numpy())
  • Makes predictions without computing gradients
  • Uses mixed precision for speed
  • Applies sigmoid to get probabilities
        fold_preds.append(np.vstack(preds))
        del model
        torch.cuda.empty_cache()
        gc.collect()
  • Stores fold predictions
  • Frees memory
    if not fold_preds:
        print("No predictions made.")
        return
  • Checks if any predictions were made
    avg_preds = np.mean(fold_preds, axis=0)
  • Averages predictions across all folds (ensemble)
    final_preds = (avg_preds >= thresholds).astype(int)
  • Applies optimized thresholds to get binary predictions
    sub = pd.DataFrame(columns=["id"] + CONFIG.LABELS)
    sub["id"] = df_test["id"] if "id" in df_test.columns else np.arange(len(df_test))
    sub[CONFIG.LABELS] = final_preds
    sub.to_csv(CONFIG.SUBMISSION_PATH, index=False)
    print(f"Submission saved to {CONFIG.SUBMISSION_PATH}")
    print(sub.head())
  • Creates submission DataFrame
  • Adds ID column (from data or generated)
  • Adds prediction columns
  • Saves to CSV
  • Displays first few rows
predict_test(best_thresholds)
  • Executes prediction function with optimized thresholds
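
The fold ensemble in `predict_test` is a plain element-wise mean of the per-fold probability arrays; a toy example with two folds and made-up numbers:

```python
import numpy as np

fold_preds = [np.array([[0.8, 0.1], [0.3, 0.6]]),
              np.array([[0.6, 0.3], [0.5, 0.4]])]
print(np.mean(fold_preds, axis=0))
# [[0.7 0.2]
#  [0.4 0.5]]
```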

Summary

This notebook implements a robust emotion classification pipeline with:

  1. K-Fold Cross-Validation: 5-fold stratified CV for reliable performance estimates
  2. State-of-the-Art Model: DeBERTa-v3-base transformer
  3. Optimization Techniques:
    • Mixed precision training (faster, less memory)
    • Gradient clipping (stability)
    • Learning rate warmup and decay
    • Differential weight decay
  4. Threshold Optimization: Per-label thresholds for better F1 scores
  5. Ensemble Prediction: Averages predictions from all folds
  6. Memory Management: Explicit cleanup between folds

The model predicts 5 emotions (anger, fear, joy, sadness, surprise) in a multi-label setting, where text can have multiple emotions simultaneously.