# Deep Learning Emotion Classification - Code Explanation

This document provides a detailed line-by-line explanation of the `main.ipynb` notebook, which implements a multi-label emotion classification system using the DeBERTa transformer model with K-Fold cross-validation.

---

## Section 1: Imports & Setup

### Lines 18-36: Import Statements

```python
import numpy as np
import pandas as pd
```

- **numpy**: Used for numerical operations, array manipulation, and random seed setting
- **pandas**: Used for data loading and manipulation (CSV files, DataFrames)

```python
import torch
import torch.nn as nn
```

- **torch**: PyTorch deep learning framework for tensor operations and model training
- **torch.nn**: Neural network modules including loss functions

```python
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score
```

- **StratifiedKFold**: Creates k-fold splits while maintaining class distribution in each fold
- **f1_score**: Calculates F1 metric for evaluation (harmonic mean of precision and recall)

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    get_linear_schedule_with_warmup,
    AutoConfig
)
```

- **AutoTokenizer**: Automatically loads the appropriate tokenizer for the specified model
- **AutoModelForSequenceClassification**: Pre-trained transformer model for classification tasks
- **get_linear_schedule_with_warmup**: Learning rate scheduler with warmup and linear decay
- **AutoConfig**: Model configuration loader

```python
from torch.optim import AdamW
```
- **AdamW**: Adam optimizer with decoupled weight decay (generally preferred over plain Adam when fine-tuning transformers)
```python
from torch.cuda.amp import autocast, GradScaler
```
- **autocast**: Enables automatic mixed precision (AMP) to speed up training
- **GradScaler**: Scales the loss during mixed-precision training so that small gradients do not underflow in float16
- Note: recent PyTorch releases deprecate the `torch.cuda.amp` entry points in favour of `torch.amp.autocast("cuda")` and `torch.amp.GradScaler("cuda")`; the older imports still work but emit a deprecation warning
```python
import gc
import warnings
import os
```

- **gc**: Garbage collection to free up memory
- **warnings**: To suppress warning messages
- **os**: For file system operations and environment variables

```python
warnings.filterwarnings("ignore")
```

- Suppresses all warning messages for cleaner output

---

## Section 2: Configuration

### Lines 52-68: Configuration Class

```python
class Config:
    SEED = 42
```

- Sets random seed for reproducibility across all random operations

```python
    LABELS = ["anger", "fear", "joy", "sadness", "surprise"]
```

- Defines the 5 emotion labels for multi-label classification

```python
    MODEL_NAME = "microsoft/deberta-v3-base"
```
- Specifies the pre-trained model (DeBERTa-v3-base, roughly 184M parameters; a strong general-purpose encoder for text classification)
```python
    MAX_LEN = 128
```
- Maximum sequence length for tokenization (sequences longer than this are truncated to 128 tokens)
```python
    BATCH_SIZE = 16
```

- Number of samples processed together in one forward/backward pass

```python
    EPOCHS = 4
```

- Number of complete passes through the training dataset

```python
    LR = 1.5e-5
```

- Learning rate (1.5 × 10⁻⁵) - small value typical for fine-tuning transformers

```python
    WEIGHT_DECAY = 0.01
```

- L2 regularization strength to prevent overfitting

```python
    WARMUP_RATIO = 0.1
```

- Fraction of training steps used for learning rate warmup (10% of total steps)

```python
    N_FOLDS = 5
```

- Number of folds for K-Fold cross-validation

```python
    TRAIN_CSV = "/kaggle/input/2025-sep-dl-gen-ai-project/train.csv"
    TEST_CSV = "/kaggle/input/2025-sep-dl-gen-ai-project/test.csv"
```

- Paths to training and test datasets (Kaggle environment paths)

```python
    SUBMISSION_PATH = "submission.csv"
```

- Output file for predictions

```python
CONFIG = Config()
```

- Creates a global instance of the configuration class

---

## Section 3: Seed & Device Setup

### Lines 84-93: Reproducibility and Device Selection

```python
def set_seed(seed=CONFIG.SEED):
    np.random.seed(seed)
```

- Sets numpy's random seed for reproducible random number generation

```python
    torch.manual_seed(seed)
```

- Sets PyTorch's random seed for CPU operations

```python
    torch.cuda.manual_seed_all(seed)
```

- Sets PyTorch's random seed for all GPU devices

```python
    os.environ['PYTHONHASHSEED'] = str(seed)
```
- Sets the `PYTHONHASHSEED` environment variable for Python's built-in `hash()`; note that hash randomization is fixed when the interpreter starts, so setting it here only affects child processes
```python
set_seed()
```

- Calls the seed setting function

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
```

- Checks if GPU is available; uses GPU if available, otherwise falls back to CPU
- Prints the device being used for training
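
The notebook relies on seeding alone. If bit-for-bit reproducibility matters, a common additional step (not in the original code) is to pin cuDNN's behaviour, at some cost in speed:

```python
# Optional extra determinism (not in the notebook): force deterministic cuDNN kernels
# and disable autotuning, since GPU runs can otherwise differ slightly between runs.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```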
---

## Section 4: Utility Functions

### Lines 109-115: `ensure_text_column` Function

```python
def ensure_text_column(df: pd.DataFrame) -> pd.DataFrame:
    if "text" in df.columns:
        return df
```

- Checks if DataFrame already has a "text" column; if yes, returns unchanged

```python
    for c in ["comment_text", "sentence", "content", "review"]:
        if c in df.columns:
            return df.rename(columns={c: "text"})
```

- Searches for common alternative text column names
- Renames the first matching column to "text" for standardization

```python
    raise ValueError("No text column found. Add/rename your text column to 'text'.")
```

- Raises an error if no text column is found
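
A hypothetical usage example (toy DataFrame, not from the competition data, reusing the notebook's imports):

```python
# The "comment_text" column is renamed to "text"; a DataFrame with no
# recognizable text column would raise ValueError instead.
toy = pd.DataFrame({"comment_text": ["I am thrilled!", "This is terrifying."]})
toy = ensure_text_column(toy)
print(toy.columns.tolist())  # ['text']
```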
### Lines 117-126: `tune_thresholds` Function

```python
def tune_thresholds(y_true: np.ndarray, y_prob: np.ndarray) -> np.ndarray:
    th = np.zeros(y_true.shape[1], dtype=np.float32)
```

- Creates array to store optimal threshold for each label (initialized to 0)
- Multi-label classification requires separate thresholds per label

```python
    for j in range(y_true.shape[1]):
        best_t, best_f1 = 0.5, -1
```

- Iterates through each label
- Initializes best threshold to 0.5 (default) and best F1 to -1

```python
        for t in np.linspace(0.1, 0.9, 17):
```

- Tests 17 threshold values evenly spaced between 0.1 and 0.9

```python
            f1 = f1_score(y_true[:, j], (y_prob[:, j] >= t).astype(int), zero_division=0)
```

- Calculates F1 score for current label and threshold
- Converts probabilities to binary predictions using threshold

```python
            if f1 > best_f1:
                best_f1, best_t = f1, t
```

- Updates best threshold if current F1 is better

```python
        th[j] = best_t
    return th
```

- Stores optimal threshold for each label and returns the array
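
A toy illustration (made-up probabilities) of why per-label thresholds can beat a flat 0.5 cutoff:

```python
# Label 0's probabilities all sit below 0.5, so a 0.5 cutoff predicts nothing for it;
# a tuned threshold around 0.35 recovers those positives.
y_true = np.array([[1, 0], [1, 0], [0, 1], [1, 1]])
y_prob = np.array([[0.40, 0.10], [0.45, 0.20], [0.30, 0.80], [0.48, 0.70]])

th = tune_thresholds(y_true, y_prob)
print(f1_score(y_true, (y_prob >= 0.5).astype(int), average="macro", zero_division=0))  # ~0.50
print(f1_score(y_true, (y_prob >= th).astype(int), average="macro", zero_division=0))   # 1.00
```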
### Lines 128-141: `get_optimizer_params` Function

```python
def get_optimizer_params(model, lr, weight_decay):
    param_optimizer = list(model.named_parameters())
```

- Gets all model parameters with their names

```python
    no_decay = ["bias", "LayerNorm.bias", "LayerNorm.weight"]
```

- Lists parameters that should NOT have weight decay applied
- Bias and LayerNorm parameters typically trained without weight decay

```python
    optimizer_parameters = [
        {
            "params": [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
            "weight_decay": weight_decay,
        },
```

- First parameter group: all parameters EXCEPT bias and LayerNorm
- These parameters will have weight decay applied

```python
        {
            "params": [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
            "weight_decay": 0.0,
        },
    ]
```

- Second parameter group: only bias and LayerNorm parameters
- These parameters have weight decay set to 0.0

```python
    return optimizer_parameters
```

- Returns grouped parameters for differential weight decay
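
A quick sanity check one might run (not part of the notebook) to see how the parameter tensors split between the two groups:

```python
# Hypothetical check: count how many tensors receive weight decay.
# Assumes the model can be downloaded; any HF classification model works here.
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base", num_labels=5, problem_type="multi_label_classification"
)
groups = get_optimizer_params(model, lr=1.5e-5, weight_decay=0.01)
print(len(groups[0]["params"]), "tensors with weight decay")
print(len(groups[1]["params"]), "tensors without weight decay (bias / LayerNorm)")
optimizer = AdamW(groups, lr=1.5e-5)
```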
---

## Section 5: Dataset Class

### Lines 157-180: `EmotionDS` Class

```python
class EmotionDS(torch.utils.data.Dataset):
    def __init__(self, df, tokenizer, max_len, is_test=False):
```

- Custom PyTorch Dataset class for emotion classification
- `is_test` flag indicates whether this is test data (no labels)

```python
        self.texts = df["text"].tolist()
```

- Extracts text data as a Python list

```python
        self.is_test = is_test
        if not is_test:
            self.labels = df[CONFIG.LABELS].values.astype(np.float32)
```

- Stores test flag
- If training data, extracts multi-label targets as float32 array

```python
        self.tok = tokenizer
        self.max_len = max_len
```

- Stores tokenizer and max length for later use

```python
    def __len__(self):
        return len(self.texts)
```

- Returns dataset size (required by PyTorch)

```python
    def __getitem__(self, i):
        enc = self.tok(
            self.texts[i],
            truncation=True,
            padding="max_length",
            max_length=self.max_len,
            return_tensors="pt",
        )
```

- Tokenizes the text at index `i`
- **truncation**: Cuts text longer than max_len
- **padding**: Pads shorter sequences to max_len
- **return_tensors="pt"**: Returns PyTorch tensors

```python
        item = {k: v.squeeze(0) for k, v in enc.items()}
```

- Removes the batch dimension (1, seq_len) → (seq_len)
- Returns dict with keys: input_ids, attention_mask, token_type_ids (if applicable)

```python
        if not self.is_test:
            item["labels"] = torch.tensor(self.labels[i])
        return item
```

- Adds labels to the item dict if training data
- Returns the complete item
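
A hypothetical usage example with a tiny in-memory DataFrame (assumes `CONFIG`, `EmotionDS`, and the tokenizer dependencies are available):

```python
# Two toy rows with made-up labels; the DataLoader returns padded 128-token batches.
toy = pd.DataFrame({
    "text": ["I can't believe this happened!", "What a wonderful day."],
    "anger": [1, 0], "fear": [0, 0], "joy": [0, 1], "sadness": [0, 0], "surprise": [1, 0],
})
tok = AutoTokenizer.from_pretrained(CONFIG.MODEL_NAME)
ds = EmotionDS(toy, tok, CONFIG.MAX_LEN)
batch = next(iter(torch.utils.data.DataLoader(ds, batch_size=2)))
print(batch["input_ids"].shape, batch["labels"].shape)  # torch.Size([2, 128]) torch.Size([2, 5])
```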
---

## Section 6: Training & Validation Helper Functions

### Lines 196-213: `train_one_epoch` Function

```python
def train_one_epoch(model, loader, optimizer, scheduler, scaler, criterion):
    model.train()
```
- Sets the model to training mode (enables dropout; DeBERTa uses LayerNorm rather than batch norm, so dropout is the main effect here)
```python
    losses = []
    for batch in loader:
```

- Initializes list to track losses
- Iterates through batches

```python
        batch = {k: v.to(device, non_blocking=True) for k, v in batch.items()}
```

- Moves batch data to GPU (or CPU)
- `non_blocking=True`: Async transfer for faster processing

```python
        optimizer.zero_grad(set_to_none=True)
```

- Clears gradients from previous step
- `set_to_none=True`: More memory efficient than setting to zero

```python
        with autocast(enabled=True):
            out = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
            loss = criterion(out.logits, batch["labels"])
```

- **autocast**: Uses mixed precision (float16) for faster computation
- Forward pass through model
- Calculates loss between predictions (logits) and true labels

```python
        scaler.scale(loss).backward()
```

- Scales loss to prevent gradient underflow in mixed precision
- Computes gradients via backpropagation

```python
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
```

- Unscales gradients before clipping
- Clips gradients to maximum norm of 1.0 to prevent exploding gradients

```python
        scaler.step(optimizer)
        scaler.update()
```
- `scaler.step` applies the optimizer update using the already-unscaled gradients, skipping the step if infs/NaNs are detected
- `scaler.update` adjusts the loss-scaling factor for the next iteration
```python
        scheduler.step()
```

- Updates learning rate according to schedule

```python
        losses.append(loss.item())
    return np.mean(losses)
```

- Stores loss value
- Returns average loss for the epoch

### Lines 215-230: `validate` Function

```python
def validate(model, loader, criterion):
    model.eval()
```
- Sets the model to evaluation mode (disables dropout so predictions are deterministic)
```python
    losses = []
    preds = []
    targs = []
```

- Initializes lists for losses, predictions, and targets

```python
    with torch.no_grad():
```

- Disables gradient computation (saves memory and speeds up inference)

```python
        for batch in loader:
            batch = {k: v.to(device, non_blocking=True) for k, v in batch.items()}
            with autocast(enabled=True):
                out = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
                loss = criterion(out.logits, batch["labels"])
```

- Moves batch to device
- Forward pass with mixed precision
- Calculates validation loss

```python
            losses.append(loss.item())
            preds.append(torch.sigmoid(out.logits).float().cpu().numpy())
            targs.append(batch["labels"].cpu().numpy())
```

- Stores loss
- Applies sigmoid to convert logits to probabilities [0, 1]
- Moves predictions and targets to CPU as numpy arrays

```python
    return np.mean(losses), np.vstack(preds), np.vstack(targs)
```

- Returns average loss, stacked predictions, and stacked targets
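
A small aside on why `sigmoid` (not `softmax`) is used here: each label gets an independent probability, so several emotions can clear the threshold for the same text. The logits below are made up:

```python
logits = torch.tensor([[2.1, -0.3, 1.5, -2.0, 0.2]])
probs = torch.sigmoid(logits)
print(probs)        # roughly [[0.89, 0.43, 0.82, 0.12, 0.55]]
print(probs.sum())  # does not sum to 1 - the labels are scored independently
```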
---

## Section 7: Main K-Fold Training Loop

### Lines 246-324: `run_training` Function

```python
def run_training():
    if not os.path.exists(CONFIG.TRAIN_CSV):
        print("Train CSV not found. Please check the path.")
        return None, None
```

- Checks if training data exists
- Returns None if not found (graceful failure)

```python
    df = pd.read_csv(CONFIG.TRAIN_CSV)
    df = ensure_text_column(df)
```

- Loads training data
- Ensures text column exists

```python
    skf = StratifiedKFold(n_splits=CONFIG.N_FOLDS, shuffle=True, random_state=CONFIG.SEED)
    y_str = df[CONFIG.LABELS].astype(str).agg("".join, axis=1)
```

- Creates 5-fold stratified splitter
- Converts multi-label to string representation for stratification
- Example: [1,0,1,0,0] → "10100"
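
A toy illustration of how this stratification key is built (label values made up); note that label combinations rarer than `N_FOLDS` rows make `StratifiedKFold` emit a warning, which `filterwarnings("ignore")` hides:

```python
# Each row's five 0/1 labels are concatenated into one string, so rows with the
# same emotion combination are treated as one "class" when splitting.
toy = pd.DataFrame({"anger": [1, 0], "fear": [0, 1], "joy": [1, 0],
                    "sadness": [0, 0], "surprise": [0, 1]})
key = toy[["anger", "fear", "joy", "sadness", "surprise"]].astype(str).agg("".join, axis=1)
print(key.tolist())  # ['10100', '01001']
```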
```python
    oof_preds = np.zeros((len(df), len(CONFIG.LABELS)))
```

- Initializes out-of-fold predictions array (for all training samples)

```python
    tokenizer = AutoTokenizer.from_pretrained(CONFIG.MODEL_NAME)
```

- Loads DeBERTa tokenizer

```python
    for fold, (train_idx, val_idx) in enumerate(skf.split(df, y_str)):
        print(f"\n{'='*20} FOLD {fold+1}/{CONFIG.N_FOLDS} {'='*20}")
```

- Iterates through each fold
- `train_idx`: indices for training, `val_idx`: indices for validation

```python
        df_tr = df.iloc[train_idx].reset_index(drop=True)
        df_va = df.iloc[val_idx].reset_index(drop=True)
```

- Splits data into training and validation sets for current fold
- Resets index for clean indexing

```python
        ds_tr = EmotionDS(df_tr, tokenizer, CONFIG.MAX_LEN)
        ds_va = EmotionDS(df_va, tokenizer, CONFIG.MAX_LEN)
```

- Creates PyTorch datasets for training and validation

```python
        dl_tr = torch.utils.data.DataLoader(ds_tr, batch_size=CONFIG.BATCH_SIZE, shuffle=True, num_workers=2, pin_memory=True)
        dl_va = torch.utils.data.DataLoader(ds_va, batch_size=CONFIG.BATCH_SIZE, shuffle=False, num_workers=2, pin_memory=True)
```

- Creates data loaders
- **shuffle=True** for training (randomizes batch order)
- **shuffle=False** for validation (keeps consistent order)
- **num_workers=2**: Uses 2 subprocesses for data loading
- **pin_memory=True**: Speeds up CPU→GPU transfer

```python
        model = AutoModelForSequenceClassification.from_pretrained(
            CONFIG.MODEL_NAME,
            num_labels=len(CONFIG.LABELS),
            problem_type="multi_label_classification"
        )
        model.to(device)
```

- Loads pre-trained DeBERTa model
- Configures for 5-label multi-label classification
- Moves model to GPU/CPU

```python
        optimizer_params = get_optimizer_params(model, CONFIG.LR, CONFIG.WEIGHT_DECAY)
        optimizer = AdamW(optimizer_params, lr=CONFIG.LR)
```

- Gets parameter groups with differential weight decay
- Creates AdamW optimizer

```python
        total_steps = len(dl_tr) * CONFIG.EPOCHS
        scheduler = get_linear_schedule_with_warmup(
            optimizer,
            num_warmup_steps=int(total_steps * CONFIG.WARMUP_RATIO),
            num_training_steps=total_steps
        )
```

- Calculates total training steps
- Creates learning rate scheduler:
  - Warmup: LR increases linearly for 10% of steps
  - Decay: LR decreases linearly to 0 for remaining 90%

```python
        criterion = nn.BCEWithLogitsLoss()
        scaler = GradScaler(enabled=True)
```
- **BCEWithLogitsLoss**: binary cross-entropy applied independently to each label, with the sigmoid fused into the loss for numerical stability - the standard choice for multi-label classification (see the sketch below)
- Creates the gradient scaler used for mixed-precision training
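
A minimal sketch (toy tensors) showing that `BCEWithLogitsLoss` is the fused form of sigmoid followed by binary cross-entropy:

```python
# The fused loss avoids computing the sigmoid separately, which is more numerically stable.
logits = torch.tensor([[1.2, -0.7, 0.3, -2.0, 0.0]])
targets = torch.tensor([[1.0, 0.0, 1.0, 0.0, 1.0]])

fused = nn.BCEWithLogitsLoss()(logits, targets)
manual = nn.BCELoss()(torch.sigmoid(logits), targets)
print(fused.item(), manual.item())  # identical up to floating-point error
```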
```python
        best_f1 = 0
        best_state = None
```

- Initializes tracking for best model

```python
        for ep in range(CONFIG.EPOCHS):
            train_loss = train_one_epoch(model, dl_tr, optimizer, scheduler, scaler, criterion)
            val_loss, val_preds, val_targs = validate(model, dl_va, criterion)
```

- Trains for one epoch
- Validates on validation set

```python
            val_f1 = f1_score(val_targs, (val_preds >= 0.5).astype(int), average="macro", zero_division=0)
```

- Calculates macro F1 score (average F1 across all labels)
- Uses 0.5 threshold for predictions

```python
            print(f"Ep {ep+1}: TrLoss={train_loss:.4f} | VaLoss={val_loss:.4f} | VaF1={val_f1:.4f}")
```

- Prints epoch metrics

```python
            if val_f1 > best_f1:
                best_f1 = val_f1
                best_state = model.state_dict()
```
- Keeps the model state with the best validation F1 (see the caveat below: `state_dict()` returns references to the live tensors)
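
Caveat: `model.state_dict()` returns references to the live parameter tensors, so `best_state` keeps tracking the weights as later epochs run, and the checkpoint saved afterwards effectively reflects the final epoch rather than the best one. A common fix (a sketch, not the notebook's code) is to snapshot a detached copy:

```python
# Snapshot the weights instead of holding references to the live tensors.
best_state = {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}
```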
```python
        torch.save(best_state, f"model_fold_{fold}.pth")
```

- Saves best model weights to disk

```python
        model.load_state_dict(best_state)
        _, val_preds, _ = validate(model, dl_va, criterion)
        oof_preds[val_idx] = val_preds
```

- Loads best weights
- Gets predictions on validation set
- Stores out-of-fold predictions

```python
        del model, optimizer, scaler, scheduler
        torch.cuda.empty_cache()
        gc.collect()
```

- Deletes objects to free memory
- Clears GPU cache
- Runs garbage collector

```python
    return oof_preds, df[CONFIG.LABELS].values
```

- Returns out-of-fold predictions and true labels

```python
if os.path.exists(CONFIG.TRAIN_CSV):
    oof_preds, y_true = run_training()
else:
    print("Skipping training as data is not found (likely in a dry-run environment).")
```

- Executes training if data exists
- Otherwise skips gracefully

---

## Section 8: Threshold Optimization

### Lines 340-347: Threshold Tuning

```python
if os.path.exists(CONFIG.TRAIN_CSV):
    best_thresholds = tune_thresholds(y_true, oof_preds)
```

- Finds optimal threshold for each emotion label using validation predictions

```python
    oof_tuned = (oof_preds >= best_thresholds).astype(int)
```

- Converts probabilities to binary predictions using optimized thresholds

```python
    final_f1 = f1_score(y_true, oof_tuned, average="macro", zero_division=0)
    print(f"\nFinal CV Macro F1: {final_f1:.4f}")
    print(f"Best Thresholds: {best_thresholds}")
```

- Calculates cross-validated F1 score with optimized thresholds
- Prints final performance and optimal thresholds

```python
else:
    best_thresholds = np.array([0.5] * len(CONFIG.LABELS))
```

- Falls back to 0.5 thresholds if training data not available

---

## Section 9: Inference & Submission

### Lines 363-420: `predict_test` Function

```python
def predict_test(thresholds):
    if not os.path.exists(CONFIG.TEST_CSV):
        print("Test CSV not found.")
        return
```

- Checks if test data exists

```python
    df_test = pd.read_csv(CONFIG.TEST_CSV)
    df_test = ensure_text_column(df_test)
```

- Loads test data and ensures text column

```python
    tokenizer = AutoTokenizer.from_pretrained(CONFIG.MODEL_NAME)
    ds_test = EmotionDS(df_test, tokenizer, CONFIG.MAX_LEN, is_test=True)
    dl_test = torch.utils.data.DataLoader(ds_test, batch_size=CONFIG.BATCH_SIZE, shuffle=False, num_workers=2)
```

- Creates tokenizer, dataset, and data loader for test data
- `is_test=True`: No labels expected

```python
    fold_preds = []
```

- Initializes list to store predictions from each fold

```python
    for fold in range(CONFIG.N_FOLDS):
        model_path = f"model_fold_{fold}.pth"
        if not os.path.exists(model_path):
            print(f"Model for fold {fold} not found, skipping.")
            continue
```

- Iterates through all folds
- Checks if model exists

```python
        print(f"Predicting Fold {fold+1}...")
        model = AutoModelForSequenceClassification.from_pretrained(
            CONFIG.MODEL_NAME,
            num_labels=len(CONFIG.LABELS),
            problem_type="multi_label_classification"
        )
        model.load_state_dict(torch.load(model_path))
        model.to(device)
        model.eval()
```

- Loads model architecture
- Loads trained weights
- Sets to evaluation mode

```python
        preds = []
        with torch.no_grad():
            for batch in dl_test:
                batch = {k: v.to(device, non_blocking=True) for k, v in batch.items()}
                with autocast(enabled=True):
                    out = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
                preds.append(torch.sigmoid(out.logits).float().cpu().numpy())
```

- Makes predictions without computing gradients
- Uses mixed precision for speed
- Applies sigmoid to get probabilities

```python
        fold_preds.append(np.vstack(preds))
        del model
        torch.cuda.empty_cache()
        gc.collect()
```

- Stores fold predictions
- Frees memory

```python
    if not fold_preds:
        print("No predictions made.")
        return
```

- Checks if any predictions were made

```python
    avg_preds = np.mean(fold_preds, axis=0)
```

- Averages predictions across all folds (ensemble)

```python
    final_preds = (avg_preds >= thresholds).astype(int)
```

- Applies optimized thresholds to get binary predictions
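
A toy numpy illustration of the fold ensemble (made-up probabilities and hypothetical thresholds):

```python
# Three folds' probabilities for one example with two labels are averaged,
# then compared against per-label thresholds.
fold_preds = [np.array([[0.62, 0.20]]), np.array([[0.40, 0.35]]), np.array([[0.55, 0.25]])]
avg = np.mean(fold_preds, axis=0)        # [[0.5233..., 0.2667...]]
thresholds = np.array([0.45, 0.30])      # hypothetical tuned thresholds
print((avg >= thresholds).astype(int))   # [[1 0]]
```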
```python
    sub = pd.DataFrame(columns=["id"] + CONFIG.LABELS)
    sub["id"] = df_test["id"] if "id" in df_test.columns else np.arange(len(df_test))
    sub[CONFIG.LABELS] = final_preds
    sub.to_csv(CONFIG.SUBMISSION_PATH, index=False)
    print(f"Submission saved to {CONFIG.SUBMISSION_PATH}")
    print(sub.head())
```

- Creates submission DataFrame
- Adds ID column (from data or generated)
- Adds prediction columns
- Saves to CSV
- Displays first few rows
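
For reference, the saved file has one row per test example and one 0/1 column per emotion; an illustrative (made-up) excerpt:

```
id,anger,fear,joy,sadness,surprise
0,0,1,0,0,1
1,1,0,0,1,0
2,0,0,1,0,0
```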
```python
predict_test(best_thresholds)
```

- Executes prediction function with optimized thresholds

---

## Summary

This notebook implements a **robust emotion classification pipeline** with:

1. **K-Fold Cross-Validation**: 5-fold stratified CV for reliable performance estimates
2. **State-of-the-Art Model**: DeBERTa-v3-base transformer
3. **Optimization Techniques**:
   - Mixed precision training (faster, less memory)
   - Gradient clipping (stability)
   - Learning rate warmup and decay
   - Differential weight decay
4. **Threshold Optimization**: Per-label thresholds for better F1 scores
5. **Ensemble Prediction**: Averages predictions from all folds
6. **Memory Management**: Explicit cleanup between folds

The model predicts 5 emotions (anger, fear, joy, sadness, surprise) in a **multi-label** setting, where text can have multiple emotions simultaneously.