| Tensor Dimension Mismatch when using TRL GKDTrainer | 3 | 8 | December 12, 2025 |
| Transformers.js need for token to char mapping | 3 | 12 | December 11, 2025 |
| [Pipelines] Mask Generation Parameters | 2 | 30 | December 10, 2025 |
| Having trouble to configure trainer for T5 model evaluation | 1 | 18 | December 9, 2025 |
| How do I speedup my callbacks and reduce stall before they start? | 1 | 17 | December 9, 2025 |
| Getting 429 Too Many Request | 3 | 24 | December 8, 2025 |
| How to add new language to NLLB tokenizer in Huggingface? | 3 | 2009 | December 6, 2025 |
| Is it possible to remove all other language from NLLB200 except English and German? | 2 | 748 | December 6, 2025 |
| How to use nllb1.3b model to fine-tune the English to German bidirectional translation task? | 2 | 97 | December 6, 2025 |
| SAE for Codegemma | 3 | 17 | December 6, 2025 |
| Obtain raw logits before decoding scaling is applied | 1 | 23 | December 5, 2025 |
| CUDA Out Of Memory when training a DETR Object detection model with compute_metrics | 4 | 164 | December 3, 2025 |
| How to understand the special tokens? | 7 | 82 | December 2, 2025 |
| GETTING ERROR >> AttributeError: 'InferenceClient' object has no attribute 'post' | 18 | 2010 | November 30, 2025 |
| Dora training taking 8x time? Why? | 2 | 100 | November 27, 2025 |
| ContractNLI-based NDA Risk Analyzer using RoBERTa + Chunking – Looking for Feedback | 6 | 44 | November 25, 2025 |
| Train instance segmentation model with dinov3 backbone | 3 | 95 | November 24, 2025 |
| DistilBERT reaches 76% accuracy but still predicts “believable” for impossible/fantasy excuses — why? | 3 | 33 | November 23, 2025 |
| Search query autocomplete from the queries I have in my data | 1 | 1691 | November 21, 2025 |
| How to sample from the validation set when using Trainer? | 5 | 1980 | November 21, 2025 |
| Evaluate subset of data during training | 6 | 5964 | November 21, 2025 |
| NeuroTrace – GPT-2 Small Residual Attack & Defence Framework (IOI Task) | 0 | 27 | November 21, 2025 |
| Passing Inputs Longer Than 512 Tokens After Pretraining a T5 Model: Is It Safe? | 3 | 59 | November 20, 2025 |
| [LLaVA-1.5] Validating Logic for Token-Level KV Cache Extraction | 3 | 23 | November 20, 2025 |
| Evalutation of expert router logits simultanous to generation | 4 | 56 | November 19, 2025 |
| AetherMind-KD-Student (184M) Compact, fast, and robust NLI model distilled from DeBERTa-v3 | 2 | 19 | November 15, 2025 |
| Fine-tuning a custom module but do not use LoRA | 1 | 58 | November 14, 2025 |
| Inconsistent output between flash attention and eager | 3 | 113 | November 14, 2025 |
| Num_return_sequences > num_beams | 3 | 22 | November 13, 2025 |
| Debugging inf/NaN Loss in Multi-Process Optuna/PyTorch Lightning HPO in Colab | 3 | 42 | November 13, 2025 |