kde4-en-fr
This model is a fine-tuned version of Helsinki-NLP/opus-mt-en-fr on the KDE4 dataset (English to French).
Model description
This model has been adapted to the domain of technical software documentation and user interface localization.
It was fine-tuned on the KDE4 dataset, which consists of human translations of KDE applications. Unlike general-purpose translation models, it learns the localization conventions of the French technical community (e.g., rendering terms like "threads", "plugin", or "email" as they are used in technical French rather than translating them literally).
Intended uses & limitations
- Intended Use: Translation of technical texts, software strings, and documentation from English to French.
- Limitations: The model is specialized for computer science and software terminology, so its output on general conversational text or literature may differ from the base model's.
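A minimal inference sketch for the intended use above, assuming the checkpoint is published under the repository id shown in this card's model tree and that the `transformers` library is installed. The helper name `translate` is hypothetical, and the first call downloads the model weights.

```python
# Repository id taken from the "Model tree" line of this card.
MODEL_ID = "mariadelcarmenramirez/kde4-en-fr"


def translate(texts, model_id=MODEL_ID):
    """Translate a list of English strings to French (hypothetical helper)."""
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import pipeline

    translator = pipeline("translation", model=model_id)
    # The translation pipeline returns one dict per input string.
    return [result["translation_text"] for result in translator(texts)]
```

Calling `translate(["Default to expanded threads"])` should return a one-element list with the French rendering; the exact wording depends on the checkpoint.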
Training and evaluation data
The model was trained on the KDE4 dataset, specifically the English-French subset.
- Dataset: kde4
- Language pair: English (en) -> French (fr)
- Preprocessing: Sentences were truncated to a maximum length of 128 tokens.
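The truncation step can be sketched as follows. This is an illustration rather than the exact preprocessing script: the batch layout (`{"translation": [{"en": ..., "fr": ...}]}`) is assumed to follow the usual Hugging Face `kde4` format, and `preprocess` is a hypothetical helper that accepts any tokenizer supporting the `text_target` argument.

```python
MAX_LENGTH = 128  # truncation limit stated in this card


def preprocess(examples, tokenizer, max_length=MAX_LENGTH):
    """Tokenize a batch of sentence pairs, truncating both sides to max_length."""
    # Assumed batch layout: {"translation": [{"en": "...", "fr": "..."}, ...]}
    sources = [pair["en"] for pair in examples["translation"]]
    targets = [pair["fr"] for pair in examples["translation"]]
    # text_target tells the tokenizer to encode the French side as labels.
    return tokenizer(
        sources,
        text_target=targets,
        max_length=max_length,
        truncation=True,
    )
```

With the `datasets` library, a function like this would typically be applied with `dataset.map(lambda batch: preprocess(batch, tokenizer), batched=True)`.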
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW (betas=(0.9,0.999), epsilon=1e-08)
- lr_scheduler_type: linear
- num_epochs: 3
- mixed_precision_training: Native AMP
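For reproduction, the hyperparameters above map roughly onto `Seq2SeqTrainingArguments` as sketched below. Two points are assumptions: that `train_batch_size` and `eval_batch_size` are per-device values, and that "Native AMP" corresponds to `fp16=True`. The output directory name and the helper `build_training_args` are hypothetical.

```python
# Keyword arguments mirroring the hyperparameter list in this card.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,  # assumed per-device value
    "per_device_eval_batch_size": 16,   # assumed per-device value
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3,
    "fp16": True,  # assumed meaning of "Native AMP"; requires a CUDA GPU
}


def build_training_args(output_dir="kde4-en-fr-finetuned"):
    """Build training arguments from the values above (hypothetical helper)."""
    # Imported lazily so the settings can be inspected without transformers.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Keeping the values in a plain dictionary makes them easy to log or compare against another run before any trainer object is created.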
Training results
Training loss fell from 1.4339 at step 500 to 0.7650 at step 30,000 over 3 epochs (approx. 35,000 steps), then rose slightly to 0.7772 at step 35,000.
| Training Loss | Step |
|---|---|
| 1.4339 | 500 |
| 1.0881 | 5000 |
| 1.0051 | 10000 |
| 0.8804 | 15000 |
| 0.8459 | 20000 |
| 0.7696 | 25000 |
| 0.7650 | 30000 |
| 0.7772 | 35000 |
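The logged values can be checked with a few lines of arithmetic. The sketch below copies the table as-is and derives the lowest logged loss, the relative drop between the first and last entries, and a rough examples-per-epoch figure. That last estimate assumes a single device and no gradient accumulation, neither of which is stated in this card.

```python
# (step, training loss) pairs copied from the table above.
LOSS_LOG = [
    (500, 1.4339),
    (5000, 1.0881),
    (10000, 1.0051),
    (15000, 0.8804),
    (20000, 0.8459),
    (25000, 0.7696),
    (30000, 0.7650),
    (35000, 0.7772),
]

# Entry with the lowest logged training loss.
best_step, best_loss = min(LOSS_LOG, key=lambda row: row[1])

# Relative reduction between the first and the last logged loss.
first_loss = LOSS_LOG[0][1]
last_loss = LOSS_LOG[-1][1]
relative_drop = (first_loss - last_loss) / first_loss

# Rough estimate: total steps * batch size / epochs
# (assumes one device and no gradient accumulation).
examples_per_epoch = 35_000 * 16 / 3

print(f"lowest logged loss: {best_loss:.4f} at step {best_step}")
print(f"relative drop, first to last entry: {relative_drop:.1%}")
print(f"approx. examples per epoch: {examples_per_epoch:,.0f}")
```

The minimum falls at step 30,000 rather than at the final step, which suggests the loss had largely flattened out by the end of the third epoch.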
Evaluation Results
BLEU Score
- Fine-tuned model: 0.5216
- Pretrained base model (Helsinki-NLP/opus-mt-en-fr): 0.3817
Note: The fine-tuned model demonstrates improved adherence to domain-specific terminology (e.g., preserving English technical terms like "email" or "plugin" where appropriate for French technical context) compared to the base model.
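To put the two scores in perspective, the sketch below computes the absolute and relative gain from the reported numbers. It also includes a hypothetical `corpus_bleu` helper built on the `evaluate` library's `bleu` metric, which reports on a 0-1 scale. The card does not say which BLEU implementation produced the scores, so that choice is an assumption, and different BLEU implementations are not directly comparable.

```python
# Scores as reported in this card (0-1 scale).
FINE_TUNED_BLEU = 0.5216
PRETRAINED_BLEU = 0.3817

absolute_gain = FINE_TUNED_BLEU - PRETRAINED_BLEU
relative_gain = absolute_gain / PRETRAINED_BLEU


def corpus_bleu(predictions, references):
    """Corpus BLEU on a 0-1 scale (assumed tooling, not confirmed by the card)."""
    # Imported lazily so the arithmetic above runs without `evaluate` installed.
    import evaluate

    metric = evaluate.load("bleu")
    # The metric expects a list of reference lists, one per prediction.
    wrapped = [[reference] for reference in references]
    return metric.compute(predictions=predictions, references=wrapped)["bleu"]


print(f"absolute gain: {absolute_gain:.4f}")
print(f"relative gain: {relative_gain:.1%}")
```

Scoring both models with the same helper on the same held-out sentences is what makes a comparison like the one above meaningful.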
Framework versions
- Transformers 4.57.3
- Pytorch 2.9.0+cu126
- Datasets 3.6.0
- Tokenizers 0.22.1
Model tree for mariadelcarmenramirez/kde4-en-fr
Base model
Helsinki-NLP/opus-mt-en-fr