tinyshakespeare-13m

This model is a fine-tuned version of an unspecified base model on the tiny_shakespeare dataset. It achieves the following results on the evaluation set:

  • Loss: 4.9693
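
For a causal language model, cross-entropy loss converts to perplexity via exp(loss), so this corresponds to a perplexity of roughly 144. A minimal sketch of the conversion (plain Python; nothing is assumed beyond the reported loss):

```python
import math

eval_loss = 4.9693                       # reported evaluation loss (nats per token)
perplexity = math.exp(eval_loss)         # cross-entropy -> perplexity
print(f"perplexity ~ {perplexity:.1f}")  # ~ 143.9
```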

Model description

More information needed

Intended uses & limitations

More information needed
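
The card gives no usage details. As a minimal sketch, the model should load through the standard transformers causal-LM API; the prompt and generation settings below are illustrative assumptions, not documented behavior:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MolecularReality/tinyshakespeare-13m"

# Assumption: the repository ships a tokenizer alongside the weights.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt and sampling settings, not taken from the card.
inputs = tokenizer("ROMEO:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Given the evaluation loss above, outputs should be expected to be loosely Shakespeare-flavored rather than coherent prose.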

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch reconstructing them follows the list):

  • learning_rate: 0.0003
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 40
  • mixed_precision_training: Native AMP
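
These settings map onto a transformers TrainingArguments configuration roughly as follows. This is a reconstruction from the list above; output_dir and anything not listed (logging, evaluation strategy, dataset preparation, the Trainer call itself) are assumptions:

```python
from transformers import TrainingArguments

# Sketch reconstructed from the hyperparameter list above.
training_args = TrainingArguments(
    output_dir="tinyshakespeare-13m",  # assumed; not stated in the card
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    num_train_epochs=40,
    fp16=True,  # "Native AMP" mixed precision; fp16 vs. bf16 is an assumption
)
```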

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log        | 1.0   | 32   | 7.3043          |
| 7.7103        | 2.0   | 64   | 5.9158          |
| 7.7103        | 3.0   | 96   | 5.6154          |
| 5.8675        | 4.0   | 128  | 5.3680          |
| 5.4479        | 5.0   | 160  | 5.2088          |
| 5.4479        | 6.0   | 192  | 5.1126          |
| 5.1825        | 7.0   | 224  | 5.0313          |
| 4.9945        | 8.0   | 256  | 4.9771          |
| 4.9945        | 9.0   | 288  | 4.9379          |
| 4.8838        | 10.0  | 320  | 4.9208          |
| 4.7883        | 11.0  | 352  | 4.8985          |
| 4.7883        | 12.0  | 384  | 4.8766          |
| 4.7261        | 13.0  | 416  | 4.8631          |
| 4.7261        | 14.0  | 448  | 4.8617          |
| 4.6621        | 15.0  | 480  | 4.8445          |
| 4.5955        | 16.0  | 512  | 4.8370          |
| 4.5955        | 17.0  | 544  | 4.8295          |
| 4.52          | 18.0  | 576  | 4.8215          |
| 4.4819        | 19.0  | 608  | 4.8278          |
| 4.4819        | 20.0  | 640  | 4.8169          |
| 4.4415        | 21.0  | 672  | 4.8252          |
| 4.3929        | 22.0  | 704  | 4.8199          |
| 4.3929        | 23.0  | 736  | 4.8243          |
| 4.3438        | 24.0  | 768  | 4.8340          |
| 4.3117        | 25.0  | 800  | 4.8309          |
| 4.3117        | 26.0  | 832  | 4.8410          |
| 4.2626        | 27.0  | 864  | 4.8439          |
| 4.2626        | 28.0  | 896  | 4.8437          |
| 4.2404        | 29.0  | 928  | 4.8404          |
| 4.1957        | 30.0  | 960  | 4.8540          |
| 4.1957        | 31.0  | 992  | 4.8560          |
| 4.1681        | 32.0  | 1024 | 4.8653          |
| 4.1441        | 33.0  | 1056 | 4.8725          |
| 4.1441        | 34.0  | 1088 | 4.8770          |
| 4.1097        | 35.0  | 1120 | 4.8798          |
| 4.0823        | 36.0  | 1152 | 4.8884          |
| 4.0823        | 37.0  | 1184 | 4.8869          |
| 4.0783        | 38.0  | 1216 | 4.8925          |
| 4.0783        | 39.0  | 1248 | 4.8948          |
| 4.0641        | 40.0  | 1280 | 4.8941          |
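
The validation loss bottoms out at 4.8169 at epoch 20 and drifts upward for the remaining epochs while the training loss keeps falling, the usual signature of overfitting on a small corpus. A quick plot of the validation column makes the trend visible (matplotlib usage is an assumption; the values are transcribed from the table above):

```python
import matplotlib.pyplot as plt

# Validation loss per epoch, transcribed from the table above.
val_loss = [
    7.3043, 5.9158, 5.6154, 5.3680, 5.2088, 5.1126, 5.0313, 4.9771,
    4.9379, 4.9208, 4.8985, 4.8766, 4.8631, 4.8617, 4.8445, 4.8370,
    4.8295, 4.8215, 4.8278, 4.8169, 4.8252, 4.8199, 4.8243, 4.8340,
    4.8309, 4.8410, 4.8439, 4.8437, 4.8404, 4.8540, 4.8560, 4.8653,
    4.8725, 4.8770, 4.8798, 4.8884, 4.8869, 4.8925, 4.8948, 4.8941,
]
epochs = range(1, len(val_loss) + 1)

plt.plot(epochs, val_loss, marker=".")
plt.axvline(x=20, linestyle="--", color="gray", label="best epoch (20)")
plt.xlabel("epoch")
plt.ylabel("validation loss")
plt.title("tinyshakespeare-13m validation loss")
plt.legend()
plt.show()
```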

Framework versions

  • Transformers 4.57.1
  • PyTorch 2.6.0+cu124
  • Datasets 3.6.0
  • Tokenizers 0.22.1

Model size

  • 5.52M parameters
  • Tensor type: F32 (stored as safetensors)