tinyshakespeare-13m

This model is a fine-tuned version of an unspecified base model on the tiny_shakespeare dataset. On the evaluation set it reaches a validation loss of 4.8941 at the final epoch (epoch 40, step 1280).

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.03
num_epochs: 40
mixed_precision_training: Native AMP
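
To make the learning-rate settings above concrete, the sketch below computes a linear warmup followed by linear decay from the listed values. It is an illustration, not the training code: the total of 1280 steps is taken from the last row of the training log, rounding the warmup length up with `ceil` is an assumption, and the names `PEAK_LR`, `WARMUP_STEPS`, and `linear_lr` are hypothetical.

```python
import math

PEAK_LR = 3e-4        # learning_rate from the list above
WARMUP_RATIO = 0.03   # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 1280    # assumption: final step in the training log (40 epochs x 32 steps)

# Assumption: the warmup length is ratio * total steps, rounded up.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Learning rate at optimizer step `step`: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    # Print the rate at the start, the end of warmup, the midpoint, and the last step.
    for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(f"step {step:4d}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the warmup covers only the first 39 of 1280 steps, so the rate is decaying for almost the entire run.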

Training Loss	Epoch	Step	Validation Loss
No log	1.0	32	7.3043
7.7103	2.0	64	5.9158
7.7103	3.0	96	5.6154
5.8675	4.0	128	5.3680
5.4479	5.0	160	5.2088
5.4479	6.0	192	5.1126
5.1825	7.0	224	5.0313
4.9945	8.0	256	4.9771
4.9945	9.0	288	4.9379
4.8838	10.0	320	4.9208
4.7883	11.0	352	4.8985
4.7883	12.0	384	4.8766
4.7261	13.0	416	4.8631
4.7261	14.0	448	4.8617
4.6621	15.0	480	4.8445
4.5955	16.0	512	4.8370
4.5955	17.0	544	4.8295
4.52	18.0	576	4.8215
4.4819	19.0	608	4.8278
4.4819	20.0	640	4.8169
4.4415	21.0	672	4.8252
4.3929	22.0	704	4.8199
4.3929	23.0	736	4.8243
4.3438	24.0	768	4.8340
4.3117	25.0	800	4.8309
4.3117	26.0	832	4.8410
4.2626	27.0	864	4.8439
4.2626	28.0	896	4.8437
4.2404	29.0	928	4.8404
4.1957	30.0	960	4.8540
4.1957	31.0	992	4.8560
4.1681	32.0	1024	4.8653
4.1441	33.0	1056	4.8725
4.1441	34.0	1088	4.8770
4.1097	35.0	1120	4.8798
4.0823	36.0	1152	4.8884
4.0823	37.0	1184	4.8869
4.0783	38.0	1216	4.8925
4.0783	39.0	1248	4.8948
4.0641	40.0	1280	4.8941
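
In the table above, validation loss stops improving around the midpoint of training while training loss keeps falling. The sketch below locates the epoch with the lowest validation loss and converts losses to perplexity. The values are copied from the table; the helper names `best_epoch` and `perplexity` are hypothetical, and the perplexity conversion assumes the reported loss is a mean per-token cross-entropy in nats, which the card does not state.

```python
import math

# Validation loss for epochs 1..40, copied from the table above.
VAL_LOSS = [
    7.3043, 5.9158, 5.6154, 5.3680, 5.2088, 5.1126, 5.0313, 4.9771,
    4.9379, 4.9208, 4.8985, 4.8766, 4.8631, 4.8617, 4.8445, 4.8370,
    4.8295, 4.8215, 4.8278, 4.8169, 4.8252, 4.8199, 4.8243, 4.8340,
    4.8309, 4.8410, 4.8439, 4.8437, 4.8404, 4.8540, 4.8560, 4.8653,
    4.8725, 4.8770, 4.8798, 4.8884, 4.8869, 4.8925, 4.8948, 4.8941,
]


def best_epoch(losses):
    """Return (1-based epoch, loss) for the lowest validation loss."""
    idx = min(range(len(losses)), key=losses.__getitem__)
    return idx + 1, losses[idx]


def perplexity(loss):
    """Convert a loss to perplexity, assuming mean per-token cross-entropy in nats."""
    return math.exp(loss)


if __name__ == "__main__":
    epoch, loss = best_epoch(VAL_LOSS)
    print(f"best epoch: {epoch}, validation loss: {loss:.4f}")
    print(f"perplexity at best epoch:  {perplexity(loss):.1f}")
    print(f"perplexity at final epoch: {perplexity(VAL_LOSS[-1]):.1f}")
```

If checkpoints were saved per epoch, the one from the lowest-loss epoch would be the natural candidate for evaluation, since the later epochs lower training loss without improving validation loss.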

Safetensors

Model size: 5.52M params
Tensor type: F32
