GPT-2.3-High

GPT-2.3-High is the definitive fine-tuned iteration of the GPT-2 architecture in this series, specifically optimized for high-coherence long-form text generation.

Technical Specifications

Model Name: GPT-2.3-High
Base Architecture: GPT-2 (Small)
Total Parameters: 125,226,240 (~125 Million)
Context Window: 2048 tokens (expanded from the GPT-2 default of 1024)
Training Dataset: Wikitext-2-raw-v1 (20% subset)
Training Epochs: 3
Framework: PyTorch & Hugging Face Transformers
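
The parameter total can be checked by hand from the standard GPT-2 Small hyperparameters (12 layers, hidden size 768, vocabulary of 50,257) with the position table enlarged to 2048 rows. The sketch below is plain arithmetic, not a measurement of the released checkpoint. It assumes the output head shares its weights with the token embedding, as in the stock GPT-2 implementation, and the function name is illustrative.

```python
def gpt2_param_count(n_positions, n_layer=12, n_embd=768, vocab_size=50257):
    """Count trainable parameters of a GPT-2 style decoder with a tied output head."""
    token_emb = vocab_size * n_embd
    pos_emb = n_positions * n_embd
    layer_norm = 2 * n_embd  # weight + bias
    # Attention: fused QKV projection plus output projection (weights + biases).
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: expand to 4 * n_embd, then project back (weights + biases).
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    block = 2 * layer_norm + attn + mlp
    # Final layer norm is added once after the stack of blocks.
    return token_emb + pos_emb + n_layer * block + layer_norm

original = gpt2_param_count(1024)
expanded = gpt2_param_count(2048)
print(original, expanded, expanded - original)
```

Under these assumptions the 2048-position variant comes to 125,226,240 parameters, and the whole difference from the 1024-position baseline is the 1024 extra rows of the position embedding table (1024 x 768 = 786,432).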

Accuracy & Evaluation

Following the 'healing' fine-tune, the model was evaluated on the official unseen test split of the Wikitext-2 dataset.

Test Set Perplexity (PPL): 4.06
Training Set Perplexity (PPL): 2.25
If you are going to test it, use the dataset I trained it on, which is Wikitext-2-raw-v1.
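
The card does not say how perplexity was computed. A common definition is the exponential of the mean per-token negative log-likelihood, and the sketch below follows that definition. The function name and the sample loss values are made up for illustration and are not taken from the actual evaluation run.

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))

# Illustrative values only: a constant loss of ln(4.06) nats per token
# corresponds to a perplexity of 4.06.
sample = [math.log(4.06)] * 5
print(round(perplexity(sample), 2))
```

When reproducing a score, the same tokenizer, dataset split, and averaging scheme should be used, since perplexity values computed per token and per word are not comparable.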

Overview & Capabilities

GPT-2.3-High was developed to solve the 'word salad' and repetition issues found in previous 2048-token iterations. By performing a 'healing' fine-tune on a larger dataset slice (20%), the model learned to manage the expanded positional embeddings effectively.

Usage Instructions

To use GPT-2.3-High, ensure you have the transformers library installed. Due to the manual context expansion, the ignore_mismatched_sizes=True flag is required during loading.

from transformers import GPT2LMHeadModel, GPT2Tokenizer

repo_id = "BikoRiko/GPT-2.3-High"

# ignore_mismatched_sizes=True is needed because of the manual context expansion
model = GPT2LMHeadModel.from_pretrained(repo_id, ignore_mismatched_sizes=True)
tokenizer = GPT2Tokenizer.from_pretrained(repo_id)
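
The prompt and the generated tokens together have to fit in the 2048-token context window listed in the specifications. The following is a minimal sketch of budgeting for that on a plain list of token ids. The helper name and the keep-the-most-recent-tokens policy are assumptions, not something the card specifies.

```python
CONTEXT_WINDOW = 2048  # context window stated in the specifications

def fit_prompt(token_ids, max_new_tokens, context_window=CONTEXT_WINDOW):
    """Trim a prompt from the left so prompt + generated tokens fit the window."""
    if not 0 < max_new_tokens < context_window:
        raise ValueError("max_new_tokens must be between 1 and context_window - 1")
    budget = context_window - max_new_tokens
    # Keep the most recent tokens; shorter prompts are returned unchanged.
    return token_ids[-budget:]

prompt = list(range(3000))  # stand-in for real tokenizer output
trimmed = fit_prompt(prompt, max_new_tokens=200)
print(len(trimmed))
```

The trimmed ids could then be converted to a tensor and passed to model.generate with the same max_new_tokens value.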
