πŸš€ Decode-12B-MoE: High-Performance Mixture of Experts Model

Decode-12B-MoE is a Large Language Model (LLM) built on a Sparse Mixture of Experts (MoE) architecture with 12.5 billion total parameters. It is engineered to bridge the gap between large parameter counts and computational efficiency by activating only a fraction of its weights (~2.5B per token) during inference. **Note: this is an untrained model!**

πŸ“Œ Technical Specifications

| Attribute | Value |
|---|---|
| Total Parameters | 12,500,340,736 (12.5B) |
| Active Parameters | ~2.5B per token |
| Architecture | Sparse MoE (decoder-only) |
| Context Window | 4096 tokens |
| Format | bfloat16 / float16 |
| Training Hardware | NVIDIA Tesla T4 (prototyping) / [Your_Main_GPU] |
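To illustrate why only ~2.5B of the 12.5B parameters are active per token, here is a minimal sketch of top-k expert routing, the standard mechanism in sparse MoE layers. The function and tensor names are illustrative assumptions, not the actual layer names used in this repository:

```python
import torch
import torch.nn.functional as F

def top_k_route(hidden, gate_weight, k=2):
    """Pick k experts per token and return their mixing weights.

    hidden: (tokens, d_model) activations entering the MoE layer.
    gate_weight: (d_model, n_experts) learned router projection.
    """
    logits = hidden @ gate_weight                      # router score per expert
    probs = F.softmax(logits, dim=-1)
    weights, indices = torch.topk(probs, k, dim=-1)    # keep only top-k experts
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over the chosen k
    return weights, indices

torch.manual_seed(0)
hidden = torch.randn(4, 8)   # 4 tokens, toy d_model=8
gate = torch.randn(8, 16)    # toy router for 16 experts
w, idx = top_k_route(hidden, gate)
# Each token runs through only k=2 of 16 expert FFNs, weighted by `w`.
```

Only the selected experts' feed-forward networks execute for each token, which is what keeps the active parameter count far below the total.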

πŸ›  Training Methodology

The model was trained with memory-optimization techniques to remain stable on consumer- and enterprise-grade hardware:

  • 8-bit Optimizer: Utilized bitsandbytes AdamW to reduce optimizer state memory footprint by 75%.
  • Gradient Checkpointing: Enabled to manage activation memory for deep MoE layers.
  • Dataset: Fine-tuned on a diverse corpus of Vietnamese and English text, focusing on reasoning, logic, and natural conversation.

πŸ’» Quick Start (Usage)

To use this model, ensure you have `torch`, `transformers`, and `accelerate` installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Replace with your actual Hugging Face repo ID
model_id = "your-username/decode-12b-moe"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # Required for custom MoE architectures
)

# Test prompt
prompt = "Explain the concept of Quantum Computing in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
