Solar-Open-100B AWQ - INT8

Model Details

Quantization Details

Memory Usage

Type                       Solar-Open-100B   Solar-Open-100B-AWQ-8bit
Memory Size                191.2 GB          103.7 GB
KV Cache per Token         96.0 kB           48.0 kB
KV Cache per 128k Context  12.0 GB           6.0 GB
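
As a sanity check, the per-context row follows from the per-token row over the full 128k (131,072-token) window, assuming binary units. A minimal sketch of that arithmetic:

# Sanity-check of the table above; assumes binary units (1 kB = 1,024 bytes)
# and the full 131,072-token context window.
KV_PER_TOKEN_KIB = {"Solar-Open-100B": 96.0, "Solar-Open-100B-AWQ-8bit": 48.0}
CONTEXT_TOKENS = 128 * 1024

for name, kib_per_token in KV_PER_TOKEN_KIB.items():
    gib = kib_per_token * CONTEXT_TOKENS / (1024 * 1024)  # KiB -> GiB
    print(f"{name}: {gib:.1f} GB KV cache per full context")
# Prints 12.0 GB and 6.0 GB, matching the "KV Cache per 128k Context" row.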

Inference

Prerequisite

Install Upstage's Solar Open build of vLLM from the precompiled wheel:

VLLM_PRECOMPILED_WHEEL_LOCATION="https://github.com/vllm-project/vllm/releases/download/v0.12.0/vllm-0.12.0-cp38-abi3-manylinux_2_31_x86_64.whl" \
VLLM_USE_PRECOMPILED=1 \
uv pip install git+https://github.com/UpstageAI/[email protected]

Basic Usage

vllm serve cyankiwi/Solar-Open-100B-AWQ-8bit \
    --trust-remote-code \
    --enable-auto-tool-choice \
    --tool-call-parser solar_open \
    --reasoning-parser solar_open \
    --logits-processors vllm.model_executor.models.parallel_tool_call_logits_processor:ParallelToolCallLogitsProcessor \
    --logits-processors vllm.model_executor.models.solar_open_logits_processor:SolarOpenTemplateLogitsProcessor
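
Once the server is running, it exposes an OpenAI-compatible API. A minimal client sketch (the base URL, port, and api_key placeholder assume a default local deployment):

from openai import OpenAI

# Assumes the vllm serve command above is running locally on the default port.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="cyankiwi/Solar-Open-100B-AWQ-8bit",
    messages=[{"role": "user", "content": "who are you?"}],
    temperature=0.8,
    top_p=0.95,
)
print(response.choices[0].message.content)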

Additional Information

Changelog

  • v1.0.0 - Initial quantized release

Authors

Solar Open Model

Solar Open

Solar Open is Upstage's flagship 102B-parameter large language model, trained entirely from scratch and released under the Solar-Apache License 2.0 (see LICENSE for details). As a Mixture-of-Experts (MoE) architecture, it delivers enterprise-grade performance in reasoning, instruction-following, and agentic capabilities, all while prioritizing transparency and customization for the open-source community.

Highlights

  • MoE Architecture (102B / 12B): Built on a Mixture-of-Experts architecture with 102B total and 12B active parameters per token. This design delivers the knowledge depth of a massive model with the inference speed and cost-efficiency of a much smaller one (see the routing sketch after this list).
  • Massive Training Scale: Pre-trained on 19.7 trillion tokens, ensuring broad knowledge coverage and robust reasoning capabilities across diverse domains.
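
For intuition on the 102B / 12B split, here is a toy sketch of top-8 routing over 128 routed experts plus one always-on shared expert. The dimensions, router, and normalization below are illustrative assumptions, not the model's actual internals:

import torch

# Illustrative only: toy top-8 routing over 128 routed experts.
# Hidden size and normalization are made up for the example.
hidden = torch.randn(1, 2048)           # one token's hidden state
router = torch.nn.Linear(2048, 128)     # scores for the 128 routed experts

scores = router(hidden)
weights, expert_ids = torch.topk(scores.softmax(dim=-1), k=8)  # pick top 8
weights = weights / weights.sum()                              # renormalize

# Each token is processed by its 8 selected experts plus the 1 shared expert,
# so only ~12B of the 102.6B parameters are active per token.
print(expert_ids.tolist(), weights.tolist())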

Model Overview

  • Model Name: Solar Open 100B
  • Hugging Face ID: Upstage/Solar-Open-100B
  • Architecture: Mixture-of-Experts (MoE)
    • Total Parameters: 102.6B
    • Active Parameters: 12B (per token)
    • Experts: 129 total (top-8 routing over 128 routed experts, plus 1 shared expert)
  • Pre-training Tokens: 19.7 Trillion
  • Context Length: 128k
  • Training Hardware: NVIDIA B200 GPUs
  • License: Solar-Apache License 2.0 (See LICENSE)
  • Hardware Requirements:
    • Minimum: 4x NVIDIA A100 (80GB)

License

This repository contains both model weights and code, which are licensed under different terms:

  1. MODEL WEIGHTS (*.safetensors): licensed under the Solar-Apache License 2.0. See: https://huggingface.co/upstage/Solar-Open-100B/blob/main/LICENSE

  2. CODE (*.py, *.json, *.jinja files): licensed under the Apache License 2.0. See: https://www.apache.org/licenses/LICENSE-2.0

Performance

TBA

Inference Quickstart

We recommend using the following generation parameters:

temperature=0.8
top_p=0.95
top_k=50

Transformers

Install the required dependencies:

pip install -U transformers kernels torch accelerate

Run inference with the following code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "upstage/Solar-Open-100B"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Prepare input
messages = [{"role": "user", "content": "who are you?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

# Generate response
generated_ids = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=0.8,
    top_p=0.95,
    top_k=50,
    do_sample=True,
)
generated_text = tokenizer.decode(generated_ids[0][inputs.input_ids.shape[1] :])
print(generated_text)

vLLM

Option 1: Using Docker (Highly Recommended)

Docker is the recommended deployment method for running Solar-Open-100B.

# For 8 GPUs
docker run --gpus all \
    --ipc=host \
    -p 8000:8000 \
    upstage/vllm-solar-open:latest \
    upstage/Solar-Open-100B \
    --trust-remote-code \
    --enable-auto-tool-choice \
    --tool-call-parser solar_open \
    --reasoning-parser solar_open \
    --logits-processors vllm.model_executor.models.parallel_tool_call_logits_processor:ParallelToolCallLogitsProcessor \
    --logits-processors vllm.model_executor.models.solar_open_logits_processor:SolarOpenTemplateLogitsProcessor \
    --tensor-parallel-size 8
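
Once the container is up, it exposes the same OpenAI-compatible API on port 8000, so a client snippet like the one under Basic Usage above should work unchanged (with model set to upstage/Solar-Open-100B).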

Option 2: Installing from Source

For development, debugging, custom modifications, or offline inference, Solar Open can also be run from a source installation of vLLM. We recommend uv for environment management and dependency resolution.

Create and activate a Python virtual environment

uv venv --python 3.12 --seed
source .venv/bin/activate

Install Solar Open's optimized vLLM

VLLM_PRECOMPILED_WHEEL_LOCATION="https://github.com/vllm-project/vllm/releases/download/v0.12.0/vllm-0.12.0-cp38-abi3-manylinux_2_31_x86_64.whl" \
VLLM_USE_PRECOMPILED=1 \
uv pip install git+https://github.com/UpstageAI/[email protected]

Start the vLLM server (For 8 GPUs)

vllm serve upstage/Solar-Open-100B \
    --trust-remote-code \
    --enable-auto-tool-choice \
    --tool-call-parser solar_open \
    --reasoning-parser solar_open \
    --logits-processors vllm.model_executor.models.parallel_tool_call_logits_processor:ParallelToolCallLogitsProcessor \
    --logits-processors vllm.model_executor.models.solar_open_logits_processor:SolarOpenTemplateLogitsProcessor \
    --tensor-parallel-size 8
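
For offline (batch) inference without a server, the same installation can be driven through vLLM's Python API. A minimal sketch, assuming the build above is installed and 8 GPUs are available:

from vllm import LLM, SamplingParams

# Minimal offline-inference sketch; adjust tensor_parallel_size to your hardware.
llm = LLM(
    model="upstage/Solar-Open-100B",
    trust_remote_code=True,
    tensor_parallel_size=8,
)
params = SamplingParams(temperature=0.8, top_p=0.95, top_k=50, max_tokens=1024)

# llm.chat applies the model's chat template before generation
outputs = llm.chat([{"role": "user", "content": "who are you?"}], params)
print(outputs[0].outputs[0].text)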

Public API Access

The official API service for Solar Open is scheduled to launch publicly in January.

  • Access: Upstage Console (TBA)
  • Documentation: Upstage Console (TBA)

Citation

If you use Solar Open in your research, please cite:

@misc{solar-open-2025,
    title={Solar Open: Scaling Upstage's LLM Capabilities with MoE},
    author={Upstage AI},
    year={2025},
    url={https://huggingface.co/Upstage/Solar-Open-100B}
}