Instructions for using yasu-oh/Llama-3-Swallow-Infused-R1776-70B with libraries and local serving apps.

Libraries

How to use yasu-oh/Llama-3-Swallow-Infused-R1776-70B with Transformers:

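```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="yasu-oh/Llama-3-Swallow-Infused-R1776-70B",
    torch_dtype="auto",  # keep the checkpoint's BF16 weights instead of upcasting to FP32
    device_map="auto",   # shard the ~140 GB of weights across available GPUs (requires accelerate)
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(messages, max_new_tokens=512)
print(outputs[0]["generated_text"][-1]["content"])  # the assistant's reply
```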

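```
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yasu-oh/Llama-3-Swallow-Infused-R1776-70B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# BF16 weights, sharded across available GPUs (device_map requires accelerate)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Sample with the recommended parameters; leave room for long reasoning output.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=40,
)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```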

Local Apps

vLLM

How to use yasu-oh/Llama-3-Swallow-Infused-R1776-70B with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "yasu-oh/Llama-3-Swallow-Infused-R1776-70B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "yasu-oh/Llama-3-Swallow-Infused-R1776-70B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

SGLang

How to use yasu-oh/Llama-3-Swallow-Infused-R1776-70B with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "yasu-oh/Llama-3-Swallow-Infused-R1776-70B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "yasu-oh/Llama-3-Swallow-Infused-R1776-70B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "yasu-oh/Llama-3-Swallow-Infused-R1776-70B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "yasu-oh/Llama-3-Swallow-Infused-R1776-70B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use yasu-oh/Llama-3-Swallow-Infused-R1776-70B with Docker Model Runner:
```
docker model run hf.co/yasu-oh/Llama-3-Swallow-Infused-R1776-70B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Llama-3-Swallow-Infused-R1776-70B

Overview

Llama-3-Swallow-Infused-R1776-70B is a 70B-parameter merged model built on Meta's Llama 3 architecture. It combines the distilled reasoning ability of r1-1776-distill-llama-70b with the instruction-following improvements of the Swallow model, with the aim of performing well on instruction tasks in both English and Japanese.

The base of this model is perplexity-ai/r1-1776-distill-llama-70b, a distilled model fine-tuned for reasoning on top of Llama 3.3. To improve Japanese proficiency and overall instruction alignment, we added the ChatVector of tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4, that is, its weight difference from meta-llama/Llama-3.3-70B-Instruct. Adding an instruction-tuned model's ChatVector to a reasoning-centric model in this way is intended to strengthen the model's multilingual reasoning.

Merge Methodology

This model was created using a weighted linear merge:

Llama-3-Swallow-Infused-R1776-70B =
  r1-1776-distill-llama-70b + 0.4 * (
    Swallow-70B-Instruct-v0.4 - Llama-3.3-70B-Instruct
  )

Base: perplexity-ai/r1-1776-distill-llama-70b, a distilled, reasoning-focused model built on Meta Llama 3.3.
Delta: the difference between tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4 and meta-llama/Llama-3.3-70B-Instruct.
Merge tool: MergeKit
Scaling factor: α = 0.4
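The weighted linear merge above can be sketched per parameter as follows. This is a minimal illustration, not the author's MergeKit configuration: `chat_vector_merge` and `task_arithmetic` are hypothetical helper names, and checkpoints are assumed to be dicts mapping parameter names to values that support `+`, `-`, and scalar `*` (Python floats, NumPy arrays, or PyTorch tensors). The second helper shows that the same result can also be written in the usual task-arithmetic form, anchored on Llama-3.3-70B-Instruct with weight 1.0 for R1776 and 0.4 for Swallow; this is only an algebraic identity, and the source does not say which form was configured.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical checkpoint type: parameter name -> value supporting +, - and
# scalar * (Python floats, NumPy arrays, or PyTorch tensors all work).
StateDict = Dict[str, Any]


def chat_vector_merge(
    base: StateDict, donor: StateDict, reference: StateDict, alpha: float = 0.4
) -> StateDict:
    """Return base + alpha * (donor - reference), computed parameter by parameter."""
    if not (base.keys() == donor.keys() == reference.keys()):
        raise ValueError("all checkpoints must share the same parameter names")
    return {name: base[name] + alpha * (donor[name] - reference[name]) for name in base}


def task_arithmetic(anchor: StateDict, weighted: List[Tuple[StateDict, float]]) -> StateDict:
    """Return anchor + sum_i w_i * (model_i - anchor), the usual task-arithmetic form."""
    merged: StateDict = {}
    for name, anchor_value in anchor.items():
        delta = sum(w * (model[name] - anchor_value) for model, w in weighted)
        merged[name] = anchor_value + delta
    return merged


# Toy scalar "checkpoints" standing in for the real 70B weights.
r1776 = {"w": 1.0, "b": 0.5}    # base: r1-1776-distill-llama-70b
swallow = {"w": 2.0, "b": 1.5}  # donor: Llama-3.3-Swallow-70B-Instruct-v0.4
llama33 = {"w": 1.5, "b": 0.5}  # reference: Llama-3.3-70B-Instruct

merged = chat_vector_merge(r1776, swallow, llama33, alpha=0.4)
same = task_arithmetic(llama33, [(r1776, 1.0), (swallow, 0.4)])
```

With real 70B checkpoints, the same arithmetic would be applied tensor by tensor (for example, one safetensors shard at a time) rather than holding three full models in memory at once.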

Before merging, we aligned the vocabularies of the merged components using yasu-oh/merge_tools, which aligns the vocabulary of the added model with the tokenizer of the base model. This keeps each row of the embedding and output layers referring to the same token in every model, so the linear merge never combines weights that belong to different tokens.
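The idea behind vocabulary alignment can be illustrated with a small sketch. It is not the yasu-oh/merge_tools implementation: `align_embedding_rows` is a hypothetical helper that reorders a donor model's vocabulary-indexed rows (embedding or output-layer rows) so that row i holds the weights for the base tokenizer's token ID i, matching tokens by their string. It assumes base IDs run contiguously from 0. Base tokens the donor lacks fall back to caller-supplied rows; if the same fallback rows are used when aligning both the donor and the reference model, any token missing from both ends up with a zero ChatVector delta.

```python
from typing import Dict, List, Sequence

Row = List[float]  # one vocabulary-indexed weight row (embedding or lm_head)


def align_embedding_rows(
    base_vocab: Dict[str, int],
    donor_vocab: Dict[str, int],
    donor_rows: Sequence[Row],
    fallback_rows: Sequence[Row],
) -> List[Row]:
    """Hypothetical helper: reorder donor rows into the base tokenizer's ID order.

    base_vocab / donor_vocab map token strings to IDs (as returned by
    tokenizer.get_vocab()). fallback_rows must already be in base ID order;
    they are used for base tokens that the donor vocabulary does not contain.
    """
    if len(fallback_rows) != len(base_vocab):
        raise ValueError("fallback_rows must have one row per base token")
    # Start from copies of the fallback rows so the inputs are never mutated.
    aligned = [list(row) for row in fallback_rows]
    for token, base_id in base_vocab.items():
        donor_id = donor_vocab.get(token)
        if donor_id is not None:
            # Same token string, possibly a different ID: move the donor's row.
            aligned[base_id] = list(donor_rows[donor_id])
    return aligned
```

For example, if the base vocabulary is `{"a": 0, "b": 1, "<think>": 2}` and the donor's is `{"b": 0, "a": 1, "<x>": 2}`, the donor rows for "a" and "b" swap places, and "<think>" keeps its fallback row.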

This approach is intended to retain the reasoning backbone of R1776 while adding Swallow's improvements in instruction following and Japanese language support.

Languages

English
Japanese

Key Features

Bilingual support: robust performance for both English and Japanese tasks.
Enhanced reasoning and instruction-following capabilities.
Novel use of ChatVector addition from instruction-tuned models to a reasoning-centric base.

Recommended Parameters

temperature: 0.6
top_p: 0.95
top_k: 40
min_p: 0.0
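To apply these settings through the OpenAI-compatible servers described above, a request body could be built as below. This is a sketch using only the Python standard library: `build_chat_request` and `post_chat` are hypothetical helper names, the default URL assumes the vLLM server on port 8000, and because `top_k` and `min_p` are not part of the OpenAI API, it assumes the server accepts them as extra sampling fields (vLLM does; check your server's documentation otherwise).

```python
import json
import urllib.request

MODEL_ID = "yasu-oh/Llama-3-Swallow-Infused-R1776-70B"

# Recommended sampling parameters from this model card.
RECOMMENDED_SAMPLING = {"temperature": 0.6, "top_p": 0.95, "top_k": 40, "min_p": 0.0}


def build_chat_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-style chat completion body with the recommended sampling settings.

    top_k and min_p are non-standard fields; this assumes the server accepts them.
    """
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        **RECOMMENDED_SAMPLING,
    }


def post_chat(body: dict, url: str = "http://localhost:8000/v1/chat/completions") -> dict:
    """POST the body to a running server and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `reply = post_chat(build_chat_request("What is the capital of France?"))` followed by `print(reply["choices"][0]["message"]["content"])` prints the answer; pass `url="http://localhost:30000/v1/chat/completions"` for the SGLang server.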

License

This model is distributed under the Meta Llama 3 Community License. Please review and comply with its terms: https://www.llama.com/llama3/license/

Key restrictions include:

Do not use this model or its outputs to improve any other large language model (excluding Llama 3 and its derivatives).
When distributing this model or a derivative of it, display the phrase "Built with Meta Llama 3."
Organizations with more than 700 million monthly active users (MAU) require a separate license from Meta.
Names of models derived from this one must begin with "Llama 3".

Citations

If you use this model, please cite the original works:


Format: Safetensors
Model size: 71B params
Tensor type: BF16

Model tree for yasu-oh/Llama-3-Swallow-Infused-R1776-70B

Merged from: meta-llama/Llama-3.3-70B-Instruct, perplexity-ai/r1-1776-distill-llama-70b, tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4
Merges: 1 model
Quantizations: 3 models