Reflection-Llama-3.1-70B was Sonnet 3.5.

by Enigrand - opened Sep 9, 2024

Discussion

Enigrand

Sep 9, 2024

https://www.reddit.com/r/LocalLLaMA/comments/1fc9lf4/openrouter_reflection_70b_claims_to_be_claude/

https://www.reddit.com/r/LocalLLaMA/comments/1fc98fu/confirmed_reflection_70bs_official_api_is_sonnet/

https://www.reddit.com/r/LocalLLaMA/comments/1fc7avd/reflection_api_is_a_sonnet_35_wrapper_with_prompt/

nisten

Sep 9, 2024

Why is no one on local-llama actually running the f**king thing locally and posting their results?
Also, feel free to apply my chat template fix before converting to GGUF.

As for the threads on OpenRouter, that looks to me like it was the OpenRouter model-of-the-week API answering. My local tests on q8_0 showed this:

Enigrand

Sep 9, 2024 • edited Sep 9, 2024

@nisten

Can you read the titles of these posts? They're talking about the official "Reflection 70B" APIs. Did you test against these APIs?

I see you're posting results related to this Twitter post.

Do you really understand what he is trying to prove? He's trying to prove that the LLM behind the "Reflection 70B" API uses the same tokenizer as Claude 3, GPT-4o, or whatever. The images he posted support his point.
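To make the tokenizer argument concrete, here is a rough sketch of how such a comparison could work in principle. It assumes the API is asked to repeat a known string with a small token cap, so the reply gets cut off at a token boundary, and that boundary depends on the tokenizer. The two tokenizers below are toy stand-ins invented for illustration, not the real Llama, Claude, or GPT tokenizers, and the function names are hypothetical.

```python
from typing import Callable, Dict, List

Tokenizer = Callable[[str], List[str]]


def whitespace_tokenizer(text: str) -> List[str]:
    """Toy tokenizer A (hypothetical): each word, with its leading space, is one token."""
    words = text.split(" ")
    return [w if i == 0 else " " + w for i, w in enumerate(words)]


def chunk_tokenizer(text: str, size: int = 4) -> List[str]:
    """Toy tokenizer B (hypothetical): fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def truncated_prefix(tokenizer: Tokenizer, text: str, max_tokens: int) -> str:
    """Text that survives if the output is cut off after max_tokens tokens."""
    return "".join(tokenizer(text)[:max_tokens])


def matching_tokenizers(
    candidates: Dict[str, Tokenizer],
    text: str,
    observed_output: str,
    max_tokens: int,
) -> List[str]:
    """Names of candidate tokenizers whose cutoff point matches the observed reply."""
    return [
        name
        for name, tokenizer in candidates.items()
        if truncated_prefix(tokenizer, text, max_tokens) == observed_output
    ]


CANDIDATES: Dict[str, Tokenizer] = {
    "whitespace": whitespace_tokenizer,
    "chunk": chunk_tokenizer,
}
TEXT = "reflection models echo this sentence"

# Suppose the API, capped at 3 tokens, replied with "reflection m".
print(matching_tokenizers(CANDIDATES, TEXT, "reflection m", 3))
```

With these toy tokenizers, only the chunk tokenizer cuts the text at that point, so it is the only candidate consistent with the reply. A real test would swap in the actual tokenizers and would need several probe strings, since a single match can be a coincidence.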

What are you trying to prove by posting this image? I think you're proving that what they uploaded here and what they host behind the API are totally different. You should explain in detail what you want to prove.

Also, I see you're using local models, so you're testing a different model from the one all these posts are making claims about. A natural question: can you reproduce the evaluation results @mattshumer provided? Why not post your independent evaluation results here, so you can help everyone decide whether they're genuine or overclaimed?