---
language:
- en
- tr
- de
- fr
- es
- it
- pt
- ru
- zh
- ja
- ko
- hi
- ar
- nl
- pl
- uk
- vi
- th
- id
- cs
license: mit
tags:
- global-ai
- multilingual
- vision-language-model
- multimodal
- lamapi
- next-2-fast
- next-series
- 4b
- efficient
- gemma-3
- transformer
- text-generation
- reasoning
- artificial-intelligence
- nlp
pipeline_tag: image-text-to-text
datasets:
- mlabonne/FineTome-100k
- ITCL/FineTomeOs
- Gryphe/ChatGPT-4o-Writing-Prompts
- dongguanting/ARPO-SFT-54K
- OpenSPG/KAG-Thinker-training-dataset
- uclanlp/Brief-Pro
- CognitiveKernel/CognitiveKernel-Pro-SFT
- QuixiAI/dolphin-r1
library_name: transformers
---
[Discord](https://discord.gg/XgH4EpyPD2)

# ⚡ Next 2 Fast (4B)

### *Global Speed, Multimodal Intelligence, Engineered by Lamapi*

[License: MIT](https://opensource.org/licenses/MIT) · [Hugging Face](https://huggingface.co/Lamapi/next-2-fast)
---

## Overview

**Next 2 Fast** is a state-of-the-art **4-billion-parameter multimodal vision-language model (VLM)** designed for high-performance reasoning across languages and modalities.

Developed by **Lamapi**, a leading AI research lab in Türkiye, this model represents a leap in efficiency, bridging the gap between massive commercial models and accessible, open-source intelligence. Built on the **Gemma 3** architecture and refined with our proprietary SFT and DPO techniques, **Next 2 Fast** is not just a language model; it is a global reasoning engine that sees, understands, and communicates fluently in **English, Turkish, German, French, Spanish, and the 15 other languages listed in the model metadata.**
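The card does not describe Lamapi's proprietary DPO variant. For orientation only, the standard Direct Preference Optimization objective (Rafailov et al., 2023) scores one (chosen, rejected) response pair as sketched below. This is a minimal sketch of that textbook formula, not Lamapi's training code; the function name `dpo_pair_loss` and the default `beta=0.1` are assumptions for illustration.

```python
import math


def dpo_pair_loss(policy_chosen_logp: float, policy_rejected_logp: float,
                  ref_chosen_logp: float, ref_rejected_logp: float,
                  beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response under the
    trainable policy or the frozen reference model:
        loss = -log(sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)]))
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow in exp()
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Equal log-ratios give log(2); favoring the chosen response lowers the loss.
print(dpo_pair_loss(-10.0, -10.0, -10.0, -10.0))  # ~0.693
print(dpo_pair_loss(-5.0, -15.0, -10.0, -10.0))   # margin 1.0, ~0.313
```

The loss only depends on how much more the policy prefers the chosen response than the reference model does, which is why DPO needs no separate reward model.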
**Why Next 2 Fast?**

* **Global Performance:** Tuned for complex reasoning in English and multilingual contexts, scoring above Gemma 3 4B, Llama 3.2 3B, and Phi-3.5 Mini on the benchmarks below.
* **Vision & Text:** Seamlessly processes images and text to generate code, descriptions, and analysis.
* **Unmatched Speed:** Optimized for low-latency inference, making it ~2x faster than previous generations.
* **Efficient Deployment:** Runs smoothly on consumer hardware (8 GB VRAM) using 4-bit/8-bit quantization.
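The 8 GB figure is easier to see with some arithmetic. This sketch estimates memory for the weights alone, assuming a nominal 4 × 10⁹ parameters taken from the "4B" in the model name (the exact count may differ) and ignoring the KV cache, activations, and quantization metadata such as scales:

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: nominal parameter count from the "4B" model size.
NUM_PARAMS = 4e9

for label, bits in [("bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
# bf16: ~8.0 GB
# 8-bit: ~4.0 GB
# 4-bit: ~2.0 GB
```

Unquantized bf16 weights alone would fill an 8 GB card, which is why that figure is paired with 4-bit/8-bit quantization; the remaining headroom goes to the KV cache and activations.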
---

## Benchmark Performance

**Next 2 Fast** posts the highest score on all four benchmarks among the compact (3B to 4B) models compared below, showing that efficiency does not require sacrificing intelligence.
<table>
  <thead>
    <tr>
      <th>Model</th>
      <th>Params</th>
      <th>MMLU (5-shot) %</th>
      <th>MMLU-Pro %</th>
      <th>GSM8K %</th>
      <th>MATH %</th>
    </tr>
  </thead>
  <tbody>
    <tr class="next" style="background-color: #e6f3ff; font-weight: bold;">
      <td data-label="Model">⚡ Next 2 Fast</td>
      <td>4B</td>
      <td data-label="MMLU (5-shot) %">85.1</td>
      <td data-label="MMLU-Pro %">67.4</td>
      <td data-label="GSM8K %">83.5</td>
      <td data-label="MATH %"><strong>71.2</strong></td>
    </tr>
    <tr>
      <td data-label="Model">Gemma 3 4B</td>
      <td>4B</td>
      <td data-label="MMLU (5-shot) %">82.0</td>
      <td data-label="MMLU-Pro %">64.5</td>
      <td data-label="GSM8K %">80.1</td>
      <td data-label="MATH %">68.0</td>
    </tr>
    <tr>
      <td data-label="Model">Llama 3.2 3B</td>
      <td>3B</td>
      <td data-label="MMLU (5-shot) %">63.4</td>
      <td data-label="MMLU-Pro %">52.1</td>
      <td data-label="GSM8K %">45.2</td>
      <td data-label="MATH %">42.8</td>
    </tr>
    <tr>
      <td data-label="Model">Phi-3.5 Mini</td>
      <td>3.8B</td>
      <td data-label="MMLU (5-shot) %">84.0</td>
      <td data-label="MMLU-Pro %">66.0</td>
      <td data-label="GSM8K %">82.0</td>
      <td data-label="MATH %">69.5</td>
    </tr>
  </tbody>
</table>
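To put the table in numbers, this short sketch computes how far Next 2 Fast sits above the strongest other model on each benchmark. The scores are copied from the table as reported; nothing here re-runs the evaluations, and the helper name `margin_over_best_baseline` is illustrative.

```python
# Scores (%) copied from the benchmark table.
SCORES = {
    "Next 2 Fast":  {"MMLU (5-shot)": 85.1, "MMLU-Pro": 67.4, "GSM8K": 83.5, "MATH": 71.2},
    "Gemma 3 4B":   {"MMLU (5-shot)": 82.0, "MMLU-Pro": 64.5, "GSM8K": 80.1, "MATH": 68.0},
    "Llama 3.2 3B": {"MMLU (5-shot)": 63.4, "MMLU-Pro": 52.1, "GSM8K": 45.2, "MATH": 42.8},
    "Phi-3.5 Mini": {"MMLU (5-shot)": 84.0, "MMLU-Pro": 66.0, "GSM8K": 82.0, "MATH": 69.5},
}


def margin_over_best_baseline(scores: dict, model: str) -> dict:
    """Percentage-point gap between `model` and the best other model, per benchmark."""
    margins = {}
    for bench, value in scores[model].items():
        best_other = max(s[bench] for name, s in scores.items() if name != model)
        margins[bench] = round(value - best_other, 1)
    return margins


print(margin_over_best_baseline(SCORES, "Next 2 Fast"))
# {'MMLU (5-shot)': 1.1, 'MMLU-Pro': 1.4, 'GSM8K': 1.5, 'MATH': 1.7}
```

In every column the closest competitor is Phi-3.5 Mini, and the reported lead is between 1.1 and 1.7 percentage points.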
---

## Quick Start

**Next 2 Fast** is fully compatible with the Hugging Face `transformers` library.
### Multimodal Inference (Vision + Text)

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import torch

model_id = "thelamapi/next2-fast"

# Load the model (multimodal checkpoint) and its processor (tokenizer + image preprocessor)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Load the image as RGB
image = Image.open("image.jpg").convert("RGB")

# Create a multimodal prompt; the image entry comes before the question
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are Next-2, an AI assistant created by Lamapi. Provide concise and accurate analysis."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Analyze this image and explain in English."}
        ]
    }
]

# The processor renders the chat template, tokenizes the text, and preprocesses the image in one call
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

output = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = output[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))
```
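Both Quick Start examples build the chat structure by hand. A small helper such as the hypothetical `build_messages` below (not part of `transformers`) keeps that layout consistent: an optional system turn, then a user turn whose content list puts the image entry before the text, matching the multimodal example. The `image` argument can be a PIL image or any other value the processor accepts; that flexibility is an assumption of this sketch.

```python
from typing import Any, Optional


def build_messages(user_text: str,
                   system_text: Optional[str] = None,
                   image: Any = None) -> list:
    """Hypothetical helper: build a chat in the content-list format used above."""
    messages = []
    if system_text is not None:
        messages.append({"role": "system",
                         "content": [{"type": "text", "text": system_text}]})
    user_content = []
    if image is not None:
        # Image entry first, then the question, as in the multimodal example
        user_content.append({"type": "image", "image": image})
    user_content.append({"type": "text", "text": user_text})
    messages.append({"role": "user", "content": user_content})
    return messages


msgs = build_messages("What is in this picture?", system_text="Be concise.", image="photo.jpg")
print([m["role"] for m in msgs])  # ['system', 'user']
```

The result can be passed straight to `processor.apply_chat_template(...)` in place of the hand-written `messages` list.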
### Text-Only Chat (Global Reasoning)

```python
from transformers import AutoTokenizer, AutoModelForImageTextToText
import torch

model_id = "thelamapi/next2-fast"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The checkpoint is multimodal, so load it with the image-text-to-text class;
# text-only prompts work without any image input
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are Next 2 Fast, an advanced AI assistant."},
    {"role": "user", "content": "Explain the concept of entropy in thermodynamics simply."}
]

# Tokenize through the chat template directly so the BOS token is not added twice
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
output = model.generate(**inputs, max_new_tokens=200)

# Decode only the reply, skipping the prompt tokens
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
---

## Key Features

| Feature | Description |
| :--- | :--- |
| **True Multilingualism** | Fluent in English, Turkish, German, French, Spanish, and more. No "translation-ese." |
| **Visual Intelligence** | Can read charts, identify objects, and reason about visual scenes effectively. |
| **High Efficiency** | Designed for speed. Ideal for edge devices, local deployment, and real-time apps. |
| **Code & Math** | Strong capabilities in Python coding, debugging, and solving mathematical problems. |
| **Global Alignment** | Fine-tuned with a diverse dataset to ensure safety and neutrality across cultures. |
---

## Mission

At **Lamapi**, our mission is to build the **Next** generation of intelligence that is accessible to everyone, everywhere.

**Next 2 Fast** proves that world-class AI innovation isn't limited to Silicon Valley. By combining efficient architecture with high-quality global datasets, we provide a powerful tool for researchers, developers, and businesses worldwide.

---

## License

This model is open-sourced under the **MIT License**. It is free for academic and commercial use.
---

## Contact & Ecosystem

We are **Lamapi**.

* **Contact:** [Mail](mailto:lamapicontact@gmail.com)
* **Hugging Face:** [Company Page](https://huggingface.co/thelamapi)

---

> **Next 2 Fast**: *Global Intelligence. Lightning Speed. Powered by Lamapi.*

[Lamapi on Hugging Face](https://huggingface.co/Lamapi)