Using TEI locally with GPU
You can install text-embeddings-inference locally to run it on your own machine with a GPU.
To make sure that your hardware is supported, check out the Supported models and hardware page.
Step 1: CUDA and NVIDIA drivers
Make sure you have CUDA and the NVIDIA drivers installed. The NVIDIA drivers on your device must be compatible with CUDA version 12.2 or higher.
Add the NVIDIA binaries to your path:
export PATH=$PATH:/usr/local/cuda/bin
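You can quickly confirm that the entry took effect in your current shell; this is a minimal sketch assuming the default toolkit location of `/usr/local/cuda/bin`:

```shell
# Check that the CUDA bin directory made it onto PATH
case ":$PATH:" in
  *:/usr/local/cuda/bin:*) echo "cuda bin on PATH" ;;
  *) echo "cuda bin missing from PATH" ;;
esac
```

If the toolkit is installed, `nvcc --version` should also report CUDA 12.2 or higher at this point.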
Step 2: Install Rust
Install Rust on your machine by running the following command in your terminal, then follow the on-screen instructions:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
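After the installer finishes, the toolchain may not be on your PATH until you open a new shell. A small check like the following can confirm `cargo` is reachable (it assumes rustup's default install location of `~/.cargo`):

```shell
# Confirm the Rust toolchain is reachable in the current shell.
# rustup installs to ~/.cargo by default; source its env file if cargo isn't found yet.
if ! command -v cargo >/dev/null 2>&1; then
  . "$HOME/.cargo/env" 2>/dev/null || true
fi
command -v cargo >/dev/null 2>&1 && cargo --version || echo "cargo not found; open a new shell or source ~/.cargo/env"
```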
Step 3: Install necessary packages
This step can take a while as we need to compile a lot of CUDA kernels.
For Turing GPUs (T4, RTX 2000 series … )
cargo install --path router -F candle-cuda-turing
For Ampere, Ada Lovelace, Hopper, and Blackwell
cargo install --path router -F candle-cuda
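If you are unsure which feature flag applies to your GPU, you can query its compute capability with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` and map it to a feature. The helper below is a hypothetical sketch of that mapping (Turing is compute capability 7.5; Ampere and newer are 8.0 and above):

```shell
# Hypothetical helper: map a CUDA compute capability to the cargo feature to build with.
# Obtain the value with: nvidia-smi --query-gpu=compute_cap --format=csv,noheader
pick_feature() {
  cap=$1
  case "$cap" in
    7.5) echo "candle-cuda-turing" ;;           # Turing (T4, RTX 2000 series)
    8.*|9.*|10.*|12.*) echo "candle-cuda" ;;    # Ampere, Ada Lovelace, Hopper, Blackwell
    *) echo "unsupported" ;;
  esac
}

pick_feature 7.5   # prints candle-cuda-turing
pick_feature 8.9   # prints candle-cuda
```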
Step 4: Launch Text Embeddings Inference
You can now launch Text Embeddings Inference on GPU with:
model=Qwen/Qwen3-Embedding-0.6B

text-embeddings-router --model-id $model --dtype float16 --port 8080
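Once the server is running, you can request embeddings over HTTP via the `/embed` route (this assumes the server is listening on port 8080 as launched above):

```shell
# Send a test request to the running TEI server; the response is a JSON
# array containing one embedding vector per input
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
```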