Overview of Qwen3-ASR Models

The Qwen3-ASR models are automatic speech recognition (ASR) systems designed to provide high-quality transcription and language identification across a wide range of languages and dialects. The series belongs to the broader Qwen3 family and comprises two models: Qwen3-ASR-1.7B and Qwen3-ASR-0.6B.

Key Features

Language Support: Both models support 52 languages and dialects: 30 languages and 22 Chinese dialects.

Architecture: Both models use a two-module architecture:
Audio Transformer (AuT) encoder: processes the audio input and extracts features.
Language model (LM) decoder: generates the text output from the encoded audio features.
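The encoder-decoder data flow can be illustrated with a toy Python sketch. Both components are hypothetical stand-ins: `toy_encode` replaces the AuT encoder with a per-frame mean-amplitude summary, and `scripted_lm` replaces the LM decoder with a fixed token script. The sketch assumes standard autoregressive, greedy decoding; only the overall shape (encode the audio once, then emit tokens one at a time until an end-of-sequence token) is meant to carry over, not any detail of the real models.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for the AuT encoder: split raw samples into fixed-size
# frames and summarize each frame by its mean absolute amplitude.
# The real encoder is a Transformer that produces learned feature vectors.
def toy_encode(samples: Sequence[float], frame_size: int = 4) -> List[float]:
    features = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        features.append(sum(abs(x) for x in frame) / len(frame))
    return features

# Greedy autoregressive decoding: predict the next token from the encoder
# features plus the tokens emitted so far, stopping at EOS or max_len.
def greedy_decode(
    features: List[float],
    next_token: Callable[[List[float], List[str]], str],
    eos: str = "<eos>",
    max_len: int = 20,
) -> List[str]:
    tokens: List[str] = []
    while len(tokens) < max_len:
        tok = next_token(features, tokens)
        if tok == eos:
            break
        tokens.append(tok)
    return tokens

# Hypothetical stand-in for the LM decoder: replays a fixed script of tokens
# and then emits EOS. A real decoder would condition on `features`.
def scripted_lm(script: List[str]) -> Callable[[List[float], List[str]], str]:
    def step(features: List[float], prefix: List[str]) -> str:
        return script[len(prefix)] if len(prefix) < len(script) else "<eos>"
    return step

audio = [0.0, 0.5, -0.5, 1.0, 0.25, -0.25, 0.0, 0.5]
feats = toy_encode(audio)
text = greedy_decode(feats, scripted_lm(["hello", "world"]))
print(feats, " ".join(text))
```

Greedy decoding is used only to keep the sketch short; the card does not state which decoding strategy the models use.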

Performance: The models achieve state-of-the-art results in various challenging conditions, such as:

Noisy environments
Singing voice recognition

Model Specifications

| Model Name | Parameters | Supported Languages and Dialects | Special Features |
|---|---|---|---|
| Qwen3-ASR-1.7B | 1.7 billion | 52 | High accuracy; robust in complex settings |
| Qwen3-ASR-0.6B | 0.6 billion | 52 | Efficient in lower-resource environments |
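To put the parameter counts in perspective, a back-of-the-envelope estimate of weight storage multiplies the number of parameters by the bytes per parameter. This sketch assumes 2 bytes per parameter at 16-bit precision and 1 byte at 8-bit; it counts weights only, so actual memory use during inference (activations, caches, audio buffers, runtime overhead) will be higher.

```python
# Rough, weights-only memory estimate: parameters x bytes per parameter.
# Assumption: 16-bit = 2 bytes/param, 8-bit = 1 byte/param.
# Ignores activations, caches, audio buffers, and runtime overhead.
BYTES_PER_PARAM = {"16-bit": 2.0, "8-bit": 1.0}

def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

for name, params in [("Qwen3-ASR-1.7B", 1.7e9), ("Qwen3-ASR-0.6B", 0.6e9)]:
    for precision in ("16-bit", "8-bit"):
        print(f"{name} @ {precision}: ~{weight_memory_gb(params, precision):.1f} GB")
```

By this estimate the 1.7B model needs roughly 3.4 GB for 16-bit weights versus about 1.2 GB for the 0.6B model, which is consistent with the smaller model being the better fit for constrained hardware.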

Inference Capabilities

Streaming and Offline Inference: Both models support streaming (real-time) transcription as well as offline transcription of pre-recorded audio.
Timestamp Prediction: The models can predict word-level and sentence-level timestamps, which are useful for applications such as subtitle generation.
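The subtitle use case can be sketched by grouping word-level timestamps into SRT cues. The `(word, start_seconds, end_seconds)` tuple format and the fixed `max_words` grouping rule are assumptions for illustration; they are not the models' actual output format.

```python
from typing import List, Tuple

# Hypothetical word-timestamp format: (text, start_seconds, end_seconds).
Word = Tuple[str, float, float]

def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def words_to_srt(words: List[Word], max_words: int = 7) -> str:
    """Group consecutive words into numbered SRT cues of at most max_words words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        # A cue spans from its first word's start to its last word's end.
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(w for w, _, _ in chunk)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    # SRT separates cues with a blank line.
    return "\n".join(cues)

words = [("Hello", 0.0, 0.42), ("world", 0.5, 0.91), ("again", 1.2, 1.6)]
print(words_to_srt(words, max_words=2))
```

A fixed word count keeps the sketch simple; a production subtitler would more likely split cues on sentence boundaries (which the sentence-level timestamps could supply) or on pauses between words.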


GGUF

Model size: 0.9B params
Architecture: qwen3-asr
Hardware compatibility: 8-bit, 16-bit
