Overview of Qwen3-ASR Models

The Qwen3-ASR models are automatic speech recognition (ASR) systems that provide transcription and language identification across a wide range of languages and dialects. They belong to the Qwen3 family and come in two main sizes: Qwen3-ASR-1.7B and Qwen3-ASR-0.6B.

Key Features

Language Support: Both models support 52 languages and dialects in total: 30 languages and 22 Chinese dialects.

Architecture: Both models use a dual-module architecture:

  • Audio Transformer (AuT) Encoder: Processes audio input to extract features.
  • Language Model (LM) Decoder: Generates text output from the processed audio features.
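
The encoder-to-decoder data flow described above can be sketched with toy stand-ins. This is only an illustration of how the two modules compose, not the actual Qwen3-ASR implementation: `toy_encoder`, `toy_decoder`, `transcribe`, the frame size, and the threshold are all hypothetical names and values chosen for the example.

```python
from typing import List, Sequence

Features = List[float]  # one feature vector per audio frame (hypothetical shape)


def toy_encoder(samples: Sequence[float], frame_size: int = 4) -> List[Features]:
    """Stand-in for the audio encoder: split the signal into frames and
    compute one feature (mean absolute amplitude) per frame."""
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    return [[sum(abs(s) for s in frame) / len(frame)] for frame in frames]


def toy_decoder(features: List[Features], threshold: float = 0.1) -> str:
    """Stand-in for the LM decoder: emit one placeholder token per frame.
    A real decoder would generate text tokens autoregressively."""
    tokens = ["<speech>" if f[0] > threshold else "<sil>" for f in features]
    return " ".join(tokens)


def transcribe(samples: Sequence[float]) -> str:
    """Compose the two modules: audio -> features -> text."""
    return toy_decoder(toy_encoder(samples))


if __name__ == "__main__":
    audio = [0.0, 0.0, 0.0, 0.0, 0.5, -0.5, 0.5, -0.5]
    print(transcribe(audio))
```

The point of the sketch is the interface: the decoder never sees raw audio, only the features the encoder produces.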

Performance: The models achieve state-of-the-art results in various challenging conditions, such as:

  • Noisy environments
  • Singing voice recognition

Model Specifications

Model Name        Parameters     Supported Languages   Special Features
Qwen3-ASR-1.7B    1.7 billion    52                    High accuracy, robust in complex settings
Qwen3-ASR-0.6B    0.6 billion    52                    Efficient for lower resource environments
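
To make the size difference concrete, the weight memory of each model can be estimated as parameters × bits per weight ÷ 8. The sketch below assumes the nominal parameter counts from the model names and assumes 16-bit and 8-bit weight storage; actual checkpoints may have a different parameter count, and runtime memory (activations, caches) comes on top. The helper name `weight_memory_gb` is made up for this example.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Nominal parameter counts taken from the model names (an assumption).
MODELS = {
    "Qwen3-ASR-1.7B": 1.7e9,
    "Qwen3-ASR-0.6B": 0.6e9,
}

if __name__ == "__main__":
    for name, params in MODELS.items():
        for bits in (16, 8):  # assumed storage precisions
            size = weight_memory_gb(params, bits)
            print(f"{name} at {bits}-bit: ~{size:.1f} GB")
```

Under these assumptions the 1.7B model needs about 3.4 GB for 16-bit weights, while the 0.6B model needs about 1.2 GB.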

Inference Capabilities

  • Streaming and Offline Inference: Both models can handle real-time audio processing and can also work with pre-recorded audio.
  • Timestamp Prediction: The models can provide timestamps for words and sentences, which is useful for applications like subtitle generation.
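
As an illustration of the subtitle use case, word-level timestamps can be grouped into SubRip (SRT) cues. This is a minimal sketch that assumes the timestamps are already available as `(text, start_seconds, end_seconds)` tuples; that tuple layout and the helper names `srt_time` and `words_to_srt` are hypothetical, and the actual output format of the models may differ.

```python
from typing import List, Tuple

# Hypothetical word-timestamp format: (text, start_seconds, end_seconds)
Word = Tuple[str, float, float]


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_words: int = 7) -> str:
    """Group consecutive words into cues of at most max_words words each."""
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        start, end = group[0][1], group[-1][2]
        text = " ".join(word[0] for word in group)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [("hello", 0.0, 0.4), ("world", 0.5, 1.0)]
    print(words_to_srt(demo))
```

Grouping by a fixed word count is the simplest choice; a production subtitle tool would more likely split on pauses or sentence boundaries.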