Overview of Qwen3-ASR Models
The Qwen3-ASR models are automatic speech recognition (ASR) systems that provide transcription and language identification across a wide range of languages and dialects. The series is part of the Qwen3 family and includes two main models: Qwen3-ASR-1.7B and Qwen3-ASR-0.6B.
Key Features
Language Support: Both models support 52 languages and dialects in total: 30 languages and 22 Chinese dialects.
Architecture: They use a dual-module architecture:
- Audio Transformer (AuT) Encoder: Processes audio input to extract features.
- Language Model (LM) Decoder: Generates text output from the processed audio features.
Performance: The models achieve state-of-the-art results in various challenging conditions, such as:
- Noisy environments
- Singing voice recognition
Model Specifications

| Model Name | Parameters | Supported Languages | Special Features |
| --- | --- | --- | --- |
| Qwen3-ASR-1.7B | 1.7 billion | 52 | High accuracy, robust in complex settings |
| Qwen3-ASR-0.6B | 0.6 billion | 52 | Efficient for lower resource environments |
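To make the size difference between the two models concrete, the following is a rough back-of-the-envelope sketch that estimates the storage needed for the weights alone from the parameter counts listed above. The bit widths (16 and 8) and the helper name `weight_memory_gb` are assumptions chosen for illustration; real memory use is higher because of activations, caches, and runtime overhead.

```python
# Parameter counts are taken from the specifications above.
# The precisions below are illustrative assumptions, not published figures.
PARAMS = {
    "Qwen3-ASR-1.7B": 1_700_000_000,
    "Qwen3-ASR-0.6B": 600_000_000,
}


def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for name, num_params in PARAMS.items():
    for bits in (16, 8):
        size = weight_memory_gb(num_params, bits)
        print(f"{name} at {bits}-bit: ~{size:.1f} GB of weights")
```

Under these assumptions, halving the precision halves the weight footprint, which is one reason the smaller model suits lower resource environments.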
Inference Capabilities
- Streaming and Offline Inference: Both models can handle real-time audio processing and can also work with pre-recorded audio.
- Timestamp Prediction: The models can provide timestamps for words and sentences, which is useful for applications like subtitle generation.
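As an illustration of the subtitle use case, here is a minimal sketch that groups word-level timestamps into SRT cues. It does not call the models: the `Word` structure, the `words_to_srt` helper, and the grouping thresholds are hypothetical, and the real output format of Qwen3-ASR may differ, so treat the input shape as an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical word-level result: text plus start/end times in seconds."""
    text: str
    start: float
    end: float


def format_srt_time(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_words: int = 7, max_gap: float = 0.8) -> str:
    """Group words into cues, starting a new cue when the current one is
    full or when the silence before the next word exceeds max_gap seconds."""
    cues: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        too_long = len(current) >= max_words
        long_pause = bool(current) and word.start - current[-1].end > max_gap
        if current and (too_long or long_pause):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = format_srt_time(cue[0].start)
        end = format_srt_time(cue[-1].end)
        # Joining with spaces suits space-delimited languages only;
        # Chinese text would normally be joined without separators.
        text = " ".join(w.text for w in cue)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [Word("hello", 0.0, 0.4), Word("world", 0.5, 0.9), Word("again", 2.5, 3.0)]
    print(words_to_srt(demo))
```

The pause-based split keeps cues aligned with natural breaks in speech; the word limit is a simple guard against overly long subtitle lines.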