Can you also make one for the captioner?
https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Captioner
I’d really appreciate it if you could make it.
Additionally, I hope you could also extract the vision transformer.
https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct
Hi, have you found the Captioner encoder yet?
As far as I understand, the encoder of the Captioner model is the same as that of the Instruct model. Is there any difference between them?
Hey, I think there’s one thing you’re missing: the Captioner checkpoint went through a post-training full-parameter fine-tuning stage. Even though this fine-tuning was done jointly with the rest of the model rather than on the encoder alone, we can still reasonably treat the Captioner encoder as a stronger, more general-purpose audio representation model.
I’m working on a paper comparing different audio encoders. Would it be possible for you to provide a standalone encoder checkpoint for the Captioner model, or some guidance / code on how to extract it? It would be extremely helpful for my research and would save a lot of time. Many thanks in advance for your help!
Best regards,
mifanbushipeicai
Thank you very much, and looking forward to it!
Got it.
Please wait a moment while I get things ready.
Thanks a lot, really looking forward to it!
I've uploaded my collection here. The inference code is provisional, so some of it may not work.
https://huggingface.co/collections/Atotti/alm-audio-encoders
Thank you so much!!!