NVFP4 / AWQ Quants or llm-compressor recipe

#1
by stelterlab - opened

Your blog post mentioned that you worked together with Red Hat AI, using llm-compressor to create the NVFP4 quants (for Mistral Large 3). Was that a special version of llm-compressor? I tried the good old mistral3_example.py on its little brother here, but got a KeyError: ministral3.
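Concretely, a recipe along these lines is what I'd hope to run once the architecture is supported. This is only a sketch: the model ID is a placeholder, and the scheme name, calibration dataset and oneshot arguments are taken from the stock NVFP4 examples in the llm-compressor repo, so the details may differ between versions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Placeholder checkpoint ID -- the point here is the recipe, not the exact model.
MODEL_ID = "mistralai/<ministral-checkpoint>"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# NVFP4 quantization of all Linear layers, keeping lm_head in full precision.
recipe = QuantizationModifier(targets="Linear", scheme="NVFP4", ignore=["lm_head"])

# NVFP4 activation scales need a short calibration pass; these settings mirror
# the stock llm-compressor examples and are assumptions, not tuned values.
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

# Save the compressed checkpoint.
SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-NVFP4"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```

(If weight-only NVFP4A16 were enough, the calibration dataset could presumably be dropped, but as far as I understand W4A4 NVFP4 needs calibration data for the activation scales.)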

Will there be a newer version with support for Ministral? I'd love to use it with a context length greater than 32k on my RTX 5090. 🙃 And at some point in NVFP4 on a DGX Spark.

And besides that: keep up this great work! Viva la France! 😎 Viva Mistral AI! 🥳
