NVFP4 / AWQ Quants or llm-compressor recipe

#1
by stelterlab - opened

Your blog post mentioned that you worked together with Red Hat AI, using llm-compressor to create the NVFP4 quants (for Mistral Large 3). Was that a special version of llm-compressor? I tried the good old mistral3_example.py on its little brother here, but got a KeyError: ministral3.
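Concretely, a recipe along these lines is what I'd hope to run once the architecture is supported. This is only a sketch: the model ID is a placeholder, and the scheme name, calibration dataset and oneshot arguments are taken from the stock NVFP4 examples in the llm-compressor repo, so the details may differ between versions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Placeholder checkpoint ID -- the point here is the recipe, not the exact model.
MODEL_ID = "mistralai/<ministral-checkpoint>"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# NVFP4 quantization of all Linear layers, keeping lm_head in full precision.
recipe = QuantizationModifier(targets="Linear", scheme="NVFP4", ignore=["lm_head"])

# NVFP4 activation scales need a short calibration pass; these settings mirror
# the stock llm-compressor examples and are assumptions, not tuned values.
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

# Save the compressed checkpoint.
SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-NVFP4"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```

(If weight-only NVFP4A16 were enough, the calibration dataset could presumably be dropped, but as far as I understand W4A4 NVFP4 needs calibration data for the activation scales.)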

Will there be a newer version with support for Ministral? I'd love to use it with a context length greater than 32k on my RTX 5090. 🙃 And at some point in NVFP4 on a DGX Spark.

And besides that: keep up this great work! Viva la France! 😎 Viva Mistral AI! 🥳
