NVFP4 / AWQ Quants or llm-compressor recipe
#1 opened by stelterlab
Your blog post mentioned that you worked together with Red Hat AI, using llm-compressor to create the NVFP4 quants for Mistral Large 3 - was that a special version of llm-compressor? I tried the good old mistral3_example.py on its little brother here, but got a KeyError: ministral3.
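For context, my attempt followed roughly this pattern, adapted from the llm-compressor one-shot examples (a rough sketch only: the model path is a placeholder, and the dataset, scheme name, and calibration settings are my assumptions, not necessarily what your official recipe used):

```python
# Rough sketch of a one-shot NVFP4 quantization run with llm-compressor.
# MODEL_ID is a placeholder path; dataset and calibration sizes are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot  # older releases exposed this under llmcompressor.transformers
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "path/to/ministral-checkpoint"  # placeholder, not an official repo id

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Quantize all Linear layers to NVFP4, keeping the output head in higher precision.
recipe = QuantizationModifier(targets="Linear", scheme="NVFP4", ignore=["lm_head"])

# One-shot calibration pass; "open_platypus" is one of the registered calibration datasets.
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

# Save the compressed checkpoint alongside the tokenizer.
SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-NVFP4"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```

This is where the run fails for me: the model config's architecture (ministral3) is apparently not recognized, so the KeyError is raised before quantization even starts.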
Will there be a newer version with support for Ministral? I would love to use it with a context length greater than 32k on my RTX 5090, and at some point in NVFP4 on a DGX Spark.
And besides that: keep up this great work! Vive la France! Viva Mistral AI!