Xoron-Dev-MultiMoe-GGUF?
Hi,
Is a GGUF version planned for the near future?
Thank you in advance.
Yes, I would love to add full GGUF support. However, the model can't support it yet because of its architectural complexity: GGUF engines and tools like llama.cpp, LM Studio, or Ollama would not fully support my architecture or the modifications I have made.
Can I ask why you chose GGUF? Is it for quantization, or for general model use with a server, API, or Ollama?
I want to use it with Ollama. Quantization is not a priority, since the model seems rather small. I would also like to know whether you have considered using Qwen3-VL with mergekit; I saw that it is now supported.
Can I ask what you mean by using Qwen3-VL with mergekit? I guess I could "frankenmerge" it, but my architecture is different from the Qwen architecture and has features Qwen doesn't have. If you want to use it with Ollama, I can do something about that.
My goal is a small LLM capable of outperforming frontier LLMs by using experimental methods; I am testing results until I find what I want. My model won't be finished for a while, as I am still training and reworking the architecture as I discover things and find ways to do things other models can't do effectively.
Recently, I saw research on "H-neurons" in models, which are believed to cause hallucinations. I am working on a feature so that my model effectively never hallucinates, even at high creativity.
Yes, I was talking about a frankenmerge. But if it's possible to use it with Ollama, that's enough for me.
My intention was never to hinder your desire to experiment and innovate.
Besides, I just saw several articles on H-neurons that are very interesting. It's definitely a problem that shouldn't be underestimated (https://canadiantechnologymagazine.com/h-neurons-llm-hallucinations-canadian-businesses/) (https://github.com/thunlp/H-Neurons). It would be really great if you succeed.
I'll continue to follow you, without putting any pressure on you. I promise.