I think the data mixing should carry over to larger models; existing work from others suggests this, e.g. https://www.datologyai.com/blog/beyondweb
Asankhaya Sharma