Llama 3.1 - 405B, 70B & 8B with multilinguality and long context • Article • philschmid, osanseviero, alvarobartt, lvwerra, dvilasuero, reach-vb, marcsun13, pcuenq • Published Jul 23, 2024
SmolLM - blazingly fast and remarkably powerful • Article • loubnabnl, anton-l, eliebak • Published Jul 16, 2024
How NuminaMath Won the 1st AIMO Progress Prize • Article • yfleureau, liyongsea, edbeeching, lewtun, benlipkin, romansoletskyi, vwxyzjn, kashif • Published Jul 11, 2024
Welcome Gemma 2 - Google’s new open LLM • Article • philschmid, osanseviero, pcuenq, lewtun, tomaarsen, reach-vb • Published Jun 27, 2024
Mixture of Experts Explained • Article • osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq • Published Dec 11, 2023
Instruction Pre-Training: Language Models are Supervised Multitask Learners • Paper • 2406.14491 • Published Jun 20, 2024
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling • Paper • 2406.07522 • Published Jun 11, 2024
Mixture-of-Agents Enhances Large Language Model Capabilities • Paper • 2406.04692 • Published Jun 7, 2024