Nano LLMs Collection Really small LLMs pre-trained on a data-efficient 1B tokens • 3 items • Updated 1 day ago • 1
Article Scaling Pedagogical Pretraining: From Optimal Mixing to 10 Billion Tokens 1 day ago • 2
🤏 Smol-Data Collection Tried and tested mixes for strong pretraining. Inspired by https://huggingface.co/blog/codelion/optimal-dataset-mixing • 14 items • Updated 5 days ago • 11
PaperBanana: Automating Academic Illustration for AI Scientists Paper • 2601.23265 • Published Jan 30 • 213
Article Reverse Engineering a $500M Mystery: From HashHop to Memory-Augmented Language Models Jan 23 • 10
Article Ellora: Enhancing LLMs with LoRA - Standardized Recipes for Capability Enhancement Dec 3, 2025 • 14
Budget-Aware Tool-Use Enables Effective Agent Scaling Paper • 2511.17006 • Published Nov 21, 2025 • 33
Article The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix Nov 3, 2025 • 61
Sutra Pedagogical Datasets Collection High-quality synthetic educational datasets designed for LLM pretraining with structured pedagogical content across 9 knowledge domains. • 6 items • Updated 1 day ago • 1
Dhara Foundational Models Collection Diffusion Language Models combining deep narrow networks, Canon layers (depthwise causal convolutions), and WSD (Warmup-Stable-Decay) training. • 2 items • Updated 1 day ago • 3
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 509
Mem-Agent Collection Small agents from Dria trained to interact with an Obsidian-like memory system using Python tools. Trained on Qwen3-4B-Thinking-2507. • 4 items • Updated Sep 5, 2025 • 5
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining Paper • 2508.10975 • Published Aug 14, 2025 • 60
Article mem-agent: Persistent, Human Readable Memory Agent Trained with Online RL Sep 11, 2025 • 26
Nemotron-Pre-Training-Datasets Collection Large scale pre-training datasets used in the Nemotron family of models. • 12 items • Updated 1 day ago • 103
Article Building Enterprise-Ready Text Classifiers in Minutes with Adaptive Learning Aug 9, 2025 • 12