Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing Paper • 2604.22782 • Published Apr 3 • 7
Heterogeneous Scientific Foundation Model Collaboration Paper • 2604.27351 • Published 6 days ago • 201
MultiWorld: Scalable Multi-Agent Multi-View Video World Models Paper • 2604.18564 • Published 16 days ago • 45
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Paper • 2604.12374 • Published 22 days ago • 36
Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems Paper • 2604.14228 • Published 22 days ago • 25
Youssofal/MiniMax-M2.7-Abliterated-Heretic-GGUF Text Generation • 229B • Updated 22 days ago • 8.01k • 45
How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data Paper • 2604.14164 • Published Mar 23 • 35
Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers Article • 20 days ago • 68