Article Community Evals: Because we're done trusting black-box leaderboards over the community • 5 days ago • 44
Instruction Pretrained Experiments Collection Experiments associated with the paper 'Continued Pretraining and Interpretability-Based Evaluation for Low-Resource Languages: A Galician Case Study' • 3 items • Updated Dec 11, 2025 • 1
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Paper • 2601.22975 • Published 10 days ago • 87
MMFineReason Collection High-quality STEM reasoning dataset for Multimodal LLM post-training. • 14 items • Updated 6 days ago • 20
Continually pre-trained models Collection Language-specific LLMs continually pre-trained from fully open English base models • 2 items • Updated 19 days ago • 1
MS MARCO Mined Triplets Collection These datasets contain MS MARCO Triplets gathered by mining hard negatives using various models. Each dataset has various subsets. • 16 items • Updated 11 days ago • 13
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles Paper • 2505.19914 • Published May 26, 2025 • 46
MetricX-24 Collection A collection of MetricX-24 models (https://aclanthology.org/2024.wmt-1.35/) • 6 items • Updated Jul 10, 2025 • 11
Llama-Embed-Nemotron-8B Collection State-of-the-Art Text Embedding Model • 3 items • Updated 5 days ago • 4