Interés
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum
Reinforcement Learning
Paper
• 2411.02337
• Published
• 36
Mixture-of-Transformers: A Sparse and Scalable Architecture for
Multi-Modal Foundation Models
Paper
• 2411.04996
• Published
• 50
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle
Grandmaster Level
Paper
• 2411.03562
• Published
• 69
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via
Inference-time Hybrid Information Structurization
Paper
• 2410.08815
• Published
• 47
Game-theoretic LLM: Agent Workflow for Negotiation Games
Paper
• 2411.05990
• Published
• 8
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large
Language Models on Mobile Devices
Paper
• 2411.10640
• Published
• 46
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
Paper
• 2411.19146
• Published
• 17
Snowflake/snowflake-arctic-embed-m-v2.0
Sentence Similarity
• Updated
• 74.3k
• 101
Snowflake/snowflake-arctic-embed-l-v2.0
Sentence Similarity
• Updated
• 1.34M
• 231
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases
Paper
• 2412.04862
• Published
• 50
ruliad/deepthought-8b-llama-v0.01-alpha
Text Generation
• Updated
• 7
• 146
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's
Reasoning Capability
Paper
• 2411.19943
• Published
• 62
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on
Retrieval-Augmented Generation
Paper
• 2412.02592
• Published
• 24
RL Zero: Zero-Shot Language to Behaviors without any Supervision
Paper
• 2412.05718
• Published
• 4
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal
Retrieval-Augmented Generation
Paper
• 2412.10704
• Published
• 16
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented
Generation for Preference Alignment
Paper
• 2412.13746
• Published
• 9
Wonderful Matrices: Combining for a More Efficient and Effective
Foundation Model Architecture
Paper
• 2412.11834
• Published
• 8
Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation
Model Internet Agents
Paper
• 2412.13194
• Published
• 12
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
Paper
• 2412.14711
• Published
• 16
Ensembling Large Language Models with Process Reward-Guided Tree Search
for Better Complex Reasoning
Paper
• 2412.15797
• Published
• 18
Progressive Multimodal Reasoning via Active Retrieval
Paper
• 2412.14835
• Published
• 73
MixLLM: LLM Quantization with Global Mixed-precision between
Output-features and Highly-efficient System Design
Paper
• 2412.14590
• Published
• 15
Learned Compression for Compressed Learning
Paper
• 2412.09405
• Published
• 13
Token-Budget-Aware LLM Reasoning
Paper
• 2412.18547
• Published
• 46
ericsonwillians/distilbert-base-uncased-steam-sentiment
Text Classification
• 67M • Updated
• 30
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via
Collective Monte Carlo Tree Search
Paper
• 2412.18319
• Published
• 39
Personalized Graph-Based Retrieval for Large Language Models
Paper
• 2501.02157
• Published
• 31
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper
• 2412.18925
• Published
• 107
Multi-task retriever fine-tuning for domain-specific and efficient RAG
Paper
• 2501.04652
• Published
• 10
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper
• 2501.05366
• Published
• 102
DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
Paper
• 2501.02576
• Published
• 15
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language
Models
Paper
• 2501.03262
• Published
• 104
BoostStep: Boosting mathematical capability of Large Language Models via
improved single-step reasoning
Paper
• 2501.03226
• Published
• 43
Evolving Deeper LLM Thinking
Paper
• 2501.09891
• Published
• 115
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with
Large Language Models
Paper
• 2501.09686
• Published
• 41
RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation
Paper
• 2501.08617
• Published
• 10
The Lessons of Developing Process Reward Models in Mathematical
Reasoning
Paper
• 2501.07301
• Published
• 100
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot
Paper
• 2501.09012
• Published
• 10
ChemAgent: Self-updating Library in Large Language Models Improves
Chemical Reasoning
Paper
• 2501.06590
• Published
• 11
CodeElo: Benchmarking Competition-level Code Generation of LLMs with
Human-comparable Elo Ratings
Paper
• 2501.01257
• Published
• 51
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary
Feedback
Paper
• 2501.10799
• Published
• 15
Control LLM: Controlled Evolution for Intelligence Retention in LLM
Paper
• 2501.10979
• Published
• 6
Autonomy-of-Experts Models
Paper
• 2501.13074
• Published
• 44
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large
Language Models via a Multi-Paradigm Perspective
Paper
• 2501.11110
• Published
• 4
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning
Paper
• 2412.09078
• Published
LLM2: Let Large Language Models Harness System 2 Reasoning
Paper
• 2412.20372
• Published
TinyThinker: Distilling Reasoning through Coarse-to-Fine Knowledge
Internalization with Self-Reflection
Paper
• 2412.08024
• Published
• 1
Table as Thought: Exploring Structured Thoughts in LLM Reasoning
Paper
• 2501.02152
• Published
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via
Reinforcement Learning
Paper
• 2501.12948
• Published
• 440
Self-supervised Quantized Representation for Seamlessly Integrating
Knowledge Graphs with Large Language Models
Paper
• 2501.18119
• Published
• 25
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Paper
• 2502.01534
• Published
• 40
The Differences Between Direct Alignment Algorithms are a Blur
Paper
• 2502.01237
• Published
• 113
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
Paper
• 2501.13200
• Published
• 69
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning
Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles
Paper
• 2502.01081
• Published
• 13
CODESIM: Multi-Agent Code Generation and Problem Solving through
Simulation-Driven Planning and Debugging
Paper
• 2502.05664
• Published
• 24
Training Language Models for Social Deduction with Multi-Agent
Reinforcement Learning
Paper
• 2502.06060
• Published
• 38
Paper
• 2502.06049
• Published
• 31
Exploring the Limit of Outcome Reward for Learning Mathematical
Reasoning
Paper
• 2502.06781
• Published
• 58
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time
Scaling
Paper
• 2502.06703
• Published
• 152
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
Paper
• 2502.04416
• Published
• 12
Goku: Flow Based Video Generative Foundation Models
Paper
• 2502.04896
• Published
• 106
In-Context Retrieval-Augmented Language Models
Paper
• 2302.00083
• Published
• 1
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language
Models
Paper
• 2502.16614
• Published
• 27
Can Large Language Models Detect Errors in Long Chain-of-Thought
Reasoning?
Paper
• 2502.19361
• Published
• 28
STMA: A Spatio-Temporal Memory Agent for Long-Horizon Embodied Task
Planning
Paper
• 2502.10177
• Published
• 6
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem
Proving
Paper
• 2502.07640
• Published
• 9
LoRACode: LoRA Adapters for Code Embeddings
Paper
• 2503.05315
• Published
• 13
Learning from Failures in Multi-Attempt Reinforcement Learning
Paper
• 2503.04808
• Published
• 18
docling-project/SmolDocling-256M-preview
Image-Text-to-Text
• Updated
• 62.6k
• 1.61k
Bridging Continuous and Discrete Tokens for Autoregressive Visual
Generation
Paper
• 2503.16430
• Published
• 34
MAPS: A Multi-Agent Framework Based on Big Seven Personality and
Socratic Guidance for Multimodal Scientific Problem Solving
Paper
• 2503.16905
• Published
• 54
Improving Autoregressive Image Generation through Coarse-to-Fine Token
Prediction
Paper
• 2503.16194
• Published
• 8
ELTEX: A Framework for Domain-Driven Synthetic Data Generation
Paper
• 2503.15055
• Published
• 6
Reinforcement Learning for Reasoning in Small LLMs: What Works and What
Doesn't
Paper
• 2503.16219
• Published
• 52
CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners
Paper
• 2503.16356
• Published
• 15
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement
Learning
Paper
• 2503.15265
• Published
• 46
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper
• 2503.14476
• Published
• 144
Advancing Language Model Reasoning through Reinforcement Learning and
Inference Scaling
Paper
• 2501.11651
• Published
• 1
API Agents vs. GUI Agents: Divergence and Convergence
Paper
• 2503.11069
• Published
• 36
R1-VL: Learning to Reason with Multimodal Large Language Models via
Step-wise Group Relative Policy Optimization
Paper
• 2503.12937
• Published
• 30
Self-Evolved Preference Optimization for Enhancing Mathematical
Reasoning in Small Language Models
Paper
• 2503.04813
• Published
• 2
Full-Step-DPO: Self-Supervised Preference Optimization with Step-wise
Rewards for Mathematical Reasoning
Paper
• 2502.14356
• Published
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM
Reasoning via Autoregressive Search
Paper
• 2502.02508
• Published
• 22
NousResearch/DeepHermes-3-Mistral-24B-Preview
Text Generation
• Updated
• 242
• 121
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based
VLM Agent Training
Paper
• 2503.08525
• Published
• 17
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model
for Visual Generation and Editing
Paper
• 2503.10639
• Published
• 53
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time
Thinking
Paper
• 2503.19855
• Published
• 29
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs
using Particle-Based Monte Carlo Methods
Paper
• 2502.01618
• Published
• 10
Transformer^2: Self-adaptive LLMs
Paper
• 2501.06252
• Published
• 55
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper
• 2501.08313
• Published
• 300
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language
Model Born from Transformer
Paper
• 2501.15570
• Published
• 25
open-thoughts/OpenThoughts-114k
Viewer
• Updated
• 228k • 67.9k
• 809
Beyond Prompt Content: Enhancing LLM Performance via Content-Format
Integrated Prompt Optimization
Paper
• 2502.04295
• Published
• 13
ScholarCopilot: Training Large Language Models for Academic Writing with
Accurate Citations
Paper
• 2504.00824
• Published
• 43
ZClip: Adaptive Spike Mitigation for LLM Pre-Training
Paper
• 2504.02507
• Published
• 88
agentica-org/DeepCoder-14B-Preview
Text Generation
• Updated
• 351
• 680
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
Paper
• 2504.06261
• Published
• 110
VAPO: Efficient and Reliable Reinforcement Learning for Advanced
Reasoning Tasks
Paper
• 2504.05118
• Published
• 26
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
Paper
• 2504.07128
• Published
• 87
SQL-R1: Training Natural Language to SQL Reasoning Model By
Reinforcement Learning
Paper
• 2504.08600
• Published
• 33
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated
Agent-Human Interplay
Paper
• 2504.03601
• Published
• 17
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for
Autoregressive Image Generation
Paper
• 2504.08736
• Published
• 46
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to
Reinforce
Paper
• 2504.11343
• Published
• 19
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
Paper
• 2502.01142
• Published
• 24
Genius: A Generalizable and Purely Unsupervised Self-Training Framework
For Advanced Reasoning
Paper
• 2504.08672
• Published
• 55
SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question
Answering?
Paper
• 2502.13233
• Published
• 15
LongPO: Long Context Self-Evolution of Large Language Models through
Short-to-Long Preference Optimization
Paper
• 2502.13922
• Published
• 27
WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation
Paper
• 2502.08047
• Published
• 28
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale
Reinforcement Learning
Paper
• 2503.07365
• Published
• 61
Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM
Reasoners With Verifiers
Paper
• 2505.04842
• Published
• 12
Benchmarking LLMs' Swarm intelligence
Paper
• 2505.04364
• Published
• 20
Are Reasoning Models More Prone to Hallucination?
Paper
• 2505.23646
• Published
• 24
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data
Could Be Secretly Stolen!
Paper
• 2505.15656
• Published
• 15
This Time is Different: An Observability Perspective on Time Series
Foundation Models
Paper
• 2505.14766
• Published
• 40
ReCIT: Reconstructing Full Private Data from Gradient in
Parameter-Efficient Fine-Tuning of Large Language Models
Paper
• 2504.20570
• Published
Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning
Capabilities Through Evaluation Design
Paper
• 2506.04734
• Published
• 21
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive
Foundations for Artificial General Intelligence and its Societal Impact
Paper
• 2507.00951
• Published
• 24
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM
Fine-Tuning Data from Unstructured Documents
Paper
• 2507.04009
• Published
• 53
Reasoning or Memorization? Unreliable Results of Reinforcement Learning
Due to Data Contamination
Paper
• 2507.10532
• Published
• 90
LayerCake: Token-Aware Contrastive Decoding within Large Language Model
Layers
Paper
• 2507.04404
• Published
• 22
IntFold: A Controllable Foundation Model for General and Specialized
Biomolecular Structure Prediction
Paper
• 2507.02025
• Published
• 35
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos
with Spatio-Temporal Diffusion Models
Paper
• 2507.13344
• Published
• 59
Speed Always Wins: A Survey on Efficient Architectures for Large
Language Models
Paper
• 2508.09834
• Published
• 53
Test-Time Scaling in Reasoning Models Is Not Effective for
Knowledge-Intensive Tasks Yet
Paper
• 2509.06861
• Published
• 9
Locality in Image Diffusion Models Emerges from Data Statistics
Paper
• 2509.09672
• Published
• 13
Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from
Token and Parameter Levels
Paper
• 2509.16596
• Published
• 14
Self-Improvement in Multimodal Large Language Models: A Survey
Paper
• 2510.02665
• Published
• 21