aishiknagar's Collections
RL and Agents
s3: You Don't Need That Much Data to Train a Search Agent via RL
Paper • 2505.14146 • Published • 19
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI
Paper • 2505.19443 • Published • 15
ARM: Adaptive Reasoning Model
Paper • 2505.20258 • Published • 45
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
Paper • 2505.19914 • Published • 45
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
Paper • 2505.22617 • Published • 131
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
Paper • 2505.21457 • Published • 15
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
Paper • 2505.23754 • Published • 15
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper • 2505.24864 • Published • 143
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper • 2506.01939 • Published • 187
Resa: Transparent Reasoning Models via SAEs
Paper • 2506.09967 • Published • 21
Reasoning with Exploration: An Entropy Perspective
Paper • 2506.14758 • Published • 30
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
Paper • 2506.13585 • Published • 273
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
Paper • 2506.14965 • Published • 49
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
Paper • 2506.15211 • Published • 38
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
Paper • 2507.10532 • Published • 89
REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once
Paper • 2507.10541 • Published • 29
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Paper • 2507.06261 • Published • 64
LLMalMorph: On The Feasibility of Generating Variant Malware using Large-Language-Models
Paper • 2507.09411 • Published • 3
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
Paper • 2507.13332 • Published • 48