AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning Paper • 2511.19304 • Published Nov 24, 2025 • 90
AgenTracer: Who Is Inducing Failure in the LLM Agentic Systems? Paper • 2509.03312 • Published Sep 3, 2025 • 5
Multi-Agent Evolve: LLM Self-Improve through Co-evolution Paper • 2510.23595 • Published Oct 27, 2025 • 11
ReCode: Unify Plan and Action for Universal Granularity Control Paper • 2510.23564 • Published Oct 27, 2025 • 121
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use Paper • 2510.05592 • Published Oct 7, 2025 • 106
Where LLM Agents Fail and How They can Learn From Failures Paper • 2509.25370 • Published Sep 29, 2025 • 11
Where LLM Agents Fail and How They can Learn From Failures Paper • 2509.25370 • Published Sep 29, 2025 • 11 • 2
SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents Paper • 2505.23559 • Published May 29, 2025 • 11
SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents Paper • 2505.23559 • Published May 29, 2025 • 11 • 2
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31, 2025 • 301
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31, 2025 • 301
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents Paper • 2503.01935 • Published Mar 3, 2025 • 29
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents Paper • 2503.01935 • Published Mar 3, 2025 • 29 • 3
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents Paper • 2502.09560 • Published Feb 13, 2025 • 35