Make Prompt-based Black-Box Tuning Colorful: Boosting Model Generalization from Three Orthogonal Perspectives Paper • 2305.08088 • Published May 14, 2023 • 1
Automated Peer Reviewing in Paper SEA: Standardization, Evaluation, and Analysis Paper • 2407.12857 • Published Jul 9, 2024 • 2
MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook Paper • 2509.14142 • Published Sep 17, 2025 • 10
VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications Paper • 2509.26490 • Published Sep 30, 2025 • 22
MUSE: MCTS-Driven Red Teaming Framework for Enhanced Multi-Turn Dialogue Safety in Large Language Models Paper • 2509.14651 • Published Sep 18, 2025
TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training Paper • 2603.01714 • Published Mar 2 • 1
DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models Paper • 2310.05074 • Published Oct 8, 2023 • 1
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization Paper • 2604.02268 • Published Apr 2 • 103
GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection Paper • 2605.28534 • Published May 27 • 23
Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning Paper • 2605.28424 • Published May 27 • 32
A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond Paper • 2403.14734 • Published Mar 21, 2024 • 21
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement Paper • 2402.07456 • Published Feb 12, 2024 • 46