ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL • Paper • 2402.19446 • Published Feb 29, 2024
When is Realizability Sufficient for Off-Policy Reinforcement Learning? • Paper • 2211.05311 • Published Nov 10, 2022