MathSticks: A Benchmark for Visual Symbolic Compositional Reasoning with Matchstick Puzzles
Paper • 2510.00483 • Published
ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control
AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery