interesting
Describe Anything: Detailed Localized Image and Video Captioning
Paper
• 2504.16072
• Published
• 64
EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment
Paper
• 2410.09604
• Published
Geospatial Mechanistic Interpretability of Large Language Models
Paper
• 2505.03368
• Published
• 12
Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation
Paper
• 2505.02836
• Published
• 8
Constructing a 3D Town from a Single Image
Paper
• 2505.15765
• Published
• 24
SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding
Paper
• 2505.17012
• Published
• 12
Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models
Paper
• 2505.17015
• Published
• 9
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces
Paper
• 2506.00123
• Published
• 35
Point-MoE: Towards Cross-Domain Generalization in 3D Semantic Segmentation via Mixture-of-Experts
Paper
• 2505.23926
• Published
• 5
TaskCraft: Automated Generation of Agentic Tasks
Paper
• 2506.10055
• Published
• 32
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Paper
• 2506.16406
• Published
• 131
RLPR: Extrapolating RLVR to General Domains without Verifiers
Paper
• 2506.18254
• Published
• 32
Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs
Paper
• 2506.21656
• Published
• 16
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
Paper
• 2507.00432
• Published
• 79
Reconstructing 4D Spatial Intelligence: A Survey
Paper
• 2507.21045
• Published
• 38
Exploitation Is All You Need... for Exploration
Paper
• 2508.01287
• Published
• 7
3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
Paper
• 2507.23478
• Published
• 17
MolmoAct: Action Reasoning Models that can Reason in Space
Paper
• 2508.07917
• Published
• 44
Has GPT-5 Achieved Spatial Intelligence? An Empirical Study
Paper
• 2508.13142
• Published
• 34
MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds
Paper
• 2508.14879
• Published
• 69
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space
Paper
• 2508.19247
• Published
• 43
Spacer: Towards Engineered Scientific Inspiration
Paper
• 2508.17661
• Published
• 32
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper
• 2509.02547
• Published
• 231
Drawing2CAD: Sequence-to-Sequence Learning for CAD Generation from Vector Drawings
Paper
• 2508.18733
• Published
• 10
Bootstrapping Task Spaces for Self-Improvement
Paper
• 2509.04575
• Published
• 6
3D Aware Region Prompted Vision Language Model
Paper
• 2509.13317
• Published
• 14
CAD-Tokenizer: Towards Text-based CAD Prototyping via Modality-Specific Tokenization
Paper
• 2509.21150
• Published
• 4
Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls
Paper
• 2510.00184
• Published
• 17
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
Paper
• 2510.03259
• Published
• 57
Watch and Learn: Learning to Use Computers from Online Videos
Paper
• 2510.04673
• Published
• 12
OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification
Paper
• 2512.10756
• Published
• 35
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
Paper
• 2512.15687
• Published
• 21
When Reasoning Meets Its Laws
Paper
• 2512.17901
• Published
• 61
Paper
• 2512.16301
• Published
• 107
Evolving Programmatic Skill Networks
Paper
• 2601.03509
• Published
• 87
RynnBrain: Open Embodied Foundation Models
Paper
• 2602.14979
• Published
• 42