DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion Paper • 2510.20766 • Published Oct 23 • 34
Advancing Speech Understanding in Speech-Aware Language Models with GRPO Paper • 2509.16990 • Published Sep 21 • 18
Story2Board: A Training-Free Approach for Expressive Storyboard Generation Paper • 2508.09983 • Published Aug 13 • 68
Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation Paper • 2506.08570 • Published Jun 10 • 33
Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation Paper • 2506.05062 • Published Jun 5 • 15
CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature Paper • 2505.20779 • Published May 27 • 15
Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning Paper • 2505.17813 • Published May 23 • 57
WHISTRESS: Enriching Transcriptions with Sentence Stress Detection Paper • 2505.19103 • Published May 25 • 13
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation Paper • 2504.17502 • Published Apr 24 • 55
Scaling Analysis of Interleaved Speech-Text Language Models Paper • 2504.02398 • Published Apr 3 • 31
Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models Paper • 2504.01137 • Published Apr 1 • 21
OmnimatteZero: Training-free Real-time Omnimatte with Pre-trained Video Diffusion Models Paper • 2503.18033 • Published Mar 23 • 27
RewardSDS: Aligning Score Distillation via Reward-Weighted Sampling Paper • 2503.09601 • Published Mar 12 • 16
Slam Collection All resources for SpeechLMs from "Slamming: Training a Speech Language Model on One GPU in a Day". We provide tokeniser, lm, and datasets • 7 items • Updated May 22 • 13