Collections
Collections including paper arxiv:2505.19297

- One-Minute Video Generation with Test-Time Training
  Paper • 2504.05298 • Published • 110
- MoCha: Towards Movie-Grade Talking Character Synthesis
  Paper • 2503.23307 • Published • 138
- Towards Understanding Camera Motions in Any Video
  Paper • 2504.15376 • Published • 155
- Antidistillation Sampling
  Paper • 2504.13146 • Published • 59

- VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control
  Paper • 2412.20800 • Published • 11
- Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
  Paper • 2501.06751 • Published • 32
- Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
  Paper • 2501.09732 • Published • 71
- Learnings from Scaling Visual Tokenizers for Reconstruction and Generation
  Paper • 2501.09755 • Published • 35

- Animate-X: Universal Character Image Animation with Enhanced Motion Representation
  Paper • 2410.10306 • Published • 56
- ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
  Paper • 2411.05003 • Published • 71
- TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
  Paper • 2411.04709 • Published • 26
- IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
  Paper • 2410.07171 • Published • 43

- Alchemist: Turning Public Text-to-Image Data into Generative Gold
  Paper • 2505.19297 • Published • 84
- yandex/alchemist
  Viewer • Updated • 3.35k • 171 • 48
- yandex/stable-diffusion-3.5-large-alchemist
  Text-to-Image • Updated • 7 • 9
- yandex/stable-diffusion-3.5-medium-alchemist
  Text-to-Image • Updated • 2 • 6

- CoRAG: Collaborative Retrieval-Augmented Generation
  Paper • 2504.01883 • Published • 9
- VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
  Paper • 2504.08837 • Published • 43
- Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
  Paper • 2504.10068 • Published • 30
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
  Paper • 2504.10481 • Published • 85

- MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
  Paper • 2412.14475 • Published • 55
- How to Synthesize Text Data without Model Collapse?
  Paper • 2412.14689 • Published • 52
- Token-Budget-Aware LLM Reasoning
  Paper • 2412.18547 • Published • 46
- WavePulse: Real-time Content Analytics of Radio Livestreams
  Paper • 2412.17998 • Published • 11