CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection
Paper • 2605.16839
Efficient AI
CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection
RelayGen: Intra-Generation Model Switching for Efficient Reasoning