A failed experiment: Infini-Attention, and why we should keep trying? Article • Published Aug 14, 2024
BiFormer: Vision Transformer with Bi-Level Routing Attention Paper • 2303.08810 • Published Mar 15, 2023
RelayAttention for Efficient Large Language Model Serving with Long System Prompts Paper • 2402.14808 • Published Feb 22, 2024