Commit History
HIP: bump requirement to rocm 6.1 (llama/15296)
58a3802
HIP: disable sync warp shuffle operators from clr amd_warp_sync_functions.h (llama/15273)
8fca6dd
CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 (llama/15131)
1d24833
HIP: remove the use of __HIP_PLATFORM_AMD__, explicitly support only AMD targets (llama/14945)
e37eff3
HIP: Enable Matrix cores for MMQ Kernels, Enable stream-K for CDNA 3 (llama/14624)
5422b31
deepsek
committed on
musa: upgrade musa sdk to rc4.2.0 (llama/14498)
a687ec3
HIP : Add HIP 7.0+ compatibility for hipBLAS compute types (llama/14634)
4354560
Slobodan Josic
committed on
CUDA/HIP: Share the same unified memory allocation logic. (llama/12934)
143cb70
David Huang
committed on
cuda : fix HIP and MUSA BF16 (llama/0)
6dc5583
HIP: Add support for RDNA4 targets (llama/12372)
a73f01f
Slobodan Josic
committed on
CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (llama/12183)
3a7ca19
cuda : enable CUDA Graph on CUDA Toolkit < 12.x (llama/12394)
1e69b8c
Gaurav Garg
committed on
CUDA/HIP: add support for selectable warp size to mmv (llama/11519)
ed08269
uvos
committed on
CUDA: use mma PTX instructions for FlashAttention (llama/11583)
f328957
hip : Add hipGraph and VMM support to ROCM (llama/11362)
089afa0
uvos
committed on
CUDA: add BF16 support (llama/11093)
961ef57
Add some minimal optimizations for CDNA (llama/10498)
bf49bbe
uvos
committed on
musa: enable building fat binaries, enable unified memory, and disable Flash Attention on QY1 (MTT S80) (llama/9526)
8ec75c3
R0CKSTAR
committed on
ggml : fix builds (llama/0)
524a01b
musa: remove Clang builtins mapping (llama/9421)
ba2469d
R0CKSTAR
committed on
cuda : organize vendor-specific headers into vendors directory (llama/8746)
ec2f307
R0CKSTAR
committed on