Commit History

HIP: bump requirement to rocm 6.1 (llama/15296)
58a3802

uvos committed on

HIP: disable sync warp shuffle operators from clr amd_warp_sync_functions.h (llama/15273)
8fca6dd

uvos committed on

CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 (llama/15131)
1d24833

JohannesGaessler committed on

llama : add gpt-oss (llama/15091)
bf225d6

ggerganov, ngxson, slaren committed on

HIP: remove the use of __HIP_PLATFORM_AMD__, explicitly support only AMD targets (llama/14945)
e37eff3

uvos committed on

HIP: Enable Matrix cores for MMQ Kernels, Enable stream-K for CDNA 3 (llama/14624)
5422b31

deepsek committed on

musa: upgrade musa sdk to rc4.2.0 (llama/14498)
a687ec3

yeahdongcn committed on

HIP : Add HIP 7.0+ compatibility for hipBLAS compute types (llama/14634)
4354560

Slobodan Josic committed on

CUDA/HIP: Share the same unified memory allocation logic. (llama/12934)
143cb70

David Huang committed on

cuda : fix HIP and MUSA BF16 (llama/0)
6dc5583

ggerganov committed on

HIP: Add support for RDNA4 targets (llama/12372)
a73f01f

Slobodan Josic committed on

CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (llama/12183)
3a7ca19

Gaurav Garg, JohannesGaessler committed on

cuda : enable CUDA Graph on CUDA Toolkit < 12.x (llama/12394)
1e69b8c

Gaurav Garg committed on

CUDA/HIP: add support for selectable warp size to mmv (llama/11519)
ed08269

uvos committed on

CUDA: use mma PTX instructions for FlashAttention (llama/11583)
f328957

JohannesGaessler, Diego Devesa committed on

hip : Add hipGraph and VMM support to ROCM (llama/11362)
089afa0

uvos committed on

CUDA: add BF16 support (llama/11093)
961ef57

JohannesGaessler committed on

Add some minimal optimizations for CDNA (llama/10498)
bf49bbe

uvos committed on

musa: enable building fat binaries, enable unified memory, and disable Flash Attention on QY1 (MTT S80) (llama/9526)
8ec75c3

R0CKSTAR committed on

ggml : fix builds (llama/0)
524a01b

ggerganov committed on

musa: remove Clang builtins mapping (llama/9421)
ba2469d

R0CKSTAR committed on

cuda : organize vendor-specific headers into vendors directory (llama/8746)
ec2f307

R0CKSTAR committed on