Spaces:
Running
Running
| # whisper.cpp for SYCL | |
| [Background](#background) | |
| [OS](#os) | |
| [Intel GPU](#intel-gpu) | |
| [Linux](#linux) | |
| [Environment Variable](#environment-variable) | |
| [Known Issue](#known-issue) | |
| [Todo](#todo) | |
| ## Background | |
| SYCL is a higher-level programming model to improve programming productivity on various hardware accelerators—such as CPUs, GPUs, and FPGAs. It is a single-source embedded domain-specific language based on pure C++17. | |
| oneAPI is a specification that is open and standards-based, supporting multiple architecture types including but not limited to GPU, CPU, and FPGA. The spec has both direct programming and API-based programming paradigms. | |
| Intel uses the SYCL as direct programming language to support CPU, GPUs and FPGAs. | |
| To avoid re-inventing the wheel, this code refers other code paths in llama.cpp (like OpenBLAS, cuBLAS, CLBlast). We use a open-source tool [SYCLomatic](https://github.com/oneapi-src/SYCLomatic) (Commercial release [Intel® DPC++ Compatibility Tool](https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compatibility-tool.html)) migrate to SYCL. | |
| The whisper.cpp for SYCL is used to support Intel GPUs. | |
| For Intel CPU, recommend to use whisper.cpp for X86 (Intel MKL build). | |
| ## OS | |
| |OS|Status|Verified| | |
| |-|-|-| | |
| |Linux|Support|Ubuntu 22.04| | |
| |Windows|Ongoing| | | |
| ## Intel GPU | |
| |Intel GPU| Status | Verified Model| | |
| |-|-|-| | |
| |Intel Data Center Max Series| Support| Max 1550| | |
| |Intel Data Center Flex Series| Support| Flex 170| | |
| |Intel Arc Series| Support| Arc 770| | |
| |Intel built-in Arc GPU| Support| built-in Arc GPU in Meteor Lake| | |
| |Intel iGPU| Support| iGPU in i5-1250P, i7-1165G7| | |
| ## Linux | |
| ### Setup Environment | |
| 1. Install Intel GPU driver. | |
| a. Please install Intel GPU driver by official guide: [Install GPU Drivers](https://dgpu-docs.intel.com/driver/installation.html). | |
| Note: for iGPU, please install the client GPU driver. | |
| b. Add user to group: video, render. | |
| ``` | |
| sudo usermod -aG render username | |
| sudo usermod -aG video username | |
| ``` | |
| Note: re-login to enable it. | |
| c. Check | |
| ``` | |
| sudo apt install clinfo | |
| sudo clinfo -l | |
| ``` | |
| Output (example): | |
| ``` | |
| Platform #0: Intel(R) OpenCL Graphics | |
| `-- Device #0: Intel(R) Arc(TM) A770 Graphics | |
| Platform #0: Intel(R) OpenCL HD Graphics | |
| `-- Device #0: Intel(R) Iris(R) Xe Graphics [0x9a49] | |
| ``` | |
| 2. Install Intel® oneAPI Base toolkit. | |
| a. Please follow the procedure in [Get the Intel® oneAPI Base Toolkit ](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html). | |
| Recommend to install to default folder: **/opt/intel/oneapi**. | |
| Following guide use the default folder as example. If you use other folder, please modify the following guide info with your folder. | |
| b. Check | |
| ``` | |
| source /opt/intel/oneapi/setvars.sh | |
| sycl-ls | |
| ``` | |
| There should be one or more level-zero devices. Like **[ext_oneapi_level_zero:gpu:0]**. | |
| Output (example): | |
| ``` | |
| [opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000] | |
| [opencl:cpu:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700K OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000] | |
| [opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.30.26918.50] | |
| [ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918] | |
| ``` | |
| 2. Build locally: | |
| ``` | |
| mkdir -p build | |
| cd build | |
| source /opt/intel/oneapi/setvars.sh | |
| #for FP16 | |
| #cmake .. -DWHISPER_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DWHISPER_SYCL_F16=ON | |
| #for FP32 | |
| cmake .. -DWHISPER_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx | |
| #build example/main only | |
| #cmake --build . --config Release --target main | |
| #build all binary | |
| cmake --build . --config Release -v | |
| ``` | |
| or | |
| ``` | |
| ./examples/sycl/build.sh | |
| ``` | |
| Note: | |
| - By default, it will build for all binary files. It will take more time. To reduce the time, we recommend to build for **example/main** only. | |
| ### Run | |
| 1. Put model file to folder **models** | |
| 2. Enable oneAPI running environment | |
| ``` | |
| source /opt/intel/oneapi/setvars.sh | |
| ``` | |
| 3. List device ID | |
| Run without parameter: | |
| ``` | |
| ./build/bin/ls-sycl-device | |
| or | |
| ./build/bin/main | |
| ``` | |
| Check the ID in startup log, like: | |
| ``` | |
| found 4 SYCL devices: | |
| Device 0: Intel(R) Arc(TM) A770 Graphics, compute capability 1.3, | |
| max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136 | |
| Device 1: Intel(R) FPGA Emulation Device, compute capability 1.2, | |
| max compute_units 24, max work group size 67108864, max sub group size 64, global mem size 67065057280 | |
| Device 2: 13th Gen Intel(R) Core(TM) i7-13700K, compute capability 3.0, | |
| max compute_units 24, max work group size 8192, max sub group size 64, global mem size 67065057280 | |
| Device 3: Intel(R) Arc(TM) A770 Graphics, compute capability 3.0, | |
| max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136 | |
| ``` | |
| |Attribute|Note| | |
| |-|-| | |
| |compute capability 1.3|Level-zero running time, recommended | | |
| |compute capability 3.0|OpenCL running time, slower than level-zero in most cases| | |
| 4. Set device ID and execute whisper.cpp | |
| Set device ID = 0 by **GGML_SYCL_DEVICE=0** | |
| ``` | |
| GGML_SYCL_DEVICE=0 ./build/bin/main -m models/ggml-base.en.bin -f samples/jfk.wav | |
| ``` | |
| or run by script: | |
| ``` | |
| ./examples/sycl/run_whisper.sh | |
| ``` | |
| 5. Check the device ID in output | |
| Like: | |
| ``` | |
| Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device | |
| ``` | |
| ## Environment Variable | |
| #### Build | |
| |Name|Value|Function| | |
| |-|-|-| | |
| |WHISPER_SYCL|ON (mandatory)|Enable build with SYCL code path. <br>For FP32/FP16, WHISPER_SYCL=ON is mandatory.| | |
| |WHISPER_SYCL_F16|ON (optional)|Enable FP16 build with SYCL code path.For FP32, do not set it.| | |
| |CMAKE_C_COMPILER|icx|Use icx compiler for SYCL code path| | |
| |CMAKE_CXX_COMPILER|icpx|use icpx for SYCL code path| | |
| #### Running | |
| |Name|Value|Function| | |
| |-|-|-| | |
| |GGML_SYCL_DEVICE|0 (default) or 1|Set the device id used. Check the device ids by default running output| | |
| |GGML_SYCL_DEBUG|0 (default) or 1|Enable log function by macro: GGML_SYCL_DEBUG| | |
| ## Known Issue | |
| - Error: `error while loading shared libraries: libsycl.so.7: cannot open shared object file: No such file or directory`. | |
| Miss to enable oneAPI running environment. | |
| Install oneAPI base toolkit and enable it by: `source /opt/intel/oneapi/setvars.sh`. | |
| - Hang during startup | |
| llama.cpp use mmap as default way to read model file and copy to GPU. In some system, memcpy will be abnormal and block. | |
| Solution: add **--no-mmap**. | |
| ## Todo | |
| - Support to build in Windows. | |
| - Support multiple cards. | |