When trying to run llama.cpp locally, I found that the instructions for building the Docker image with Vulkan acceleration don't work on my openSUSE Tumbleweed machine.
Instead, I needed to build and run the client directly on my host machine.
First, make sure both the "vulkan-devel" and "shaderc" packages are installed.
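On Tumbleweed that's a single zypper call. The vulkaninfo check is optional and assumes the vulkan-tools package is also installed:

sudo zypper install vulkan-devel shaderc
vulkaninfo --summary   # optional sanity check: should list your GPU and its Vulkan driver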
Next, build it with Vulkan enabled:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake .. -DGGML_VULKAN=on -DCMAKE_BUILD_TYPE=Release
make
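Plain make compiles single-threaded; make -j$(nproc) parallelizes across all cores and is noticeably faster. The binaries land in build/bin, and (assuming a reasonably recent llama.cpp build) llama-cli's --version flag is a quick smoke test that doesn't need a model:

./bin/llama-cli --version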
The client should detect and use the GPU via the Vulkan backend. The -ngl 99 flag below offloads up to 99 model layers to the GPU, which in practice means all of them for this 8B model.
[~/work/llama.cpp/build/bin] $ ./llama-cli -m ../../models/Meta-Llama-3.1-8B-Instruct-Q5_K_L.gguf -p "Building a website can be done in 10 simple steps:" -n 600 -e -ngl 99
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Graphics (RADV RENOIR) (radv) | uma: 1 | fp16: 1 | warp size: 64 | shared memory: 65536 | matrix cores: none
build: 4967 (f17a3bb4) with cc (SUSE Linux) 14.2.1 20250220 [revision 9ffecde121af883b60bbe60d00425036bc873048] for x86_64-suse-linux
main: llama backend init
......
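If more than one Vulkan device shows up in that first log line, recent ggml builds honor the GGML_VK_VISIBLE_DEVICES environment variable (comma-separated device indices; the variable name is from current ggml sources, so treat it as version-dependent):

# pin llama.cpp to Vulkan device 0 only
GGML_VK_VISIBLE_DEVICES=0 ./llama-cli -m ../../models/Meta-Llama-3.1-8B-Instruct-Q5_K_L.gguf -p "Hello" -n 64 -ngl 99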