Using Stable Diffusion & LLMs with ROCm


I don't use any of OpenAI's, Google's, or Microsoft's AI tools. I don't like that the training data for tools this powerful is closed and held by large tech giants. For all we know, the data they use is scraped from copyrighted work they don't own, and we have no way to audit it. I prefer open source AI models with auditable training data.

There is a problem, though. I have been running AI workloads on my CPU ever since I moved away from NVIDIA, because AMD support in the open source AI space is not great. I had always heard it was pretty dire, but I figured I would give it another shot. My desktop has a 12th Gen Intel i9 and an AMD Radeon RX 7800 XT, and runs Fedora Kinoite as its operating system. I already have all the ROCm packages and drivers installed, so now I just need some tools that use them.


Stable Diffusion

I previously used InvokeAI for all my Stable Diffusion needs, as it was pretty easy to set up and seemed close to feature-complete with AUTOMATIC1111's stable-diffusion-webui, which I want to move away from. My first thought was to spin up an Arch Linux container and use that as my base for installing InvokeAI. Unfortunately, I found out the hard way that Arch Linux's packages are too new for AI work, it seems. So I set up an Ubuntu 22.04 container instead, since everything I read about AI online references an Ubuntu system. I'm using Podman with Distrobox for this, but you should be able to do the same in Docker as long as the container can access your GPU.

distrobox create -n ubuntu -i ubuntu:22.04

Now just enter the container, install the requirements, then download and run the InvokeAI installer. As of this writing the newest version is 3.7.0.
https://github.com/invoke-ai/InvokeAI/releases/
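The download step might look something like this — note that the exact asset name is my assumption based on the release page's naming pattern, so double-check it there:

```shell
# Download and unpack the installer zip from the releases page.
# The asset name below is assumed; verify it against the v3.7.0 release.
wget https://github.com/invoke-ai/InvokeAI/releases/download/v3.7.0/InvokeAI-installer-v3.7.0.zip
unzip InvokeAI-installer-v3.7.0.zip
```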

distrobox enter ubuntu
sudo apt install python3.10-venv 'rocm*'
cd InvokeAI-Installer
./install.sh

After the install finishes, try running ./invoke.sh from the directory you installed it to, and look through its output for the line [InvokeAI]::INFO --> GPU device. You want it to look something like this.

[InvokeAI]::INFO --> GPU device = cuda AMD Radeon Graphics

As long as it doesn't say cpu, the next step is to try generating an image. The most likely outcome is that InvokeAI will crash as soon as it starts generating. If it doesn't, then you're done! It works!

But for me it wasn't that simple. I needed to set an environment variable for my GPU to work. Since I'm using a 7800 XT, I want HSA_OVERRIDE_GFX_VERSION=11.0.0. If you are using a 6000 series card you may need HSA_OVERRIDE_GFX_VERSION=10.3.0 instead, but I'm not sure.
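As I understand it, the override is needed because ROCm only ships kernels for officially supported gfx targets: the 7800 XT reports gfx1101, and the override tells ROCm to treat it as the supported gfx1100 target. You can check what target your card actually reports with rocminfo (assuming the ROCm tools are installed):

```shell
# Print the first gfx target reported by the GPU (requires rocminfo).
rocminfo | grep -o -m1 'gfx[0-9a-f]*'
```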

To test whether this works, we can run the following.

HSA_OVERRIDE_GFX_VERSION=11.0.0 ./invoke.sh

If all goes well, you should be able to generate images now! Now, I don't know about you, but I just want to run ./invoke.sh and be done with it. So we can add export HSA_OVERRIDE_GFX_VERSION=11.0.0 near the top of invoke.sh.
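For example, right after the shebang line in invoke.sh, add:

```shell
# Persist the GPU override so plain ./invoke.sh works (value shown is for a 7800 XT).
export HSA_OVERRIDE_GFX_VERSION=11.0.0
```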

And that's it! InvokeAI should be blazing fast thanks to proper ROCm support!


Large Language Models

I used to use GPT4All for all my LLM needs, but I wanted to try something different. I recently found Ollama, which looks really cool, especially since they provide Docker containers with ROCm already set up. All I had to do was run the following and I was up and running!

podman run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:0.1.24-rocm

The catch is that it does seem to use my CPU more than InvokeAI does. I'm not sure if it's a bug or just expected behavior. When I read the container's logs, it doesn't look like anything is going wrong, and the AI runs faster than before, so I'd say it's working pretty well!
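To sanity-check that the GPU is actually being used, you can run a model and then look through the container's logs for the GPU discovery lines (llama2 here is just an example model name — any model you've pulled works):

```shell
# Run a model interactively inside the container (llama2 is an example name).
podman exec -it ollama ollama run llama2

# In another terminal, check the container logs for GPU/ROCm discovery messages.
podman logs ollama | grep -i -e gpu -e rocm
```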

time=2024-02-18T14:10:47.663Z level=INFO source=gpu.go:242 msg="Searching for GPU management library librocm_smi64.so"
time=2024-02-18T14:10:47.664Z level=INFO source=gpu.go:288 msg="Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.5.0.50701 /opt/rocm-5.7.1/lib/librocm_smi64.so.5.0.50701]"
time=2024-02-18T14:10:47.666Z level=INFO source=gpu.go:109 msg="Radeon GPU detected"
time=2024-02-18T14:10:47.666Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
...
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors:        CPU buffer size =  3647.87 MiB