Shots, Screenshots, & More Shots

TL;DR: I have rough documentation on how to copy my setup here: https://docs.taylorcohron.me/en/showcase/home-ai

Like many tech enthusiasts, for the longest time I had Google Home devices in my apartment, hooked into Home Assistant to control my lights and other smart devices. As the years passed and my own paranoia regarding Google’s microphone access to my life grew, I wanted an alternative.

Then Nabu Casa (the group that primarily drives development and maintenance of Home Assistant) announced and released the Home Assistant Voice (Preview Edition). Despite the long name, the little device is a simple ESP32 device with an onboard microphone and DAC, able to do wake word listening and output via onboard speakers or an AUX jack. It was perfect, so I ordered 2. They showed up, I spent about 5 hours being frustrated trying to get Wyoming Protocol and a local LLM to run on my Raspberry Pi 5, and promptly shut the PEs off and walked away to other projects.

Well, I finally decided to make it work. With a lot more experience under my belt this time, I realized the better way to deploy things would be to split the work between 3 places:

My Home Assistant instance, running on a Raspberry Pi 5. (Takes care of the Home Assistant part)
My unRAID server, running w/ a 13900 CPU, 128GB RAM, and an RTX 2070 SUPER (Runs Whisper, Piper, and llama.cpp)
The Home Assistant Voice PE devices. (Takes care of wake word and microphone input)

I landed on this split after realizing that even a rather beefy 16GB RAM Raspberry Pi 5 can’t handle running Whisper, Piper, a local LLM, and all the other services I have in Home Assistant simultaneously. Running Whisper, Piper, and the LLM on my unRAID server freed up the limited resources on the Pi. A bonus is that with the LLM running on unRAID, I could open it up to other services in my homelab if I wanted. (Haven’t done that yet, but I like the option.) The Home Assistant Voice boxes handle wake words on their own, so there’s no benefit to moving that service anywhere else.

With the division of duties settled, I now needed to implement things. I decided to keep the llama.cpp server on unRAID in its own Docker container, separate from the Wyoming Protocol services. Mostly this lets other services access it, as I mentioned earlier, but it also lets me swap and reload models whenever I see fit without interrupting the Whisper/Piper flow.

On unRAID, the easiest way to set up a llama.cpp server is to use the Community Apps plugin and install “llama.cpp” by grtgbln. It’s an unRAID template for the ghcr.io/ggml-org/llama.cpp Docker image.

Prior to installation though, you’ll need to create the appdata folder and download a model into it. Creating the folder is simple enough:

mkdir -p /mnt/user/appdata/llama_cpp/models

After that, you need a model to put in that folder. I went with Qwen2.5-7B-Instruct at Q4_K_M, which is roughly 4.7GB, small enough for the RTX 2070 SUPER to run in VRAM with headroom. If you have less VRAM on your GPU, or you just want lower latency, the Qwen2.5-3B-Instruct Q4_K_M model is even smaller at roughly 2GB. You can download either one via an app like LM Studio and then use FTP to copy it to the folder above on unRAID. Make sure to rename the file to model.gguf in unRAID once it’s been copied. In this setup, llama.cpp will only load a file with that name. If you’re following the docs I shared at the top of the post, you can instead use the unRAID User Scripts plugin to add the pull-model.sh file and run it; it will automatically pull down the Qwen2.5-7B model. You can always edit the script to download whichever model you prefer.
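
If you copied the file over by hand, the rename step can be done from the unRAID terminal. Here’s a minimal sketch, assuming the download landed somewhere on the server; the function name stage_model and the example source path are hypothetical, and only the models folder and the model.gguf filename come from the steps above.

```shell
# Sketch: put a downloaded GGUF file where the llama.cpp container looks for it.
# stage_model is a hypothetical helper; the default folder is the one created above.

stage_model() {
  local src="$1"                                              # path to the downloaded .gguf file
  local models_dir="${2:-/mnt/user/appdata/llama_cpp/models}" # target folder

  if [ ! -f "$src" ]; then
    echo "No such file: $src" >&2
    return 1
  fi

  mkdir -p "$models_dir" || return 1
  # Copy instead of move so the original download stays intact.
  cp -- "$src" "$models_dir/model.gguf" || return 1
  ls -lh "$models_dir/model.gguf"
}

# Example call (hypothetical source path):
# stage_model /mnt/user/downloads/Qwen2.5-7B-Instruct-Q4_K_M.gguf
```

Copying rather than moving costs some disk space, but it means you can swap back to a previous model later without downloading it again.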

Once the model’s downloaded and renamed, fire up llama.cpp. That should be the last time you need to touch it for now. Next I had to get my Wyoming stack up. I opted to use Docker Compose for this, although I suppose you could just run the containers directly via Community Apps in unRAID. I digress. I used the lovely Dockge interface to add a new compose stack in unRAID (the one shared in the docs above). After adding and saving the stack, I hit “Start” and let it pull the images and start the 2 containers.
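
The actual compose file lives in the docs linked at the top. For a rough idea of what a stack like this boils down to, here’s a sketch of the two services as plain docker run commands. This is not the stack from the docs: it assumes the stock rhasspy/wyoming-whisper and rhasspy/wyoming-piper images running on CPU, hypothetical appdata paths and container names, and the small tiny-int8 Whisper model. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
# Sketch: Whisper (speech-to-text) and Piper (text-to-speech) as Wyoming services.
# Assumptions: rhasspy images, CPU only, hypothetical appdata paths and container names.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

start_whisper() {
  # Whisper listens on 10300, the usual Wyoming port for speech-to-text.
  run docker run -d --name wyoming-whisper --restart unless-stopped \
    -p 10300:10300 \
    -v /mnt/user/appdata/wyoming-whisper:/data \
    rhasspy/wyoming-whisper --model tiny-int8 --language en
}

start_piper() {
  # Piper listens on 10200, the usual Wyoming port for text-to-speech.
  run docker run -d --name wyoming-piper --restart unless-stopped \
    -p 10200:10200 \
    -v /mnt/user/appdata/wyoming-piper:/data \
    rhasspy/wyoming-piper --voice en_US-lessac-medium
}

start_whisper
start_piper
```

A compose stack gets you the same two containers with the added benefit of starting, stopping, and updating them together.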

With that, all the unRAID-based setup was done. Now it was time for Home Assistant itself. I had a couple of prerequisites to take care of. I needed to install HASS Local OpenAI LLM via HACS so that I could add my llama.cpp instance on unRAID to Home Assistant. After installing the integration and restarting Home Assistant, I was able to add it under Integrations. There I just selected llama.cpp as the server type, set the Base URL to http://IP:PORT/v1, and left the API Key empty. Llama.cpp is now connected to Home Assistant! But wait, there’s more!
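
If the integration refuses to connect, it helps to check the server from a terminal first. A minimal sketch, assuming curl is available and that the llama.cpp server is exposing its OpenAI-compatible endpoints under /v1; the function names are hypothetical, and the host and port are placeholders for your unRAID IP and whatever port the template maps.

```shell
# Sketch: sanity-check the llama.cpp server before pointing Home Assistant at it.
# Function names are hypothetical; host and port are whatever your container uses.

llama_base_url() {
  # Builds the same Base URL that goes into the integration.
  printf 'http://%s:%s/v1' "$1" "$2"
}

llama_list_models() {
  # Asks the server which model it has loaded.
  curl -sS --max-time 10 "$(llama_base_url "$1" "$2")/models"
}

llama_chat() {
  # Sends one short chat request. Quick check only: the prompt must not
  # contain double quotes or backslashes, since it is pasted into JSON as-is.
  local host="$1" port="$2" prompt="$3"
  curl -sS --max-time 60 "$(llama_base_url "$host" "$port")/chat/completions" \
    -H 'Content-Type: application/json' \
    -d "{\"messages\":[{\"role\":\"user\",\"content\":\"$prompt\"}]}"
}

# Example calls (placeholder IP and port):
# llama_list_models 192.168.1.50 8080
# llama_chat 192.168.1.50 8080 "Say hello in five words."
```

If both calls return JSON, the Base URL is good and any remaining problem is on the Home Assistant side.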

Now I had to go into the Local OpenAI LLM integration and add a conversation agent. Here you just need to leave the default selections for most everything, although I like to check the “Strip emojis from response” option. Since this is primarily a voice assistant, this makes spoken responses a bit cleaner.

Next, we need to add another integration, the Wyoming Protocol. There’s a decent chance that your Home Assistant instance picks up Whisper and Piper automatically, but if not, just go to Add Integration > Wyoming Protocol and add it twice, using your unRAID IP both times, with port 10300 for one and port 10200 for the other. You should now see Whisper and Piper.
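
If one of them doesn’t show up, a quick port check tells you whether the container is listening at all. A minimal sketch, assuming nc (netcat) is installed on the machine you run it from and that the services use the usual Wyoming ports (10300 for Whisper, 10200 for Piper); the function names and the example IP are hypothetical.

```shell
# Sketch: check that the Whisper and Piper ports answer on the unRAID host.
# Assumes netcat (nc) with -z and -w support; function names are hypothetical.

wyoming_port() {
  case "$1" in
    whisper) echo 10300 ;;
    piper)   echo 10200 ;;
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
}

check_wyoming() {
  local host="$1" svc port
  for svc in whisper piper; do
    port="$(wyoming_port "$svc")"
    # -z: only test the connection, -w 3: give up after 3 seconds
    if nc -z -w 3 "$host" "$port" 2>/dev/null; then
      echo "$svc: reachable on $host:$port"
    else
      echo "$svc: NOT reachable on $host:$port"
    fi
  done
}

# Example call (placeholder IP):
# check_wyoming 192.168.1.50
```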

The last step in Home Assistant is to ‘glue’ it all together in Settings > Voice Assistants. Make a new assistant and name it whatever you want. For Conversation Agent, you’ll pick the name of your llama.cpp server. I leave the “Prefer handling commands locally” on since certain commands Home Assistant can handle on its own just fine, but you can turn this off if you want to pipe everything to llama.cpp, just keep in mind you may see a bit more latency. Next is the “Speech-to-text” and “Text-to-speech” sections. You should just select Whisper and Piper respectively. You can select your appropriate language, and try out the different voices to pick one you like. I’m in the US, so I did en_US and use the lessac (medium) voice. Once you’ve got everything picked out, simply hit “Create”.

If this is the only Assistant you have, your Home Assistant Voice devices should default to it, but if not, you may need to set it as the preferred assistant under Devices > Home Assistant Voice PE > Configuration > Assistant.

Once I’d gone through all of this, I was able to start using my 2 Voice devices right away. The default wake word is “Okay Nabu,” but you can change it in the Devices / Configuration panel I mentioned before. Personally, I set mine to “Hey Jarvis.”

Now that it’s set up, I just need to push it and see how far I can take it. Maybe a future post about getting it working smoothly with Music Assistant? Either way, thanks for reading through this rambling tutorial, and hopefully you got something out of it.
