So what is Home Assistant using for this?
If I were to build it myself I’d probably overcomplicate it by using multiple LLM agents on a local server. I’d probably use Whisper for the speech-to-text and then Mistral fine-tuned on the Rosetta Code dataset to send the API calls to HA. However, that wouldn’t keep it from always listening to me and trying to interpret everything I say as a command for HA. Is that just a prompting issue for Whisper, or would I need another agent to turn Whisper on?
I could maybe get this to run without specialized hardware like a GPU, but it would be better to have something to make the LLMs a bit more responsive.
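To make the "always listening" worry concrete, here is a rough sketch of gating on a wake phrase after transcription, so that ordinary conversation never reaches the command model. It assumes the speech-to-text step has already produced a string; `WAKE_PHRASE` and `extract_command` are made-up names for illustration, and a real setup would more likely use a dedicated wake-word detector on the audio itself.

```python
from typing import Optional

WAKE_PHRASE = "hey home"  # hypothetical wake phrase, chosen only for this sketch


def extract_command(transcript: str, wake_phrase: str = WAKE_PHRASE) -> Optional[str]:
    """Return the text following the wake phrase, or None if it is absent."""
    # Lowercase and collapse whitespace so matching is not spacing-sensitive.
    normalized = " ".join(transcript.lower().split())
    if not normalized.startswith(wake_phrase):
        return None  # ordinary conversation: nothing is forwarded
    rest = normalized[len(wake_phrase):]
    # Reject words that merely begin with the wake phrase, e.g. "hey homework".
    if rest and rest[0].isalnum():
        return None
    command = rest.strip(" ,.!?")
    return command or None


print(extract_command("Hey home, turn on kitchen light"))
print(extract_command("I think the kitchen light is ugly"))
```

Only the first call yields a command; the second returns `None` and would be dropped before any LLM sees it.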
There is no LLM; it’s just used to recognize simple commands such as “turn on kitchen light”. What the “conversation agent” can do is very limited, though you can extend it to recognize custom commands. It’s not comparable to Google Assistant/Siri, let alone ChatGPT.
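As a rough illustration of what "recognize simple commands" means without an LLM, here is a sketch of template matching with regular expressions. This is not Home Assistant's actual implementation or sentence format; `PATTERNS` and `match_command` are hypothetical names for this example only.

```python
import re
from typing import Optional

# Hypothetical sentence templates; each maps a fixed phrasing to an action name.
PATTERNS = [
    (re.compile(r"^turn (?P<state>on|off) (?:the )?(?P<name>.+)$"), "turn_{state}"),
    (re.compile(r"^set (?:the )?temperature to (?P<value>\d+)(?: degrees)?$"), "set_temperature"),
]


def match_command(text: str) -> Optional[dict]:
    """Return the matched action and its slots, or None if no template fits."""
    cleaned = text.lower().strip(" .!?")
    for pattern, action in PATTERNS:
        match = pattern.match(cleaned)
        if match:
            slots = match.groupdict()
            # Fill the action name from the captured slots (unused slots are ignored).
            return {"action": action.format(**slots), "slots": slots}
    return None


print(match_command("Turn on kitchen light"))
print(match_command("What's the weather like?"))
```

Anything that does not fit a template simply returns `None`, which is why this kind of matcher feels so limited next to a chat assistant.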
I believe there is a ChatGPT integration in the works (optional, of course).
If it runs locally, that’ll be awesome. I just hope it never decides to turn the heat up to 90F.
There are plenty of local LLM options these days. It’s entirely feasible to run it in-house.
And if someone can do it… I would suspect that there’ll be a HACS module up about 2 weeks ago…
Ideally, IMO, you’d want a system with safeties in place, like acceptable temperature ranges or maximum durations for the oven to be on, to avoid situations where the software misinterprets a command in a dangerous way.
Something like this:
User: Set temperature to 19 degrees. (Yeah it’s on the cold side even for Celsius, but not a crazy amount as room temperature is around 22 degrees)
Assistant: Setting temperature to 90 degrees. (Deadly in Celsius… Water boils at around 100 degrees, depending on pressure)
Assistant: 90 degrees is outside of the safe range defined by your configuration. Intrusion suspected. Deploying sentry guns.
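Sentry guns aside, the range check itself could be sketched like this. The limits are made-up Celsius values, and `SafeRange`, `UnsafeSetpointError`, and `check_setpoint` are hypothetical names, not anything Home Assistant provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafeRange:
    low: float
    high: float


class UnsafeSetpointError(ValueError):
    """Raised when a requested value falls outside the configured safe range."""


# Hypothetical limits in Celsius, for illustration only.
THERMOSTAT_RANGE = SafeRange(low=16.0, high=26.0)


def check_setpoint(requested: float, allowed: SafeRange = THERMOSTAT_RANGE) -> float:
    """Return the requested setpoint unchanged, or raise if it is out of range."""
    if not (allowed.low <= requested <= allowed.high):
        raise UnsafeSetpointError(
            f"{requested} is outside the safe range {allowed.low}..{allowed.high}"
        )
    return requested


print(check_setpoint(19))
try:
    check_setpoint(90)
except UnsafeSetpointError as err:
    print(f"Rejected: {err}")
```

The sketch rejects instead of clamping on purpose: if 19 was misheard as 90, silently clamping to 26 would still be the wrong temperature, so asking the user to confirm seems like the safer behaviour.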
Good question - I have an allowed range configured on my thermostat, but I don’t know if it applies to API calls or is just for the UI.
OK, hmm, I wonder how much work it would be to implement it using open source models. I think the hardest part would be translating the voice instructions into an API call that HA can use correctly.
Then there is the whole hardware issue to fix too. I do know that some SoCs are getting good at running 7B-parameter models locally, but the cost is still probably going to be prohibitive.
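On the "API call that HA can use" part, the target format is at least simple. Here is a sketch that builds (but does not send) a service call for Home Assistant's REST API, which accepts a POST to `/api/services/<domain>/<service>` with a bearer token. The base URL, token, and entity ID below are placeholders, and `build_service_call` is a hypothetical helper.

```python
import json
import urllib.request


def build_service_call(base_url: str, token: str, domain: str,
                       service: str, data: dict) -> urllib.request.Request:
    """Build a POST request for a Home Assistant service call without sending it."""
    url = f"{base_url.rstrip('/')}/api/services/{domain}/{service}"
    return urllib.request.Request(
        url,
        data=json.dumps(data).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # long-lived access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder host, token, and entity ID; substitute your own.
request = build_service_call(
    "http://homeassistant.local:8123",
    "YOUR_TOKEN",
    "climate",
    "set_temperature",
    {"entity_id": "climate.living_room", "temperature": 19},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending it would be a matter of passing the request to `urllib.request.urlopen`. The hard part is still getting a model to reliably produce the right domain, service, and data from free-form speech.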