Building a Smart Home - from Voice Assistant to MajorDom v1.0

Original post

In 2019, I first learned about the possibility of speech recognition and synthesis in Python. Google Assistant, Siri, Cortana, and other assistants were even more limited and helpless than they are now. Adding custom voice commands was out of the question. That's when I became inspired to create my own voice assistant that would rival even Tony Stark's Jarvis.

While working on the core functionality, I started thinking about where to host this assistant. Keeping a laptop constantly powered on was not an option, and I didn't have any other computers available. That's when I discovered the Raspberry Pi single-board computers. I wanted my voice assistant to be able to control lights, LED strips, and curtains. Arduino boards were great for handling such tasks. The only thing left was to find a way to send commands to the Raspberry Pi. I didn't want to use Wi-Fi or Bluetooth right from the start. I found information online about the nRF24L01 modules, gave them a try, and liked the results.

This system worked quite well, but it had two key drawbacks:

The range was limited by the sensitivity of the microphone. With a good microphone, everything worked perfectly within a room, but not beyond.
For each parameter of each device, I had to add identical voice commands that only differed in the address and message. It was inconvenient, but tolerable for the time being.

To solve the first issue, I added http interface to the voice assistant using Django, which could accept an audio file or a string. In combination with a Kotlin mobile application, I created a wireless microphone, expanding the range to the router's coverage area, meaning from the room to the entire apartment and even slightly beyond. Carrying a phone around the home wasn't always convenient, so a few days later, I developed an application for my Wear OS smartwatch, which turned out to be an incredibly handy solution.

However, I desired more – access to my assistant anytime, not just at home. The simplest solution was a using a Telegram bot as the input-output interface. However, I couldn't shake the feeling that a bot wasn't quite right. I decided to keep it as a temporary solution until I had time to develop something better.

I wanted to be able to use my mobile application to access the assistant remotely. I just needed to find a way to send a request to the local Django server without being on the local network. I was ready to open and forward ports on my router, but my provider didn't give me a public IP address. That's when I tried ngrok. It worked well at first, but in the free version, the server occasionally crashed and changed its address. I quickly dismissed the VPN tunnel option. The cost of a VPS was equivalent to the ngrok subscription, but the implementation was much more complex.

Then I remembered that I had free hosting for PHP websites on Beget and reinvented Long Polling and queues. The implementation was straightforward: the application sent a request to the hosting server, where the PHP code added the request body (JSON) to the end of an array and wrote it to a local file. The Raspberry Pi at home sent a read request to this file every second, and after reading it, cleared the array. This way, I was able to send commands home from anywhere on the planet! Similarly, I set up receiving responses from the assistant: I duplicated the implementation and reversed the roles. Two files and four endpoints on the free PHP hosting provided me with stable bidirectional communication with my home assistant. Shortly after, I taught the assistant to send me messages on its own, such as the room number of the next class at the beginning of each break. Before I could boast about it in college, someone started spamming me at home. I had to add authentication: the login and password were hardcoded in the application, and the server had a check kind of this:

if ($login == 'markparker' && $password == 'MyVeryStrongP@ssw0rd!'){};

The application's repositories were private, and the server had no repository (why have a repository for just one file with less than 100 lines?), so the level of security was more than sufficient.

Shortly after, the system gained its first automatic command trigger. With the help of a small feature in my application, I was able to capture an event every time an alarm went off on my phone. This trigger launched the first full scenario: the curtains opened simultaneously, the assistant announced the time, weather, and class schedule at the college. If the room was still dark, the lamp smoothly turned on. At that moment, I felt like a real Tony Stark.

Then I wanted to add more automatic scenarios using motion sensors, presence detectors, lighting sensors, and so on. At this point, the second drawback I mentioned earlier became more noticeable. There was a lot of code duplication, and working with it became less convenient. The project only had the concept of voice commands; there was no notion of devices and triggers. That's when I realized how much my voice assistant had grown: I was already building a full-fledged smart home, not just a question-and-answer assistant.

This realization led me to the decision to separate the voice assistant and make the smart home a standalone project, focusing on device control rather than voice commands. And I decided to do it properly from the start, with a full-fledged server, databases, authentication, and a mobile application. Later, a college professor suggested to use web sockets instead of my php array-to-file workaround. This is how I implement remote control of devices over the Internet.

The overall concept remained unchanged: a hub in the form of a Raspberry Pi single-board computer controls Arduinos via the nRF24L01 radio module. I'll explain the architecture in more detail in the next article.