One of the leading items on embedded developers’ to-do lists these days is to add Amazon’s Alexa voice agent to a hacker board or another Linux device. Of course, you could simply buy an Alexa-enabled Amazon Echo speaker system for $180 — or a non-speaker Amazon Echo Dot for only $50 — but what fun is that? For some, the goal is to recreate basic Alexa Skills such as ordering pizza or asking random questions like which countries were finalists in the 2014 World Cup. Others want to go a step further and use Alexa to voice-activate robots, smart home devices, dashboard interfaces, and other gizmos using technologies like MQTT. From a hacking perspective, the first stage is easy, said PTR Group CTO and Chief Scientist Mike Anderson at Embedded Linux Conference 2017 in February.
The main challenge, Anderson explained in “Making an Amazon Echo Compatible Linux System,” is correctly interpreting Amazon’s sometimes convoluted menus for certifying and connecting the device. The presentation takes you step-by-step through the process of using a Raspberry Pi to register with Amazon Voice Services (AVS) and set up new skills in the Alexa Skills Kit.
Anderson investigated Alexa when a customer asked about voice-enabling plumbing fixtures. Although it may be a while “before we can say ‘Alexa — clean my toilet,’” he said, there are plenty of tasks that could benefit from a hands-free interface, such as home automation. “Of course, there used to be this thing called a light switch that seemed to work pretty well,” he added. “Alexa is the intelligent agent you didn’t know you wanted or needed. Some of my neighbors come over to my house just to ask it stupid questions.”
You can do similar things using AI voice agents such as Apple’s Siri, Microsoft Cortana, and Google Now, but usually only on your mobile device. In any case, Alexa seems a bit more mature, and it’s certainly more accessible to Linux developers. By opening Alexa up to third-party developers and allowing free educational and hobbyist use, Amazon has helped solidify Alexa’s lead in the market.
Several thousand Alexa Skills – new voice commands and response connections – are already available, including many publicly available Skills like calling an Uber. When you say the right wake word, the Echo device sends the audio to the cloud, which processes the audio, figures out what skill it’s related to, and sends back the appropriate response.
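The cloud side of that round trip can be pictured as a JSON envelope that the service dispatches to a skill once it has matched the utterance to an intent. The sketch below is simplified, and the intent name and slot are hypothetical examples, not anything from Anderson's talk:

```javascript
// Simplified shape of the JSON request the cloud hands to a skill after
// speech recognition and intent matching. "OrderPizzaIntent" and the
// "Size" slot are hypothetical; session fields are omitted.
const incomingRequest = {
  version: '1.0',
  request: {
    type: 'IntentRequest',
    requestId: 'EdwRequestId.example',
    intent: {
      name: 'OrderPizzaIntent',
      slots: { Size: { name: 'Size', value: 'large' } }
    }
  }
};

// The skill reads the matched intent and its slot values:
console.log(incomingRequest.request.intent.name);             // prints "OrderPizzaIntent"
console.log(incomingRequest.request.intent.slots.Size.value); // prints "large"
```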
After buying an Echo Dot and playing around with it, Anderson decided to set up his own Echo-like Alexa environment on his Raspberry Pi 3-based Pi-top laptop. He started by studying an iFixit teardown of the Echo Dot, which revealed that the Dot runs on a low-end, Cortex-A8-based TI DaVinci DM3725, similar to the TI SoC on the BeagleBone Black.
“So the Echo Dot is basically a BeagleBone with lots of audio processing cleanup technology,” said Anderson. Because the bulk of Alexa’s processing happens in Amazon’s cloud-based AVS, the voice agent can run on numerous low-end Linux home automation hubs and hacker boards. In fact, the Raspberry Pi 3’s quad-core SoC and 1GB of RAM are overkill.
The first challenge in developing an Alexa device is provisioning it to connect to the Internet and to AVS. The Pi-top makes this easier by offering a keyboard and trackpad, but many device targets lack those niceties. Aside from voice, the only other input on an Echo Dot is a button that wakes Alexa.
To use Alexa, the target must have a mic, a speaker, and WiFi. The Raspberry Pi 3 provides the WiFi and the audio circuitry, but the audio is unamplified. To address this, Anderson opted for a cheap Bluetooth speaker/mic add-on but then wished he had gone for a wired device. “There’s a bit of latency in waking up the Bluetooth mic, so you wait before you can give the command,” he said.
Setting up a device on AVS
The next step was to register as a developer at Amazon, which Anderson did from Raspbian. You could use something like Ubuntu MATE, but with Raspbian, there’s a handy Alexa sample app, and “you’ll find more online help.”
Anderson chose Node.js with JavaScript to develop Alexa Skills. The platform also supports Java 8. In either case, the code runs in the cloud, and it runs as a remote procedure call instead of a VM, which makes things easier, said Anderson.
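A cloud-side skill handler in plain Node.js can be sketched roughly as follows. This is a minimal, hand-rolled illustration of the standard Alexa request/response envelope rather than Anderson's actual code or the official SDK's API, and the intent name "OrderPizzaIntent" is hypothetical:

```javascript
// Minimal sketch of a skill handler (no SDK). It inspects the request
// type and intent name, then returns the standard Alexa response
// envelope with plain-text speech. "OrderPizzaIntent" is hypothetical.
function handleAlexaRequest(event) {
  let speechText;
  if (event.request.type === 'LaunchRequest') {
    speechText = 'Welcome. What would you like to do?';
  } else if (event.request.type === 'IntentRequest' &&
             event.request.intent.name === 'OrderPizzaIntent') {
    speechText = 'Okay, ordering your pizza.';
  } else {
    speechText = 'Sorry, I did not understand that.';
  }
  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text: speechText },
      shouldEndSession: true
    }
  };
}

// Example invocation with a minimal intent request:
const reply = handleAlexaRequest({
  request: { type: 'IntentRequest', intent: { name: 'OrderPizzaIntent' } }
});
console.log(reply.response.outputSpeech.text); // prints "Okay, ordering your pizza."
```

Because the handler is just a function that maps a request object to a response object, it is easy to unit-test locally before wiring it up to the cloud.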
The first job was to download the Alexa Sample Application from the GitHub repo, which “brings down all the sources for audio, wake word, and client,” explained Anderson. The device then connects to Amazon AWS, so you can “pull down credentials.”
Within the cloud-based AVS service, you can click on Alexa and register the product type. “You give your device an ID and go to a security profile screen where they ask for a name and a description,” said Anderson. “I just entered ‘Pi Laptop.’”
Amazon then generates a series of credentials that you copy into your build environment. Along the way, Amazon asks if you want access to Amazon Music Services. If so, you need to fill out another form.
Every time you ask Alexa a question, AVS pings a separate security server for the credentials, which requires the use of redirect URLs. “In the security profile setup process, there’s an option that asks if you want to edit a redirect URL,” said Anderson. This stumped him for a while until he realized it was “asking for the port number of the connection to Amazon.” He finally found the origin URL (https://localhost:3000) and the return URL (https://localhost:3000/authresponse) that handle the authentication.
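Those values can be pictured as a small configuration block for the local companion service. Only the two URLs come from the talk; the field names and client ID format below are illustrative assumptions, not the sample app's exact schema:

```javascript
// Hypothetical sketch of the security-profile settings the local
// companion service needs. The two localhost URLs match the talk;
// the field names and the client ID value are illustrative only.
const avsAuthConfig = {
  clientId: 'amzn1.application-oa2-client.EXAMPLE',   // from the AVS security profile
  allowedOrigin: 'https://localhost:3000',            // the "origin" URL
  redirectUrl: 'https://localhost:3000/authresponse'  // return URL after sign-in
};

console.log(avsAuthConfig.redirectUrl); // prints "https://localhost:3000/authresponse"
```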
Once the software is configured and installed with secure credentials, you must load the services needed to get to AVS: the web service, sample app, and wake-word engine. Anderson explained how to go to the alexa-avs-sample-app/samples directory and start the web service, and how to register and provision the device.
After more menus and prompts, you move to the next phase of initializing a wake word engine. Anderson used KITT.AI, which he said worked better for him than the alternative. After that came the process of setting up the WiFi and IP interfaces. Anderson did this with TightVNC Server along with Avahi daemons. He also explained how to set up new Skills in AWS, and addressed issues like when to choose between a public or private designation.
All these steps involved “a little bit of hacking, but they’re not that hard,” he said. “Once you’ve done all the admin stuff, the code is not complicated, especially with Node.js. The main problem I had was with Amazon’s frequent time-outs.”
Anderson noted that his Alexa-enabled Pi-top sometimes responds to a word other than the wake word. Choosing a good wake word helps, but it really comes down to the microphone. “The Echo Dot has six different mics and provides a lot of extra audio processing,” he said. “What I really need is a steerable phased array of microphones with beamforming for echo cancellation. Maybe I’ll tell you about it at next year’s talk.”
Anderson plans to move forward on Alexa-enabling a robot, but he seems ambivalent about whether he will make much use of Alexa at home. The lack of security is the biggest issue, said Anderson, who gave a presentation on embedded Linux security at last year’s ELC.
“The downside of these systems is that they’re always listening, waiting for that wake word, and they’re always connected to the cloud,” said Anderson. “It’s kind of cool, but being a security guy, it makes me kind of nervous. I leave it unplugged most of the time.”
Watch the complete presentation below: