Pay-to-"Play"
And, as if this wasn't enough, the site itself is extremely misleading, in that it is chock-full of assets lifted right from Amazon's site. It seems to be an attempt to imply legitimacy by using Amazon's own branding, while simultaneously suggesting they have agreements with major players, without ever actually saying so explicitly. The page shows the promo banners for a bunch of heavy hitters on the skill store like NPR and EA, which were presumably the active promos at the time the site was spun up back in May of this year (this notion is furthered by the fact that they also stole the image for the first-party "Mother's Day" experience to include in their banner rotation). They also pilfered the "bunch of Alexa swag" image that was formerly used by Alexa's developer marketing for their "build a skill, get rewarded" campaign, featuring a t-shirt that their site definitely cannot offer (as it was exclusive to developers participating in that program).
Thus far, the scheme seems to have been ineffective. Aside from their own skills, they've only managed to catch two marks. The skill they've been promoting has, though, received two reviews since they started pushing it - hopefully Amazon will retroactively remove those.
If these folks are trying to sell what they shouldn't, the next group is trying to sell what they can't...
Remember the Yellow Pages?
Alright, so if sketchy platform plays are one side of the coin, the flip side is the people jumping in to contribute security "research".
Injection Rejection
While Malamed's injection technique is a hypothetical vector for a user of a skill to attack a skill developer, another group took the opposite approach and decided to see if they could come up with any new ways for a skill developer to attack their users. They were unsuccessful in that regard, but that's not the way it ended up being reported...
"Security Research" from Security Research Labs
Further, for a purported research agency, it should not have been news to SRLabs that Checkmarx described essentially the exact same flows 18 months before they did, and got a big round of media coverage at the time (see this Forbes article, for example). But here's the thing - even when Checkmarx revealed their steps in April of last year, they didn't present anything that was new to experienced developers.
Let's look at what they hope to accomplish, in their words:
Through the standard development interfaces, SRLabs researchers were able to compromise the data privacy of users in two ways:
1. Request and collect personal data including user passwords
2. Eavesdrop on users after they believe the smart speaker has stopped listening
- They use the ol' bait-n-switch to change the skill's behavior after passing cert. Alexa skills have their intent models and their access to new interfaces (think push notifications) locked down between cert passes, but the content of the conversation is dynamic, just like on the web, mobile, or any other real development platform. Virtually every developer is making changes to each of their skills without going through certification again. In fact, I'll freely admit that in a couple of my skills I have a flag in my properties file that says "if I'm in cert, act one way, else act another", just so I don't have to have the dreaded "Rule 4.1" fight every single time (there's a sketch of that kind of switch right after this list).
- They make their skills say they are stopping, but then don't actually stop the session. What they have failed to show, though (and what Forbes called out when Checkmarx showed this exact thing), is a way to make an Echo device turn off its blue light, which is the canonical indicator of listening vs. not listening. Now, to their credit, they are correct in asserting that relying on a visual cue to confirm an audio product's state is not ideal (and it's especially troubling when considered from the perspective of the visually impaired audience, for whom voice-first is otherwise presumably a boon). But the question of "what state is my voice assistant currently in?" is a really hard one that to my eyes has not been solved (and may never be satisfactorily solvable) by anyone - a lot of the best practices from UI development do not translate well here, as the human brain can't process multiple audio streams simultaneously and has no equivalent of peripheral vision.
- They sit silently for a bit, while the skill is still running. They claim to have figured out how to do it using special characters that Alexa won't pronounce, which is more or less the equivalent of a TV hacker pulling up a terminal window over their GUI - it's something laypeople might associate with "how 2 hax". The special characters are completely irrelevant given that there are multiple ways to sit silently on Alexa, including the SSML break tag, which is literally built into the spec (both this and the fake-stop trick are sketched just after this list). And to be clear, this is not a case of the authors being unaware - they actually mention SSML breaks in their closing. They instead just chose the route that looks more complicated and appears to the uninitiated to exploit a mistake on Amazon's part. The ability to sit idly in a skill exists for a reason, and developers use it for perfectly valid use cases. How many stars would you give to a mindfulness skill that never stopped talking while you were trying to meditate?
- They built a facsimile of the old AMAZON.LITERAL. Checkmarx did the same thing, but I don't think these researchers were at all aware of the history around this feature, because the terminology they used to describe it was all wrong and missed the appropriate nuance. The idea here is to get an open-ended intent to transcribe the unsuspecting speaker's ambient speech (the handler side of this is sketched below). The thing is, Alexa is notoriously bad at transcription, and the way they're choosing to approach the problem is extra convoluted. At best they get a small snippet of poorly transcribed text with this approach, and it's almost certain that a user - even one who isn't tech savvy - would catch on before falling into the trap. This is closely related to point number 2, but even the less tech savvy quickly become aware of the "listening vs. not listening" light ring states, through the joy of accidentally waking the device with a homophone of the wake word. The chances that you'll snoop something meaningful before the first user realizes what is going on and reports the issue are minuscule.
- Alexa can ask you for your password, just like it can ask you for anything else, or just like any other platform can ask you for your password. As noted above, doing passwords on Alexa is an especially bad experience, so nobody does it, and therefore this is going to look extra suspicious to even a non-savvy user. Plus, even if this was an Alexa thing and not an all-user-interfaces-ever thing, to capture the password requires a custom slot that is going to clearly draw the attention of certification. It's self-defeating.
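To make the first bullet concrete, here's a minimal sketch of the kind of "am I in cert?" switch I described. The environment variable and the alternate greetings are illustrative stand-ins for whatever config mechanism a given skill uses (mine happens to be a properties file) - this is not anything Amazon provides or checks for:

```python
import os

# Hypothetical flag the developer flips in their own config before submitting
# for certification; Amazon has no visibility into it.
IN_CERT_REVIEW = os.environ.get("IN_CERT_REVIEW", "false").lower() == "true"

def build_greeting():
    if IN_CERT_REVIEW:
        # Conservative behavior while the skill is under review
        return "Welcome! Say 'start a maze' to begin."
    # Post-certification behavior, which can change at any time without
    # another trip through cert
    return "Welcome back! Want to pick up where you left off?"
```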
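The second and third bullets boil down to a single response payload. Here's a rough sketch (mine, not SRLabs' code) of a handler return value that says goodbye, sits silently via an SSML break, and simply never ends the session - and remember that whenever the microphone actually opens afterward, the light ring still shows it:

```python
def fake_goodbye_response():
    # Alexa caps a single SSML break at 10 seconds, so longer silences
    # are built by chaining several of them.
    silent_pause = '<break time="10s"/>'
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                "type": "SSML",
                "ssml": f"<speak>Goodbye. {silent_pause}</speak>",
            },
            "reprompt": {
                "outputSpeech": {
                    "type": "SSML",
                    "ssml": f"<speak>{silent_pause}</speak>",
                },
            },
            # The part that actually matters: despite the "Goodbye",
            # the session is never closed.
            "shouldEndSession": False,
        },
    }
```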
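And the "facsimile of AMAZON.LITERAL" from the fourth bullet is nothing more exotic than an intent with an over-broad slot (a huge custom slot type, or something like AMAZON.SearchQuery) whose value the handler reads back out. A hedged sketch - "spoken_text" is an illustrative slot name, not something from the research:

```python
def handle_catch_all(event):
    # Standard Alexa custom-skill request shape: whatever the over-broad
    # slot happened to match arrives as an ordinary slot value.
    slot = event["request"]["intent"].get("slots", {}).get("spoken_text", {})
    heard = slot.get("value")  # frequently a badly mangled transcription
    # A malicious skill would quietly log `heard`; a legitimate one would
    # try to make sense of it and respond.
    return heard
```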
Amazon Responds
The second of these changes is great, and the first one is terrible, and I think this is the real story here. It's bad for two reasons. The first is that it solves nothing. There are still a bunch of ways to keep a session open after it seems like it has closed, so this is at best a cosmetic change. The second is that it is backwards incompatible, and breaks some very legitimate use cases - like eliciting confirmation in the case of a complex intent model with a lot of false positives, or offering up a survey about user experience upon exit - where developers were not immediately ending the session.
What's worse, Amazon made these changes without notifying the developers whose code they were breaking. There was no system-wide developer email describing this new tweak to existing workflows; it was just dropped on top of active skills.
Consider the following entirely made up interaction:
User: "Alexa, open Mei's amazing mazes"
Alexa: "Welcome back, you're currently in the Minotaur's Mansion, what will you do?"
User: "Step forward"
Alexa: "You step forward into a hallway, what now?"
User: "Step"
Alexa: "It sounded like you said `stop`, but I'm not sure. Do you want to quit?"
User: "No"
<awkward silence>
Because the system is now forcing an exit on the StopIntent, the user is having a bad play experience that they would not have had before this change. The developer, of course, will have absolutely no idea that this is happening to their skill unless they have some exceedingly detailed synthetic monitoring running continuously against their live skills, which I'd venture nobody is currently doing. Some may discover it through some complex path analysis, but by the time they notice it, they've likely lost a bunch of repeat users and garnered bad reviews on the store that will have a chilling effect on future prospects.
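For the avoidance of doubt about what "legitimate use case" means here, this is roughly what the maze skill's StopIntent handler would have looked like before the change - a sketch in the raw response format, not Amazon sample code:

```python
def handle_stop_intent(session_attributes):
    return {
        "version": "1.0",
        # Remember that we're mid-confirmation so the next "no" can be
        # routed back into the game.
        "sessionAttributes": {**session_attributes, "confirming_exit": True},
        "response": {
            "outputSpeech": {
                "type": "PlainText",
                "text": "It sounded like you said stop, but I'm not sure. "
                        "Do you want to quit?",
            },
            # Keeping the session open to hear the yes/no answer is exactly
            # what the new rule forbids on AMAZON.StopIntent.
            "shouldEndSession": False,
        },
    }
```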
Frankly, it is unacceptable for Amazon to make backwards incompatible changes without notifying us, and it especially shouldn't be happening as a kneejerk reaction to false alarms. If they really wanted to make this ill-advised change to the StopIntent, there's a cleaner solution they could've taken: live skills are grandfathered in until the next time they go through cert, at which point they need to change their Stop flows to pass. Hopefully they'll think better of it and roll the requirement back.
One Bright Side: Community Comes Through!
- Liam Sorta (Twitter: @LiamSorta) and Bob Stolzberg (Twitter: @BobStolzberg) both independently noticed and raised alarm about the AlexaBetaTesters site.
- Bob was also the first one I saw who was pointing out the voicecommand.net scam. Mark Tucker (Twitter: @MarkTucker) did a ton of good work investigating what was up with that whole situation.
- Mark was my link to the "SQL Injection" research too, and he was the first one I saw to refute the findings on his Twitter feed.
- Finally, it was Tom Hewitson's (Twitter: @TomHewitson) eagle eyes that caught the Amazon Response to the SRLabs story, both in terms of the backwards incompatibility and the change to the guidelines.
Amazon has to do its part in managing its ecosystem, but it can't be everywhere at once, and it has to constantly balance its actions against the perceptions generated by its approach to building a platform. That means that a lot of these problems need to be self-regulated within the dev community. Sure, we don't have the legal right or the resources to shut down someone running an Alexa scam, but collectively we can use our soapboxes to call out folks who are cheating the review system, or to act as the voice of reason when people are spreading misinformation about the risks to Alexa's users. Some other development ecosystems (I'm looking at you, mobile games) quickly became a cutthroat race to the bottom, so I'm glad to see that Alexa skill development, or at least the circles I run in, has resisted some of those base temptations thus far.