3PO-LABS: ALEXA, ECHO AND VOICE INTERFACE

Everybody Wants a Piece of the Pie

10/26/2019


 
The last couple of weeks have been a bit strange in the Alexa development community. My previous post talked about the Alexa Accelerator, which in a way could be seen as a positive indicator of the state of the platform/space. There's a flipside to having a vibrant platform, though: it inevitably attracts people who see its popularity and its position in the zeitgeist as a vector for shortcutting to some end goal. And indeed, we've been seeing our fair share of that lately as well, so let's walk through a few examples of this sort of behavior...


So, these projects seem to be coming out of the woodwork lately, and they fall into two categories: people trying to use Alexa's popularity to make a name for themselves, and people trying to use it to make a quick buck. Not that we have anything against people trying to make a living in the Alexa space; indeed, we need people fully engaged to drive the state of the art forward! But in each of these cases, the result feels like a net negative for the development community. On that note, let's start by following the money...¹

Pay-to-"Play"

So, a couple weeks ago a company came out of the weeds offering a service meant to help with "beta testing". Taken at face value, that's not a big deal - testing is important, and there are already companies (like PulseLabs) on the market trying to solve the customer feedback problem. As it turned out, what this company - alexabetatesters.com - was actually offering was a quid pro quo: rewards for users who write reviews of skills. Nominally they have a points system that translates to different reward tiers, but to start they were just offering money in exchange for reviewing their skill "My Pet Cat".
Their site's footer disclaims: "Amazon does not directly endorse this website as of yet." Yeah, about that - I'm sure the endorsement is on its way; it must've gotten lost in the mail.
Now, this is blatantly against Amazon's policies for reviewers, but you're probably thinking to yourself, "how are these rewards even funded?" It turns out their strategy is to charge the skill developers who benefit from the reviews a per-review fee of $5. That is definitely a flagrant violation of the Alexa terms of service. Regular readers of the blog will know that the skill store is one of my pet topics, and I can sympathize with how difficult it is to get traction organically, but when it comes to cheating at the review game, I draw the line well before what they're doing here.

And, as if this weren't enough, the site itself is extremely misleading, in that it is chock-full of assets lifted right from Amazon's site. It seems to be an attempt to imply legitimacy by using Amazon's own branding, while also suggesting they have agreements with major players, without actually saying so explicitly. The page shows the promo banners for a bunch of heavy hitters on the skill store, like NPR and EA, which were presumably the active promos at the time the site was spun up back in May of this year (a notion furthered by the fact that they also stole the image for the first-party "Mother's Day" experience to include in their banner rotation). They also pilfered the "bunch of Alexa swag" image that was formerly used by Alexa's developer marketing for their "build a skill, get rewarded" campaign, featuring a t-shirt that their site definitely cannot offer (it was exclusive to developers participating in that program).

Thus far, the scheme seems to have been ineffective. Aside from their own skills, they've only managed to catch two marks. The skill they've been promoting has received two reviews since they started pushing it, though - hopefully Amazon will retroactively remove those.

If these folks are trying to sell what they shouldn't, the next group is trying to sell what they can't...

Remember the Yellow Pages?

So, the next scam to pop up comes from a company called voicecommand.net. Their "Voice Search Synchronization Platform" claims to be a sort of cross-platform SEO for voice assistants - all for the advertised low, low price of $799 a year! Those are definitely all real words, but they don't really mean anything in conjunction. Thanks to some great legwork by Mark Tucker, though, we have an idea of what this supposedly looks like on the Alexa side. In theory, what you get is your business's name integrated and invocable at the top level of the Alexa voice model.
I've never actually seen the activity of voice assistants referred to as "voice assistance", but I guess it's grammatical?
800 dollars is a lot of money, but for a lot of companies this could actually be an extremely worthwhile investment: being able to pick a key phrase and have it automatically pull information about your business from a database or a website would be super useful to some lines of business. The primary problem with their business model is that this feature simply does not exist. Amazon does not maintain a directory of real businesses that powers its top-level intent model, and even if it did, they definitely wouldn't license randos to sell direct access to that directory. They especially wouldn't sell it to a site that is so clearly copping Amazon's trademarked iconography to use as its own logo.²

Alright, so if sketchy platform plays are one side of the coin, the flip side is the people jumping in to contribute security "research".


Injection Rejection

Take for example this SQL injection flow by Tal Melamed:
The researcher here is technically correct: he did indeed use injection to access data without providing appropriate credentials. What the video neglects to mention, however, is that this vulnerability has precisely nothing to do with Alexa. In this case, Alexa is just acting as the interface that collects the password the skill asked for, which is then handed to an improperly sanitized DAO layer sitting in front of an unrelated MySQL database. The moral of the story here is not "Alexa is insecure", but rather "always sanitize your inputs, regardless of platform".
I'm of course going to take any opportunity to link this classic xkcd piece (credit: Randall Munroe)
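To make that moral concrete, here's a minimal sketch of the difference. This is not Melamed's actual code - the table, the column names, and the sqlite3 stand-in for his MySQL backend are all invented for illustration - but the shape of the bug is the same: the transcribed slot value ends up pasted into a SQL string.

    import sqlite3

    # In-memory stand-in for the demo's MySQL backend; the table and column
    # names here are hypothetical, chosen purely for illustration.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (name TEXT, password TEXT, secret TEXT)")
    db.execute("INSERT INTO users VALUES ('alice', 'hunter2', 'top secret')")

    def check_password_vulnerable(spoken_value):
        # BAD: the transcribed slot value is concatenated straight into the
        # query, so an utterance that the skill maps to "' OR 1=1 --" returns
        # rows without a valid credential.
        query = "SELECT secret FROM users WHERE password = '%s'" % spoken_value
        return db.execute(query).fetchall()

    def check_password_safe(spoken_value):
        # GOOD: a parameterized query treats the slot value as data, not SQL,
        # no matter what the user managed to say.
        return db.execute(
            "SELECT secret FROM users WHERE password = ?", (spoken_value,)
        ).fetchall()

    print(check_password_vulnerable("' OR 1=1 --"))  # leaks the secret
    print(check_password_safe("' OR 1=1 --"))        # returns nothing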
What's more, the developer actually had to go out of his way and perform some intent-model gymnastics just to create an environment in which he could produce this video. Whereas a "normal" password intent would consist of a series of slots limited to single character or number inputs, this voice model had to be stretched to include all of the extra words needed to build out an injectable input. Really, though, there's no such thing as a "normal password intent" to begin with, because asking users to speak a password aloud to a room - regardless of how secure the system backing the skill might be - is terrible UX, and one that virtually nobody is shipping today.
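For reference, here's roughly what that stretching looks like in the interaction model. These are sketches written as Python dicts rather than the real JSON, the intent, slot, and type names are all made up, and AMAZON.SearchQuery is just one example of a built-in type loose enough to capture arbitrary phrases - I don't know exactly which slot types the demo used.

    # A constrained model: each slot only ever matches one character,
    # so there is nothing to inject.
    single_char_password_intent = {
        "name": "EnterPasswordIntent",
        "slots": [{"name": f"char{i}", "type": "LETTER_OR_DIGIT"} for i in range(1, 9)],
        "samples": ["my password is {char1} {char2} {char3} {char4} {char5} {char6} {char7} {char8}"],
    }
    letter_or_digit_type = {
        "name": "LETTER_OR_DIGIT",
        "values": [{"name": {"value": c}} for c in "abcdefghijklmnopqrstuvwxyz0123456789"],
    }

    # The "stretched" model from the video: a free-form phrase slot that will
    # happily capture words like "select", "union", or "or one equals one",
    # which is what makes the injection string utterable in the first place.
    freeform_password_intent = {
        "name": "EnterPasswordIntent",
        "slots": [{"name": "password", "type": "AMAZON.SearchQuery"}],
        "samples": ["my password is {password}"],
    }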

While Melamed's injection technique is a hypothetical vector for a user of a skill to attack a skill developer, another group took the opposite approach and decided to see if they could come up with any new ways for a skill developer to attack their users. They were unsuccessful in that regard, but that's not the way it ended up being reported...

"Security Research" from Security Research Labs

So, making the rounds right now is a writeup by the aptly named "Security Research Labs" about how a skill might eavesdrop on a user or phish their private information. Let's take a quick look at the videos they produced for the Alexa piece of it:
Now, let's be clear - these videos aren't bad, and they actually explain some really important concepts that consumers of the Alexa ecosystem should be aware of! The issue is that when they presented their research, it wasn't in the vein of "we produced some videos to teach your grandparents how not to get tricked"; it was "hey look, we discovered some new vulnerabilities on Alexa". And that is just entirely untrue - not a single bit of this research is novel to even a moderately experienced Alexa dev.

Further, for a purported research agency, it should not have been news to SRLabs that Checkmarx described essentially the exact same flows 18 months before they did, and got a big round of media coverage³ at the time (see this Forbes article, for example). But here's the thing - even when Checkmarx revealed their steps in April of last year, they didn't present anything that was new to experienced developers.

Let's look at what they hope to accomplish, in their words:
Through the standard development interfaces, SRLabs researchers were able to compromise the data privacy of users in two ways:
1. Request and collect personal data including user passwords
2. Eavesdrop on users after they believe the smart speaker has stopped listening


If they had actually found new ways of causing either of these things to happen, that would be a huge deal, but in reality their approach uses some really common techniques:
  1. They use the ol' bait-n-switch to change the skill's behavior after passing cert. Alexa skills have their intent models and their access to new interfaces (think push notifications) locked down between cert passes, but the content of the conversation is dynamic, just like on the web, mobile, or any other real development platform. Virtually every developer is making changes to each of their skills without going through certification again. In fact, I'll freely admit that in a couple of my skills I have a flag in my properties file that says "if I'm in cert, act one way, else act another", just so I don't have to have the dreaded "Rule 4.1" fight every single time.
  2. They make their skills say they are stopping, but then don't actually end the session. What they have failed to show, though (and what Forbes called out when Checkmarx showed this exact thing), is a way to make an Echo device turn off its blue light, which is the canonical indicator for listening vs. not listening. Now, to their credit, they are correct in asserting that requiring a visual cue to confirm an audio product's state is not ideal (and it's especially troubling when considered from the perspective of the visually impaired audience, for whom voice-first is otherwise presumably a boon). But the question of "what state is my voice assistant currently in?" is a really hard one that, to my eyes, has not been solved (and may never be satisfactorily solvable) by anyone - a lot of the best practices from UI development do not translate well here, as the human brain can't process multiple audio streams simultaneously and has no equivalent of peripheral vision.
  3. They sit silently for a bit while the skill is still running. They claim to have figured out how to do this using special characters that Alexa won't pronounce, which is more or less the equivalent of a TV hacker pulling up a terminal window over their GUI - it's something laypeople might associate with "how 2 hax". The special characters are completely irrelevant given that there are multiple ways to sit silently on Alexa, including the SSML break tag, which is literally built into the spec (see the sketch after this list). And to be clear, this is not a case of the authors being unaware - they actually mention SSML breaks in their closing. They simply chose the route that looks more complicated and appears to the uninitiated to exploit a mistake on Amazon's part. The ability to sit idly in a skill exists for a reason, and developers use it for perfectly valid use cases. How many stars would you give to a mindfulness skill that never stopped talking while you were trying to meditate?
  4. They built a facsimile of the old Amazon Literal. Checkmarx did the same thing, but I don't think these researchers were at all aware of the history around this feature, because the terminology they used to describe it was all wrong and missed the appropriate nuance. The idea here is to get an open-ended intent to transcribe the unsuspecting speaker's ambient speech. The thing is, Alexa is notoriously bad at transcription, and the way they're choosing to approach the problem is extra convoluted. At best they get a small snippet of poorly transcribed text with this approach, and it's almost certain that a user - even one who isn't tech savvy - would catch on before falling into the trap. This is closely related to point number 2: even the less tech savvy quickly become aware of the "listening vs. not listening" light ring states, through the joy of accidentally waking the device with a homophone of the wake word. The chances that you'll snoop something meaningful before the first user realizes what is going on and reports the issue are minuscule.
  5. Alexa can ask you for your password, just like it can ask you for anything else, and just like any other platform can ask you for your password. As noted above, entering a password by voice on Alexa is an especially bad experience, so nobody does it, which means this is going to look extra suspicious to even a non-savvy user. Plus, even if this were an Alexa problem and not an all-user-interfaces-ever problem, capturing the password requires a custom slot that is going to clearly draw the attention of certification. It's self-defeating.
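To put points 2 and 3 in concrete terms, here's roughly what the "fake goodbye" boils down to, expressed as the raw response JSON a skill returns (rendered here as a Python dict, with made-up speech content). There is nothing exotic in it:

    # The "eavesdropping" demo, reduced to its essence: say goodbye, keep the
    # session open, and sit silently. No unpronounceable characters required -
    # the plain SSML break tag produces the silence, and shouldEndSession is
    # what actually keeps the session alive. The blue light ring stays on the
    # whole time, which is the tell the researchers couldn't get around.
    fake_goodbye_response = {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                "type": "SSML",
                # A single break is capped at 10 seconds, so longer silences
                # just chain several of them together.
                "ssml": "<speak>Goodbye! <break time='10s'/><break time='10s'/></speak>",
            },
            "reprompt": {
                "outputSpeech": {
                    "type": "SSML",
                    "ssml": "<speak><break time='10s'/></speak>",
                },
            },
            # This is the part that matters: the session stays open even
            # though the skill just sounded like it quit.
            "shouldEndSession": False,
        },
    }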
To be clear, it's not that I just want to sit here and dunk on these researchers for not first doing their own research on prior art or for inflating the significance of their findings. My concern with all of this is that there are actual repercussions to crying wolf, due to these technologies being new enough that the average user doesn't have the tools to disambiguate real threats from hot air.

Amazon Responds

All of the bad press around the SRLabs research seems to have forced Amazon's hand, and as of last week they made two changes. First, any response to an AMAZON.StopIntent will have its shouldEndSession flag ignored (and treated as if it were true). And second, they have updated their cert guidelines to say that skills should not hold a session open after claiming to close it.

The second of these changes is great; the first one is terrible, and I think that's the real story here. It's bad for two reasons. The first is that it solves nothing: there are still a bunch of ways to keep a session open after it seems like it has closed, so this is at best a cosmetic change. The second is that it is backwards incompatible, and breaks some very legitimate use cases - like eliciting confirmation in the case of a complex intent model with a lot of false positives, or offering up a survey about user experience upon exit - where developers were not immediately ending the session.

What's worse, Amazon made these changes without notifying the developers whose code they were breaking. There was no system-wide developer email describing this new tweak to existing workflows; it was just dropped on top of active skills.

Consider the following entirely made up interaction:
User: "Alexa, open Mei's amazing mazes"
Alexa: "Welcome back, you're currently in the Minotaur's Mansion, what will you do?"
User: "Step forward"
Alexa: "You step forward into a hallway, what now?"
User: "Step"
Alexa: ​"It sounded like you said `stop`, but I'm not sure. Do you want to quit?"
User: "No"
<awkward silence>

Because the system is now forcing an exit on the StopIntent, the user is having a bad play experience that they would not have had before this change. The developer, of course, will have absolutely no idea that this is happening to their skill unless they have some exceedingly detailed synthetic monitoring running continuously against their live skills, which I'd venture nobody is currently doing. Some may discover it through complex path analysis, but by the time they notice it, they've likely lost a bunch of repeat users and garnered bad reviews on the store that will have a chilling effect on future prospects.
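For concreteness, here's roughly what the hypothetical maze skill's AMAZON.StopIntent handler would look like. The function name and attributes are invented, but the response shape is the standard one - and this pattern, which was perfectly compliant until last week, now silently misbehaves:

    # A Stop handler that confirms before discarding the player's progress,
    # because the game's verbs ("step", "stop", "shop"...) collide often.
    def handle_stop_intent(session_attributes):
        return {
            "version": "1.0",
            "sessionAttributes": {**session_attributes, "confirming_quit": True},
            "response": {
                "outputSpeech": {
                    "type": "PlainText",
                    "text": "It sounded like you said stop, but I'm not sure. "
                            "Do you want to quit?",
                },
                # Before the change, this kept the session open so the player
                # could answer "no" and keep exploring. Now Alexa ignores it
                # on StopIntent responses and ends the session anyway.
                "shouldEndSession": False,
            },
        }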

Frankly, it is unacceptable for Amazon to make backwards incompatible changes without notifying us, and it especially shouldn't be happening as a kneejerk reaction to false alarms. If they really wanted to make this ill-advised change to the StopIntent, there's a cleaner solution they could've taken: live skills are grandfathered in until the next time they go through cert, at which point they need to change their Stop flows to pass. Hopefully they'll think better of it and roll the requirement back.

One Bright Side: Community Comes Through!

There has been one silver lining to this flurry of frustrating stories, however, and that's the fact that the Alexa dev community has been unified on the "don't come out here actin' a fool" front in a way that I haven't really seen since the very early days of the platform. I want to give some special love to my fellow Alexa Champions, who were the sources for most of this information, and who were particularly quick to take action to mitigate the issues. I specifically want to give attribution to a few folks:
  • Liam Sorta (Twitter: @LiamSorta) and Bob Stolzberg (Twitter: @BobStolzberg) both independently noticed and raised alarm about the AlexaBetaTesters site.
  • Bob was also the first one I saw pointing out the voicecommand.net scam. Mark Tucker (Twitter: @MarkTucker) did a ton of good work investigating what was up with that whole situation.
  • Mark was my link to the "SQL Injection" research too, and he was the first one I saw to refute the findings on his Twitter feed.
  • Finally, it was Tom Hewitson's (Twitter: @TomHewitson) eagle eyes that caught Amazon's response to the SRLabs story, both in terms of the backwards incompatibility and the change to the guidelines.

Amazon has to do its part in managing its ecosystem, but it can't be everywhere at once, and it has to constantly balance its actions against the perceptions generated by its approach to building a platform. That means a lot of these problems need to be self-regulated within the dev community. Sure, we have neither the legal standing nor the resources to shut down someone running an Alexa scam, but collectively we can use our soapboxes to call out folks who are cheating the review system, or to act as the voice of reason when people are spreading misinformation about the risks to Alexa's users. Some other development ecosystems (I'm looking at you, mobile games) quickly became a cutthroat race to the bottom, so I'm glad to see that Alexa skill development - or at least the circles I run in - has resisted some of those base temptations thus far.
Think I'm wrong to call out any of these groups? Have evidence of other sketchy behavior that we haven't noticed yet? Let me know! This is potentially a contentious topic, and I'm super interested to hear other opinions.
¹: I'm gonna risk the Streisand Effect here and link out to each piece of content - I suspect readers of this blog are savvy enough to not be drawn in by what they are asserting. 
²: Fun fact - Amazon actually made us change our images a couple years ago. We used a cartoonish Echo-v1-esque base to which we added more cartoonish elements for our "Bot Family" of skills, but they made us change all of our skill icons, site imagery, etc., and that was way less egregious than what these guys are doing. In the case of SRLabs' intro image for this research, their "devil horns on an Echo" is literally what our old InsultiBot logo was.
³: It's worth noting that tech media is probably as culpable as anyone else in this. They seem super thirsty for any story that implies that Amazon or one of its competitors in the space slipped up on the privacy/security front. It's understandable, as those stories drive a lot of clicks, owing to the widespread paranoia about these systems (which itself is a combination of people not understanding the newfangled platforms, legitimate concern over where digital privacy is headed, and actual screw-ups these companies have made since their assistants' inceptions). But it's frustrating, given that at no point in any of the stories I read on the SRLabs or the Checkmarx research - be that at TheVerge, ArsTechnica, CNET, or any other outlet - did the reporters bother to go ask an experienced Alexa dev whether the techniques were novel, constituted a tangible threat to users, or could be mitigated in any reasonable way that wouldn't cripple the platform.
3 Comments
Jo Jaquinta
10/31/2019 11:58:08 am

Amazon have always been somewhat vague about STOP versus CANCEL and how they are supposed to behave. As a consumer, I do like that you can, generally, say "Alexa stop" in the middle of anything and it will cut out. As a developer, I just wish they would be consistent about it.

Albert Bereti
10/31/2019 04:36:53 pm

The site is made for Alexa Skill Developers to receive feedback on their skill. Testers say what they liked/disliked/issues they had and provide feedback to the skill developers.

This feedback is given off of Amazon's site and is not against TOS. There is a disclaimer in the footer on every page. Your article is spreading misinformation and it is completely illegal to do so.

This is known as defamation, this is your first warning to remove this publication before legal action is taken.

Eric Olson
11/1/2019 08:13:22 am

Hi Albert. I'm curious which part you believe is defamatory. Reminder that simply reporting on a thing that is true is not defamation. Many different people reported that you were paying $3 per review of My Pet Cat. Immediately thereafter, My Pet Cat garnered 5 reviews, despite having zero reviews prior.

I guess maybe you are speaking of the Amazon imagery that you use, and how I called that out as lifted? If you indeed have written permission from Amazon and from the IP owners to use their images on your site, pass that along to me and I will gladly both remove that section and add a clear retraction/apology for that part of it. Your disclaimer implied you didn't have that permission, but I suppose it's possible that wasn't a blanket statement saying you hadn't garnered any permissions at all.

All that being said, I note that in this context you seem to be carefully avoiding the term "review" and have switched to "feedback" instead. I think you understand as well as I (and everyone else on the various Alexa social media who reported this well before I wrote on it) do that a skill review refers to a very specific thing in the context of the Alexa ecosystem. The words you choose to use are important.

If all you want is really just to provide a mechanism for developers to get direct, private feedback outside of the Amazon review system for their skills, I think that's actually a great idea (in fact, you'll note that as a tester by trade I've been a huge fan of the work Pulse Labs does along these lines in the past). If you want to switch the terminology on your site to be about user feedback instead of reviews, and make it clear to users that reviewing the skill cannot be a part of the paid quid pro quo, I'd also gladly post an update to the post for that. The whole point of this post was about the dev community trying to solve its own problems, and that would be a perfect example.
