3PO-LABS: ALEXA, ECHO AND VOICE INTERFACE

On XBox integration, invocations, and skill clobbering

10/13/2018

So, last week Amazon and Microsoft announced a major new feature - the ability to control your XBox from Alexa. While the two companies had certainly been moving closer of late (see: Cortana x Alexa cross-functionality), the announcement was a big surprise, and a welcome one at that. Unfortunately, along with that feature came a host of new issues for a lot of folks using the platform. Understanding the regressions and their implications touches quite a few interesting areas, and I do my best here to distill each of them.


So, let's get a quick disclaimer out there: This isn't gonna be a short article, and it's not something that will easily fit into a TL;DR. A Table of Contents is the best I can offer, as there's a lot to cover in terms of what the issue is, what the history of the problem is, the tradeoffs, and the latent resentment this update is stirring up in the dev community.
  • The Feature
  • The Problem
  • Technical Details
  • Top-Level Invocations vs Skill Invocations
  • Mitigation Missed
  • Testing

The Feature

Getting everyone up to speed on the change, here's what happened. As part of Microsoft's October XBox rev, they introduced a new Alexa skill. You can check the skill out here, but the short version is that it lets you link your accounts and then control a bunch of features of the XBox dashboard via an Alexa-enabled device. It's a pretty solid offering, and it goes a long way towards restoring the features lost when Microsoft abandoned its Kinect gambit. (Fun aside - a year before I dove feet first into Alexa, I actually reached out to a bunch of Microsoft people to talk about the VUI stuff I wanted to do with Kinect if they'd just let me build for the platform. None of them ever responded, then the Echo happened, and the rest is history.)

The feature itself actually has two different interfaces - there's the standard ability to launch with an invocation name ("Alexa, tell Xbox to record that"), but it also has a smart home interface. This in and of itself is interesting, as developers have been looking for a while for a way to tie our brands' skills together via a single skill entry when there are multiple types of interfaces (think: a news agency's custom skill that also has a flash briefing). It seems Microsoft got special treatment or access to an unannounced new feature in that regard.

If you check out the sample utterances they provide, they look like this:
[Image: XBox Skill(s) Sample Utterances]

That's a pretty nifty feature set - the record and pause features come to mind as being particularly handy. But it's one line there that is the root of this whole problem, and it's the last one: "Alexa, launch Forza Horizon 4".

The Problem

So, I'm several hundred words into this post, and haven't even said yet what the trouble is. Let's look at a screenshot to understand the issue:
[Screenshot: Alexa hears "open one bus away" and responds by launching Battlefield 1]
"open one bus away" is a standard launch invocation for the Alexa skill One Bus Away. One Bus Away was one of the early adopters of the platform, and it's worked very consistently via this pattern for the last two years. As you can see, however, despite the ASR step matching what I said perfectly, the response from Alexa is to trigger Battlefield 1. At first I thought this was just a specific case of One Bus Away being clobbered, but as I started asking around, it turned out other folks were having the exact same problem. Just within the ranks of the Alexa Champions, at least three of my fellows' skills - Steve Arkonovich's Big Sky, Nick Schwab's Fireplace Sounds, and Matt Kruse's intriguing-but-not-yet-released "Speed Tap" - are having almost identical issues.

Technical Details

More specifically, it seems that this problem occurs when a couple things are true:
  1. The device triggering the interaction is a display device (in my case, a first gen Echo Show).
  2. There is some sort of media service (often FireTV) tied to the same account as the request.

With a bit of digging, it turns out that this is not technically a new problem. Rather, it's something that was around previously and is now being exacerbated by the XBox addition. And that's where we come back to that final line in the XBox utterances, the launch phrase. By adding the ability to say "Alexa, launch <name of XBox game>", rather than "Alexa, ask XBox to launch <name of XBox game>", Amazon has created an ambiguous situation.

Early on, "launch" and "open" were exclusively reserved for launching skills, but at some point that stopped being true. Previously, the most common case where this would occur was something like "Alexa, play <name of an Alexa game>", where the "play" trigger was ambiguous between requests for music (or videos on the Show) and skills. Generally speaking, media content always wins in that case, and skill developers learned to coach our users to not use the word "play". Ever. Now, the Alexa media catalog has been greatly expanded with all of Microsoft's offerings, and the words "Launch" and "Open" have been commandeered. That's not good for developers/users of skills.
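To make the collision concrete, here is a toy sketch of the behavior described above. Alexa's real routing logic isn't public, so everything here is an assumption for illustration: the two tiny catalogs, the word-overlap matching, and the "media wins if anything in the media catalog matches at all" rule are all hypothetical stand-ins.

```python
import re

# Hypothetical catalogs, made up from names mentioned in this post.
SKILLS = ["One Bus Away", "Big Sky", "Fireplace Sounds", "Speed Tap"]
MEDIA = ["Battlefield 1", "Need for Speed", "Forza Horizon 4"]

LAUNCH_VERBS = ("open", "launch")
NUMBER_WORDS = {"1": "one", "4": "four"}


def tokens(text):
    """Lowercase word set, with digits spelled out so "1" matches "one"."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {NUMBER_WORDS.get(w, w) for w in words}


def candidates(utterance):
    """Every catalog entry sharing at least one word with the launch target."""
    verb, _, target = utterance.lower().partition(" ")
    if verb not in LAUNCH_VERBS:
        return []
    target_tokens = tokens(target)
    found = []
    for kind, catalog in (("skill", SKILLS), ("media", MEDIA)):
        for name in catalog:
            overlap = len(target_tokens & tokens(name))
            if overlap:
                found.append((kind, name, overlap))
    return found


def resolve_media_first(utterance):
    """Assumed policy: any media match beats every skill match."""
    found = candidates(utterance)
    media = [c for c in found if c[0] == "media"]
    pool = media or found
    return max(pool, key=lambda c: c[2], default=None)


print(candidates("open one bus away"))
print(resolve_media_first("open one bus away"))
print(resolve_media_first("open speed tap"))
```

Under these assumptions the skill matches all three words of "one bus away" and Battlefield 1 matches a single word, yet the media-first rule still hands the request to the game. The same thing happens with "open speed tap" and Need for Speed.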

Top-Level Invocations vs Skill Invocations

Now, to be clear, nobody believes that it was Amazon's intent to let "Battlefield 1" take over the "Open One Bus Away" invocation just because it has the word "One" in it (nor to have "Need for Speed" replace "Speed Tap", etc). These invocations are obviously being way more aggressive than they should be, and Amazon will almost certainly correct this soon, but this case brings up a perfect opportunity to talk about an exciting new future problem - overloading Alexa's top level.

Among the development community, we have a term called "Top Level Invocations" (or sometimes "name-free invocations" or "super editorials"), which refers to the utterances that let you invoke a skill without saying its name. Rather than "Alexa, ask CompliBot to compliment me", for example, you can just say "Alexa, compliment me", and sometimes have CompliBot serve up a session.

And really, these TLIs represent the ideal of Alexa: The idea that you can just ask for something, without namespacing it, and get what you want. To wit, at no point in Star Trek did you ever hear Picard say "Computer, tell Unofficial Starfleet Skill to engage self-destruct".

Plus, virtually every developer has had users tell us, straight-up, that they don't use our skills because it's too hard to remember names. But solving that is a non-trivial problem. It's not like we can just take everyone's interaction model and cram them together at the top-level - there are some intents implemented by every single skill, so it would become a disambiguation nightmare.

Amazon is making progress, however. Expanding out their normal first-party features with TLIs managed by their marketing team (like the Compliment one above) was one of the first steps. Flash Briefings were another way to serve a specific piece of this puzzle - making it so users didn't need to query multiple news or podcast skills sequentially anymore. More recently, they've opened up the CanFulfillIntentRequest as a way to add more things to the top level.

But every time they add these features, they are creating additional work for developers, whether that's fighting for marketing slots, building Flash Briefing versions of their skills, or implementing CanFulfill. Maybe more importantly, every time they add a new request or type of request to the top level, they're increasing the complexity of the overall system and the odds of unexpected collisions. It's definitely a rock-and-a-hard-place type situation.
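For anyone who hasn't implemented CanFulfill, here is a rough sketch of what that extra work looks like. It assumes the request and response shapes as I understand them from Amazon's documentation (a request of type CanFulfillIntentRequest, answered with a canFulfillIntent object using YES / NO / MAYBE values), so check the current reference before relying on it. The GetArrivalsIntent intent, the stopName slot, and the handle_can_fulfill function are hypothetical names for an imaginary transit skill.

```python
import json

# Hypothetical: the intents this imaginary skill supports, and their slots.
SUPPORTED = {"GetArrivalsIntent": {"stopName"}}


def handle_can_fulfill(event):
    """Answer a name-free query: could this skill handle the given intent?"""
    request = event["request"]
    if request["type"] != "CanFulfillIntentRequest":
        raise ValueError("not a CanFulfillIntentRequest")

    intent = request["intent"]
    known_slots = SUPPORTED.get(intent["name"])  # None if the intent is unknown

    # Report, slot by slot, whether we understand it and can act on it.
    slots = {}
    for slot_name in intent.get("slots", {}):
        ok = "YES" if known_slots is not None and slot_name in known_slots else "NO"
        slots[slot_name] = {"canUnderstand": ok, "canFulfill": ok}

    if known_slots is None:
        verdict = "NO"
    elif all(s["canFulfill"] == "YES" for s in slots.values()):
        verdict = "YES"
    else:
        verdict = "MAYBE"

    return {
        "version": "1.0",
        "response": {"canFulfillIntent": {"canFulfill": verdict, "slots": slots}},
    }


sample_event = {
    "request": {
        "type": "CanFulfillIntentRequest",
        "intent": {
            "name": "GetArrivalsIntent",
            "slots": {"stopName": {"name": "stopName", "value": "third and pine"}},
        },
    }
}
print(json.dumps(handle_can_fulfill(sample_event), indent=2))
```

Even in this stripped-down form, it's one more request type every skill has to answer correctly, and one more place where two skills can both say "YES" to the same top-level utterance.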

Additionally, each of these solutions creates several of its own independent problems, often involving a chicken-and-egg where a new skill builder can never garner the attention necessary to take advantage of these features by virtue of the fact that they don't have the attention to begin with. But that's a topic for a whole series of other posts...


Mitigation Missed

Now, it's important to point out that the way this failure occurred has several levels. The super obvious one is the explicit failure. Alexa's ASR understood me perfectly when I said "Open one bus away", and so choosing something that was an extremely marginal fit instead of an exact match was obviously the wrong choice. That'll get fixed, probably pretty quickly.

But luckily the science is such that we have other ways of intuiting what the user might've wanted. A big push among the Alexa team lately has been to better understand and utilize context. "What was the user doing when they made their utterance?", or "What else do we know about the user that might help us make a better choice?". This is likely where the "is it a display device" and "do they have a FireTV" aspects come into play - the failure seems to be that these signals were being misinterpreted and then weighed far too heavily. Unfortunately, they also either completely missed or didn't properly weigh what I believe to be the most important signal of all: the historical success of the mapping to the existing skill.

As I mentioned, One Bus Away is one of the more tenured skills on the store. I've been using it five days a week for two years. All of that historical data should add up to tell Amazon that the mapping of the invocation "open one bus away" to the concept of launching the skill named "One Bus Away" was working really well. If it had been acting incorrectly, it is reasonable to assume that I would've stopped using that utterance, rather than continuing to invoke it with a regular cadence. Now, this isn't always going to be the case. Matt Kruse's brand new game will have no history of usage, but that doesn't mean "Need for Speed" is a better match for "open speed tap" than his game literally named "Speed Tap". But in the case of skills like One Bus Away or Big Sky that lend themselves to consistent usage, it's inexcusable to have any new Alexa feature interfere and siphon away committed users who are calling a skill by its name.
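The argument about signals can be sketched as a small scoring function. This is purely illustrative and not how Amazon actually ranks anything: the Candidate fields, the weights, and every number below are assumptions picked to mirror the scenario in this post (a display device with a linked media service, and roughly two years of weekday usage).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    kind: str            # "skill" or "media"
    text_match: float    # 0..1, how well the name matches the ASR text
    context_fit: float   # 0..1, e.g. display device plus linked media service
    past_successes: int  # times this user said the phrase and kept using it


def score(c, w_text=0.5, w_context=0.2, w_history=0.3):
    """Weighted blend of the three signals; history saturates toward 1."""
    history = c.past_successes / (c.past_successes + 5)
    return w_text * c.text_match + w_context * c.context_fit + w_history * history


def pick(cands, **weights):
    """Return the highest-scoring candidate under the given weights."""
    return max(cands, key=lambda c: score(c, **weights))


# Made-up values: ~5 days a week for two years is about 520 uses.
one_bus_away = Candidate("One Bus Away", "skill", 1.0, 0.2, 520)
battlefield = Candidate("Battlefield 1", "media", 0.33, 1.0, 0)
speed_tap = Candidate("Speed Tap", "skill", 1.0, 0.2, 0)
need_for_speed = Candidate("Need for Speed", "media", 0.33, 1.0, 0)

print(pick([one_bus_away, battlefield]).name)
print(pick([speed_tap, need_for_speed]).name)
# Over-weighting context and ignoring history reproduces the regression.
print(pick([one_bus_away, battlefield], w_text=0.1, w_context=0.9, w_history=0.0).name)
```

With balanced weights the exact name match wins for both skills, and the brand new Speed Tap wins with zero usage history. Only when context is weighted far above everything else does the game take over, which is the kind of failure I suspect happened here.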

Finally, I hold what is a slightly more controversial opinion on the matter, which is that Amazon is creeping closer but has not yet reached the point of being fully committed to Alexa Skills as the primary feature of their platform - they're still hedging. The crux of the argument is something like this: "If you are running a platform, and you can choose to promote quality content that is native to your platform, or quality content that is imported from outside of your platform, which do you choose?"

At early stages, the answer is almost always the latter - you want people to care about your platform so you entice them with things that they are familiar with. Many people associate Alexa's first real "win" with the 2016 Super Bowl marketing push where they dropped the Domino's and Uber skills, both of which existed on numerous platforms outside of Alexa, and neither of which did a particularly good job showing what was so great about Alexa. At some point, though, you have to pivot and start focusing on what you can provide that others cannot (see: Netflix). Otherwise the most you can hope to achieve is parity, not primacy.

In the case of Alexa, that is skills. Until the day that voice-first companies and Alexa-native skills are the headline, and all of the other features are the subtext, we (skill devs) haven't "made it" yet. It's up to Amazon to determine when they're ready to shift the conversation in that way, because they're the only ones who can do it.
[Image: Angry Birds: Synonymous with iPhone gaming]
To think of it another way, and use an analogy that Amazonians love (the Angry Birds paradigm shift on mobile) - can you imagine the scenario where, if you pulled out your iPhone today and said "open Angry Birds", instead of opening the hit mobile app, it chose to play the subpar animated movie with the same name? It just wouldn't happen.

There's one last facet of this whole thing, and it's my standard rant: testing...

Testing (yes, again...)

You didn't think you were gonna get out of a massive blog post without a tirade from me about testing, did you? I'm not even gonna drop a lot of words here, just gonna drop these previous posts: 
  1. A Treatise on Testability
  2. Our poor abandoned testing tool
  3. A Treatise on Testability, Redux

And a few bullet points:
  • This sort of thing has happened before (Jan. 2018 comes to mind), and it'll happen again.
  • Testing through the CLI would not have caught any of this.
  • The fact that we're getting intermittent reports of issues from devs implies most people don't know their skills are broken.

IT'S TIME, Amazon. Full testing. Audio clips, through the front door, including context (like device type).
So, that was a lot of text. And not a lot of not-text. It turns out rants about utterances on a voice user interface don't lend themselves very well to images or other media. If you've made it this far, I'd love to hear your take on any or all of the above. Drop a comment, or hit up my Twitter or email.
    Author

    We're 3PO-Labs.  We build things for fun and profit.  Right now we're super bullish on the rise of voice interfaces, and we hope to get you onboard.


