3PO-Labs: Alexa, Echo and Voice Interface

Hidden Language for Voice Assistants

6/6/2016

Here at 3PO Labs, a common topic of conversation is that of "semantic vs. syntactic" language for voice assistants, specifically Alexa. The real crux of the discussion is "How do you get a voice assistant to do what you mean, rather than what you say?" This question was one of the driving forces behind our creation of DiceBot, as we explain in more detail after the break...


tl;dr - One piece of natural language is implicit (rather than explicit) language. As an experiment, we added some special functionality to our new skill that would purposely do something other than what was explicitly called for. It worked, it was a lot of fun, and Alexa handled it surprisingly well.
After spending a lot of time tweaking and refining CompliBot and InsultiBot, only to have them lost in a sea of thematically similar but less polished skills, it was important for us that our next skill was novel in some way - be that via a gimmick or by subverting the normal skill pattern. What we decided to do was to take a skill with a simple function and then extend it to have some very basic understanding of implicit commands alongside the explicit commands that are the norm.

"So you built software that doesn't do what you say?"

So, at a conceptual level, we recognize how weird this idea is. Building software that does something other than exactly what you've commanded it to do is - generally speaking - the province of poor developers.

The grand experiment, though, was really about a wider set of questions that I think are important to almost everyone working (or playing) in the field of voice assistants. How do you make your voice interface model natural language more accurately? That's one half of an even broader concern - "How do we get people to engage more naturally with software interfaces?" - with the other half being focused on training users to better map their interaction patterns to machine-understandable formats.

The question of how to make voice interfaces more like natural language is incredibly broad, and there are a lot of pieces to it, only some of which are in our control. For example, at present when using Alexa we are still forced to contend with the namespace problem where all skills must be launched via an invocation name ("Alexa, ask CompliBot to give me a compliment") rather than as top level utterances ("Alexa, give me a compliment"). Until Amazon gives us the ability to integrate more smoothly into Alexa, we'll always have that barrier to natural interaction to contend with.

There are some things that are in our control, however, like handling idioms, slang, or other phrases that may seem divergent from our primary language model.  For example, we are enamored with the idea of inserting Easter Eggs into our voice models. Unfortunately, each additional utterance you define has the potential to diminish the matching probability of the existing utterances, and so we actually had to pull a ton of things out of CompliBot and InsultiBot to make our voice model understand the user's input at the frequency we needed.  In that instance it killed the dream of a "secret menu" of commands, but we knew it was something we wanted to try again when we had the right opportunity.

Finally, we were also interested in how you approach the problem of subtlety in an interface where you have essentially no context. As humans, we often use visual cues (like body language), as well as other audio cues (like emphasis, word pacing, or prosody) to understand when there may be subtext that surrounds the words we're hearing. For a voice interface like Alexa, all of that great contextual information immediately goes out the window.

All of these things led us to an interesting question: sticking fairly closely to natural-sounding language, how could we inject some context back into fairly simple utterances?

After a bit of thinking, the answer became clear: coded language.

The Duck Flies at Midnight

[Image: Bond interacts with an informant while undercover. Coded language is great, except when you get caught, like James Bond in "From Russia with Love".]
In many classic spy movies, you can expect to see a scene where the protagonist encounters another undercover agent, and in order to verify each other's credentials they partake in a back-and-forth exchange of very specific pleasantries or small talk. The idea is that they can openly undergo the exchange without worrying about who overhears them, while still maintaining their cover if the person they were talking to is not the expected contact.

This idea of codifying normal language so that you can speak openly while still communicating extra information to Alexa is what we latched onto and built into DiceBot.

Encoding into a limited space

The reason the exchange of pleasantries works as an encoding in the spy movies is that it is in line with the natural way people interact with each other in those settings. If we required a user to ask DiceBot "Can I borrow a match?", however, that would set off red flags for anyone listening, as it's clearly not a natural interaction between a human being and a glorified random number generator.

The specific challenge, then, was to build DiceBot's voice model in a way that gave us room to differentiate the inputs while not doing anything that a user wouldn't expect from a dice rolling bot. Given the extreme simplicity of our interface, we didn't have a lot of wiggle room in this regard.

The one thing we did have, though, was variations on our utterances. In order to account for the variability in the way people speak, a best practice is to define all of the common ways a user might speak a sentence and treat them all as aliases for the same input. Our approach was to split these up and apply a different semantic meaning to each of them.

What we ended up with was a situation where "Roll a die", "Roll me a die", and "Roll a die for me" did not actually mean the same thing. The first was a fair die roll, the second was weighted upward, and the third was weighted downward. We applied this same pattern to all the variations of numbers and types of dice that we accepted. Once implemented, to a casual observer watching us use DiceBot, nothing we said would have seemed strange or tipped them off to the coded language we were using.
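The pattern above can be sketched in a few lines of Python. This is our own illustration, not DiceBot's actual source: the alias-to-bias mapping and the "roll twice, keep the better" weighting scheme are hypothetical stand-ins for whatever a real skill's backend might do once Alexa has matched one of the aliased utterances.

```python
import random

# Hypothetical mapping: each utterance alias for "roll a die" carries a
# hidden bias that the speaker knows about but an onlooker wouldn't notice.
UTTERANCE_BIAS = {
    "roll a die": 0,          # fair roll
    "roll me a die": +1,      # secretly weighted upward
    "roll a die for me": -1,  # secretly weighted downward
}

def roll(sides, bias):
    """Roll one die; a nonzero bias rolls twice and keeps the
    higher (bias > 0) or lower (bias < 0) result."""
    first = random.randint(1, sides)
    if bias == 0:
        return first
    second = random.randint(1, sides)
    return max(first, second) if bias > 0 else min(first, second)

def handle(utterance, sides=6):
    """What a skill's intent handler might do after the voice model
    has resolved the utterance to one of the three aliased intents."""
    return roll(sides, UTTERANCE_BIAS[utterance])
```

Rolling twice and keeping the better result is just one way to weight a die; the point is that the bias is selected by which phrasing the user happened to choose, not by any parameter visible to bystanders.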

Spy status achieved. I assume my laser watch is on its way in the mail.


Technical limitations... or not...

One bright spot in this whole experiment was how well Alexa handled the extremely similar inputs. A major concern of ours was that Alexa would have a hard time differentiating between "Roll ten d-twelve" and "Roll me ten d-twelve". To Amazon's credit, though, we never once ran into this problem in our testing, despite the utterances being separated by only a single trivial word.

The one issue we did run into was how to deal with a user who just happens across the special utterances that confer an advantage. Because the utterances sound so natural, and because the set of possible utterances a user might randomly choose is small, there's a fairly high likelihood that an untrained user would accidentally pick a coded pattern. While DiceBot won't reveal its secrets in this situation, the primary user's advantage over this untrained user (in whatever game they are playing) would dissolve. We didn't spend a lot of time worrying about this, however, as an untrained user is just as likely to trigger a pattern that works against them as one that works for them.

The Takeaway

So, has DiceBot solved natural language, and revolutionized the world of voice interface? Not at all. We absolutely recognize the gimmicky nature of DiceBot's coded language, and teaching a voice assistant to capture what is essentially a single point of body language (in this case, I like to imagine it like a wink) is only a tiny step on the road to separating what we say from what we mean.

It was a fun experiment, though, and one that theoretically has practical application (the idea of a tabletop DM doing an "open" roll but weighting the dice comes to mind). The real goal, though, was to get people (ourselves especially) thinking about how we push the boundaries of the current best practices to try and nudge the state of voice interaction forward.

We'd love to hear your thoughts on the topic in the comments below, or directly via email.