Something like a problem description
Our tool aims to solve the problem of automated testing of the Alexa voice/intent model. There are plenty of reasons why a person might want to do this sort of testing, but that's an altogether different post for the future. For now, just go with the assumption that, as a skill developer, you do want to test the intent model.
Going a little deeper into what I mean by "test the intent model", I like to think of the Alexa end-to-end scenario as three discretely testable stages (with a sketch of the hand-off between the first two just after this list):
- Given some audio input from a user, what kind of Alexa skill request does Alexa generate?
- Given an Alexa skill request, how does a given Lambda/service act & respond?
- Given a response from a Lambda/service, how is that output rendered by Alexa?
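To make the hand-off between the first two stages concrete, here's roughly the shape of the skill request that stage #1 produces and stage #2 consumes. This is a trimmed illustration, not an exhaustive payload: the ids and intent name are placeholders, and real requests carry extra fields (context, locale, timestamps, and so on).

```java
public class SkillRequestShape {
    public static void main(String[] args) {
        // Trimmed illustration of the skill request JSON that stage #1 produces
        // and stage #2 consumes. Ids and the intent name are placeholders.
        String skillRequest = """
            {
              "version": "1.0",
              "session": {
                "new": true,
                "application": { "applicationId": "amzn1.ask.skill.PLACEHOLDER" },
                "user": { "userId": "amzn1.ask.account.PLACEHOLDER" }
              },
              "request": {
                "type": "IntentRequest",
                "requestId": "amzn1.echo-api.request.PLACEHOLDER",
                "intent": { "name": "SomeIntent", "slots": {} }
              }
            }
            """;
        System.out.println(skillRequest);
    }
}
```

Stage #2 testing starts from a payload like this; stage #1 testing asks whether a given piece of audio produces it in the first place.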
Now, stage #2 has been addressed by a ton of different tools that are much better than anything I'd have the patience to put together. The official service simulator and Bespoken's tools are both really good for helping you exercise stage #2.
Stage #3 is actually what we were addressing way back in the day when we built ASKResponder. Since then Amazon has released a couple of tools (like the voice simulator) that make ASKResponder a bit less useful than it was on day one, but it's still a handy little utility.
That leaves the first stage, which has been my white whale for some time. The tool I'm showing you today (in limited, proof-of-concept form) finally lets you address this aspect of the Alexa lifecycle.
A bit of level-setting
Here's the rough plan for rolling this thing out:
- Open up the PoC for people to play around with. (Check. Whoo!)
- Figure out how to meaningfully document what this process is all about, why it's worth doing, and how it all works.
- Meanwhile, address some of the major technical shortcomings described below, hopefully with some suggestions from the dev community.
- Open up the generic version of this UI so people can try it on their own skills.
- Open up the REST API (and maybe provide a Java client) to let people actually do test automation on their own skills.
So, considered linearly (and knowing that it's taken me a year to get to this point), that seems like a long timeline. Luckily, most of this work has happened in parallel (I already have a 50-case test suite for CompliBot via the REST API, for example), so the generic UI and REST API are mostly done; they're just waiting on solutions to the shortcomings described in the third bullet point.
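To give a flavor of what test automation against that REST API could look like once it's opened up, here's a rough sketch of a single one-shot test case. Everything specific in it (the endpoint path, the request body, the response shape) is invented for illustration; the real API isn't public yet, so treat this as the idea rather than the interface.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// One hypothetical test case: send an utterance, inspect the skill request Alexa generated.
// Endpoint, payload, and response shape are all placeholders -- not the real (unreleased) API.
public class OneShotIntentTest {
    public static void main(String[] args) throws Exception {
        String endpoint = "http://localhost:12443/utterance-test"; // placeholder endpoint
        String payload = "{\"utterance\": \"give me a compliment\"}"; // placeholder request body

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(endpoint))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        // The interesting output is the skill request that Alexa rendered from the utterance.
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        // The assertion is about Alexa's interpretation, not your skill's code:
        // did this utterance resolve to the request type (and intent) we expected?
        if (!response.body().contains("\"IntentRequest\"")) {
            throw new AssertionError("Utterance did not resolve to an IntentRequest: " + response.body());
        }
        System.out.println("Utterance resolved to the expected request type.");
    }
}
```

The 50-case CompliBot suite mentioned above is essentially a pile of cases shaped like this: one utterance in, one expected intent (and its slots) out.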
So...PoC?
There are some serious shortcomings, which I'll describe in a moment, but if you're antsy to start pushing buttons, here's the link:
http://utility.3po-labs.com:12443/NeverwinterCityGuide.html
Caveats on caveats on caveats
- Right now, this only works for brand new sessions. We're not messing with session management. We'll get to that eventually, but one-shot invocations are the ones that cause the most trouble today, so that's what we're addressing first.
- There is currently a strongly recommended 30-second wait between requests. This has a lot to do with the aforementioned sessions. Basically, Amazon provides no way to kill an AVS session programmatically, so if you don't wait long enough, successive requests will be treated as part of the same session (and, as you might expect, you'll get wonky results).
- Multi-threading is a huge headache. For any given Alexa skill request, there's no correlation id that lets you tie it back to the AVS request that generated it. As such, we're basically doing our best to guess which skill request matches which AVS request using a composite key of application id and skill user id (the fields you see at the top of the page); there's a rough sketch of this bookkeeping just after this list. One additional thing to note for the PoC specifically: if more than one of you is using the page at the same time, you'll clobber each other's requests. We think the odds of that happening are pretty low, though.
- Sometimes timeouts happen. There are a lot of moving parts here, and the 6-second window Alexa skills have to return is occasionally not long enough.
- As a result of the four caveats above, there are sometimes race conditions that make the page start doing funny things, like returning previous requests. IF THIS HAPPENS TO YOU, PLEASE CONTACT ERIC.
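For anyone curious what that composite-key matching implies, here's a rough sketch of the bookkeeping involved. The class and method names are made up for illustration; this isn't the actual service code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A rough sketch of the best-effort matching described above: with no correlation id,
// the only usable key for tying a captured skill request back to the AVS request that
// produced it is (application id, skill user id). Names here are illustrative only.
public class RequestCorrelator {

    public static class PendingRequest { /* details elided */ }

    // At most one in-flight AVS request per (applicationId, userId) pair.
    private final Map<String, PendingRequest> pending = new ConcurrentHashMap<>();

    private static String key(String applicationId, String userId) {
        return applicationId + "|" + userId;
    }

    public void onAvsRequestSent(String applicationId, String userId, PendingRequest req) {
        // If two people hit the PoC with the same application id and user id at once,
        // the second put() clobbers the first -- which is exactly the caveat above.
        pending.put(key(applicationId, userId), req);
    }

    public PendingRequest onSkillRequestCaptured(String applicationId, String userId) {
        // Best-effort guess: whatever is pending under this composite key is assumed
        // to be the AVS request that generated the captured skill request.
        return pending.remove(key(applicationId, userId));
    }
}
```

This is also why the application id and skill user id fields sit at the top of the PoC page: they're the only handles available for the match.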
Problem solving
As mentioned above, there are a few problems where we'd love suggestions from the dev community:
- If you can figure out a good way to programmatically kill an AVS session, so that subsequent requests start from a clean slate, that would be great.
- If you can figure out any way to pass some sort of beacon from AVS to ASK, such that we can deterministically tie the two requests together, we'd be forever in your debt.
- The other thing we'd like is additional data for the PoC. The male and female voices both come from people in Seattle, which isn't exactly known for a distinctive English accent. If you have an interesting regional English accent (Southern US, for example), or if English is not your primary language, we'd love to have you record a bit of audio for us to use as additional input for testing our model.
- A name! Right now the executable for this service is literally called "newtool-service.jar". If you have ideas for what we can call this thing, we'd love to hear them.