Glossary
- Testability: The degree to which a product can reasonably be tested. Within the scope of this article, we care specifically about the testability of how our skills integrate with Alexa. There are myriad aspects of testability you might consider, and the topic is regularly discussed by some of the brightest minds in software quality (James Bach, Cem Kaner, Lisa Crispin, et al.). Our approach today is a simple one: more testability is better, and we'll propose specific actions to improve it.
- Black Box: When we say that a system is a black box, we mean that its inner workings are not visible to us as consumers.
- Synthetic Monitoring: Observing the state of a system on a recurring basis by pretending to be a normal user and interacting with it (sketched below). This is distinct from standard monitoring, which uses custom-built metrics or utilities to convey system state.
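To make that last definition concrete, here is a minimal sketch of a synthetic monitor in Python. Nothing in it is Alexa-specific: the endpoint URL, the expected response text, and the five-minute interval are all placeholder assumptions, not a real service.

```python
# A minimal synthetic-monitoring loop: pretend to be a user on a schedule
# and alert when the interaction stops behaving as expected.
import time
import urllib.request

ENDPOINT = "https://example.com/health/echo"  # hypothetical user-facing endpoint
EXPECTED = b"hello"                           # what a healthy response contains
INTERVAL_SECONDS = 300                        # check every five minutes

def check_once() -> bool:
    """Perform one user-like interaction and report whether it looked healthy."""
    try:
        with urllib.request.urlopen(ENDPOINT, timeout=10) as resp:
            return resp.status == 200 and EXPECTED in resp.read()
    except OSError:  # covers network errors and timeouts
        return False

if __name__ == "__main__":
    while True:
        if not check_once():
            print("ALERT: synthetic check failed")  # in practice, page someone
        time.sleep(INTERVAL_SECONDS)
```

The key distinction from standard monitoring is that nothing here inspects the system's internals; the check only sees what a real user would see.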
The Problem Space
Example 1: Something broke, my users say
A few examples:
- That time that redirect_uri for OAuth was changed
- Last week, when a new field was added silently (admittedly, we missed something on our side, but still...)
- Skill outages, like the ones reported by TsaTsaTzu on March 22, or the more widespread one that is occurring right now, March 30, as I write this.
Now, as anyone who has worked in software knows, these things happen. I'm not here to pass judgment on their uptime or the quality of what they release, just to paint a picture of the current state.
Example 2: The bait-n-switch
Example 3: We're super "Agile" as far as "Waterfall" goes
We get it, being a skill developer sucks...
- A platform where we could do recurring end-to-end testing - where a synthetic transaction runs every five or ten minutes to confirm that everything is still hunky-dory. No more finding out from users that your skill is mishearing the name of the rapper "Coolio" as "Culo" - the Spanish word for "ass".
- An API that allows a developer to analyze how voice clips are being resolved into intents, to help figure out why their user logs show people getting into a weird state where they just keep asking for help over and over. (This is a real, mysterious, recurring problem for our team.)
- A system that allows a dev team to make a minor tweak based on certification feedback, and then kick off an entire suite of automated voice tests to confirm that the certification suggestion didn't degrade the performance of their skill's voice model (see the sketch after this list).
- To build on the previous point, imagine a world where test automation for Alexa was so straightforward that the dev community could work together to build cases for people with different accents, genders, etc. Instead of one dev repeatedly testing a skill manually with their own voice, we could build out a sharing economy where a quid-pro-quo could get you audio files of your test cases read by someone else, in exchange for you reading their utterances.
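To make the second and third items concrete, here is a rough sketch of what such an automated voice regression suite could look like. Everything in it is hypothetical: the resolution endpoint, the `resolvedIntent` field, and the audio file names are invented for illustration, since no such API exists today.

```python
# Sketch of the imagined voice regression suite: replay recorded utterances
# against the live voice model and assert on the resolved intents.
import requests  # third-party: pip install requests

RESOLVE_URL = "https://api.example.com/v1/skills/my-skill/resolve"  # hypothetical

# Each case pairs a recorded utterance with the intent we expect it to hit.
TEST_CASES = [
    ("audio/play_coolio.wav", "PlayArtistIntent"),
    ("audio/stop.wav", "AMAZON.StopIntent"),
]

def resolve(audio_path: str) -> dict:
    """Send an audio clip to the (hypothetical) API and return its JSON verdict."""
    with open(audio_path, "rb") as clip:
        resp = requests.post(RESOLVE_URL, files={"audio": clip}, timeout=30)
    resp.raise_for_status()
    return resp.json()

def run_suite() -> None:
    failures = []
    for path, expected_intent in TEST_CASES:
        actual = resolve(path).get("resolvedIntent")
        if actual != expected_intent:
            failures.append(f"{path}: expected {expected_intent}, got {actual}")
    if failures:
        raise SystemExit("Voice model regressions:\n" + "\n".join(failures))
    print("All utterances resolved as expected.")

if __name__ == "__main__":
    run_suite()
```

The point is the workflow: record your utterances once (or collect them from other developers, per the last item above), then re-run the whole suite against the live voice model after every tweak.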
Moar solutions plz...
- New API for developers to use
- Accepts an audio file
- Uses skill's live model, just as a user's request would ²
- Returns to the user (as JSON) a payload describing what the request resolved to, and what the theoretical skill request would've looked like (an example payload follows this list)
- Does NOT actually call out to the skill's Lambda or webservice
- Can be secured by product ID, Login with Amazon, or any other token system
- Can be rate limited to avoid abuse.
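To ground the proposal, here is one guess at the shape of the JSON payload such an API could return. Every field name below is invented for illustration; this is a sketch, not a spec.

```python
# Illustrative only: one guess at the response shape for the proposed API.
# None of these fields exist anywhere; they are assumptions to make the
# idea concrete. Transport details (upload, auth headers) are omitted.
import json

hypothetical_response = {
    "resolvedIntent": "PlayArtistIntent",  # what the voice model heard
    "slots": {"artist": "Coolio"},         # slot values it extracted
    "skillRequest": {                      # the request the skill *would* receive
        "type": "IntentRequest",
        "intent": {"name": "PlayArtistIntent"},
    },
    "lambdaInvoked": False,                # per the proposal, the skill is never called
}

print(json.dumps(hypothetical_response, indent=2))
```

Keeping the Lambda out of the loop is what would make this safe to call constantly - the same API could then power the recurring synthetic monitoring described earlier without generating real traffic to your backend.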
What do you guys think? Let us know in the comments below, in the communities where we link this post, or feel free to contact us directly.