
Four years of certification, a retrospective

11/23/2019

It was around late November of 2015 when my original collaborator on CompliBot and InsultiBot and I started preparing our initial submission for certification. Last week, I submitted these same two skills again with a fun new feature I've been messing around with privately for a while. As you might've guessed, owing to the fact that I'm writing about it, it did not go well...


This is a long post, so let's lay out how it's gonna go:
  1. I'm gonna tell you what happened with my latest submission, trying not to editorialize or insert too much of my opinion on the process.
  2. Then I'll talk about the mostly positive advances we've seen in skill certification over the last 4 years.
  3. Finally, I'll fully indulge in the editorializing I refrained from in #1.
Sound good? Great, let's do it.

Enhanced Fallback and Unenhanced Feedback

So, last weekend I submitted a new feature for both CompliBot and InsultiBot. In this context it's not super important what the feature actually does, but if you're interested anyway, I'm working on a separate blog post about it, and I actually did a stream on Twitch last weekend testing the feature, which you can check out here if you'd like:
Check it out on Twitch... but ignore my sloppy sound issues...
The short of it is that the feature contained some new ways of helping users who were struggling with what context they were in. Usually I'd submit with just InsultiBot and use it as a canary (as CompliBot maintains much higher traffic), but the new ability to just certify and publish later on-demand meant I could push them to cert at the same time, and so I did. I wrote a big long set of testing notes (reminder: always put your testing notes in Google docs behind a clicktracked shortlink), and submitted to en-US. After about two days, I got a response saying CompliBot had passed cert on the first try. Hooray! A couple hours later, I was less pleased to find a rejection for InsultiBot in my inbox, despite the two sharing a codebase and having the same feature submitted at the same time. I was not ready for what I'd find inside the rejection, though...
Rejection notice saying the skill violates content guidelines... and nothing else
Let me just get on that real quick...
So, let's break down the layers of this:
  • InsultiBot does not contain profanity. I've actually toyed with the idea of adding it as a custom feature, but Amazon's policies on that aren't flexible enough for me to do it in a creative way.
  • InsultiBot does not contain explicit material. Unlike most other fortune cookie skills, I create all of my content, so I know what's in there, and I never create compliments or insults that are violent, sexually explicit, or that might expressly compel a person towards self-harm.
  • All of InsultiBot's content is replicated between the card and the spoken text (CompliBot does have one home-card easter egg, though), so it's impossible for a piece of content to violate the written rules but not the spoken rules; for this type of content they are one and the same. (See the sketch just after this list.)
  • The new features I added were en-US only. The only change to the UK, Canadian, and Australian locales was that I added a few additional utterances to the StopIntent. Here, though, I got a blanket failure with the same message across all four locales.
  • Most importantly, this failure DOES NOT SAY WHAT THE OFFENDING PIECE OF CONTENT IS. It just says that the skill violates the rules. It provides no recourse at all for resolving the problem.
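To make the card-versus-speech point concrete, here's a minimal sketch (not InsultiBot's actual code, just the general shape of a raw Alexa custom-skill response) of what "the card and the spoken text are the same string" looks like:

```python
def build_response(text, title="InsultiBot"):
    """Sketch of a raw Alexa custom-skill response in which the card content
    is built from the exact same string as the spoken output, so the written
    and spoken content can never diverge."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "card": {
                "type": "Simple",
                "title": title,
                "content": text,  # identical to the speech, by construction
            },
            "shouldEndSession": True,
        },
    }
```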

Beyond the poor messaging, there are a couple of other bits of context that made this frustrating. The first is that I have litigated the value of, and approach towards, InsultiBot's content with the cert team many times in the past. Theoretically, this reviewer had access to my past submissions and to notes from other reviewers, or, more importantly, notes from the previous management of the team, to whom I've been able to escalate similar issues in the past.

Second, as alluded to above, I always put my testing notes behind a clicktracked link, and this reviewer did not read the testing notes. Peep the stats (which I'm grabbing on November 23):
[Screenshot: Bitly click statistics for the testing-notes shortlink, showing zero clicks]
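If you want to replicate the clicktracked-notes trick yourself, here's a rough sketch of how you'd check whether your link has ever actually been opened. It assumes a Bitly shortlink and their v4 API; the token and link ID are placeholders for illustration, not anything from my actual setup:

```python
import requests

# Placeholders -- substitute your own Bitly access token and link ID.
BITLY_TOKEN = "YOUR_BITLY_ACCESS_TOKEN"
BITLINK = "bit.ly/your-testing-notes-link"

def shortlink_click_count(bitlink, token):
    """Ask Bitly how many times a shortlink has been clicked (all-time)."""
    resp = requests.get(
        f"https://api-ssl.bitly.com/v4/bitlinks/{bitlink}/clicks/summary",
        headers={"Authorization": f"Bearer {token}"},
        params={"unit": "month", "units": -1},  # -1 == all available history
    )
    resp.raise_for_status()
    return resp.json()["total_clicks"]

if __name__ == "__main__":
    clicks = shortlink_click_count(BITLINK, BITLY_TOKEN)
    print(f"Testing notes clicked {clicks} time(s)")  # 0 == nobody opened them
```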
So, without guidance on another approach, I went ahead and resubmitted the skill exactly as is. I didn't change the backend code, didn't change the interfaces, and didn't touch the metadata or testing notes. On Friday morning, I got another email saying the skill had passed review¹. No reference was made to the previous issue, and I certainly did not "fix" it in the intervening period. Same inputs, different output.

¹: Those of you who are paying close attention may have noted that, since I didn't change anything at all, the bitly shortlink above has actually gone through TWO rounds of cert without being clicked a single time.

Alexa, ask CompliBot to compliment the cert team

So, here's the thing. Over the last 4 years, I've had several opportunities to meet many members of the cert group, and have at times even had a direct line to the managers of the COPS team in charge of skill certification. They're all extremely smart, hardworking people, and they have the absolutely unenviable job of being the ones who have to tell people "no" all the time. Nobody ever raises a fuss when cert happens to catch a legitimate VUX issue for them, only when they do something the submitter doesn't agree with. Further, they get blamed for a lot of things entirely out of their control. Their hands are bound by a lot of policy set by legal and other non-cert teams². And finally, they really have made a lot of progress in the last few years. Let's go through a few of the highlights:
  • Review speed: When we first submitted CompliBot and InsultiBot, it took us over a month to get to release. That involved a call with Dave Isbitski, members of the developer advocacy team, and the aforementioned head of the skill certification team to get us across the finish line. Today, a single pass happens on the order of a couple days, and you can usually get from code complete to released in about a week of back-and-forth for a new skill or major enhancement.
  • Flexibility: Starting with that first cert pass, we regularly found ourselves fighting over UX issues where we believed our flow was preferable to the guidelines. Even after we came to a compromise on that initial call, for years we'd have to fight to keep our existing prompt/reprompt UX every single time we submitted. At least today the team seems to allow you the flexibility to assert that you know what you're doing when it comes to guidelines (but not policy, obviously). I believe the euphemism is "giving you enough rope..."
  • Preemptive checks: Today, the cert process provides a bunch of ways to avoid the "sit and wait for rejection" game by preemptively running all of the easily-automated checks. Waiting a week to hear "your invocation name doesn't match your example phrases; rejected" was the worst, but new developers today need never know that pain. (A toy version of that particular check is sketched just below.)
  • Repro steps: Despite this article being about a very blatant counterexample, it bears noting that the team has become much better about providing actual reproduction steps when describing UX failures (rather than policy failures). Looking back at some of my previously rejected InsultiBot submissions, the team gave well-written, meaningful steps that ended up becoming standard regression tests for me, around things like getting my repeat intent into a broken state.
  • Just certify: And as noted, they very recently added something we've been waiting on for years: the ability to go through review independently of the process for taking an update live.
All of this is really to say that they have a hard job, that they are continuing to take meaningful steps, and that the process of certification today is nowhere near as objectively bad as it was in the beginning.
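Speaking of those preemptive checks: as a toy illustration of the kind of thing they catch (this is my own throwaway sketch, not Amazon's actual validator), catching the old "invocation name doesn't match your example phrases" rejection locally takes about this much code:

```python
def phrases_missing_invocation_name(invocation_name, example_phrases):
    """Toy stand-in for one of the automated pre-submission checks: flag any
    example phrase that doesn't contain the skill's invocation name."""
    return [
        phrase
        for phrase in example_phrases
        if invocation_name.lower() not in phrase.lower()
    ]


# Hypothetical example phrases, just to show the check firing.
print(phrases_missing_invocation_name(
    "insultibot",
    [
        "Alexa, open InsultiBot",
        "Alexa, ask InsultiBot to insult me",
        "Alexa, insult me",  # no invocation name -- this used to mean a week-long rejection
    ],
))
# -> ['Alexa, insult me']
```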

²: As an aside, I had the chance to talk to one of these policy teams once, and it was the most frustrating conversation I've ever had with an Amazon group. Their positions are indefensible, but they know they set the rules the way they want and that nobody can argue with them.


Alexa, ask InsultiBot to insult the cert process...

Alright, it's hottake city, and I'm the mayor.

  1. This whole interaction is super dumb, most obviously in that they gave me feedback that would have been literally impossible for me to resolve. Because they chose not to specify the offending content, there's really nothing I could do to address their feedback, short of packing up, saying I had a good run with InsultiBot, and delisting it from the store.
  2. InsultiBot is a non-mature skill that is marked as mature and can only be enabled by theoretical adults, even though none of the content in it would be out of place in a typical middle school classroom. Meanwhile, these rules are applied nowhere near consistently: a bunch of the InsultiBot copycats aren't even tagged as mature.
  3. WHY AM I STILL BEING ASKED TO WRITE TESTING NOTES THAT AREN'T BEING READ?
  4. Copying and pasting the same error across all four locales is incredibly lazy and very clearly shows that the reviewer was not actually paying attention to what my skill was doing in each locale. The hypothetical lesson here: you can probably get away with sneaking things in by submitting one thing in en-US, where they are testing, and something altogether more flagrant in other locales, where they are not.
  5. It's absolutely absurd that reviews are still standalone, isolated events where you have no ability to have a threaded conversation with the reviewer, and the reviewer has no context about your previous submissions. This is not a recipe for success.
  6. Certification is a dice roll. This is a well-known meme on Alexa Slack (alexaslack.com), where the random nature of cert is symbolized by a dice Slackmoji. If you disagree with the feedback you received, the number one thing you should always do is resubmit untouched. This is bad for literally everyone. It wastes our time, it wastes the Alexa cert team's time, and it causes people who don't know about this trick (as an aside, I feel like a Facebook ad here: "With this one neat trick, Amazon hates him...") to just straight-up quit the platform because they think they did something wrong and unresolvable³.
  7. If you are not a managed dev with an official Amazon bizdev contact, you probably aren't going to be successful today. This wasn't always the case, but it seems to get truer the longer the platform lives. You have to assume that there is no scenario in which a managed dev gets a blanket rejection with a throwaway message like this. It just wouldn't happen. This isn't only about cert, of course; the biggest factor is probably promotion and visibility in the skill store. But it's gotten to the point where I'm considering whether I (a person lucky enough to have a ton of connections at Amazon) should even bother building stuff if I don't have someone on the inside whose job it is to work on my behalf. It feels like the window for true "indie" devs is more or less closed.
  8. The biggest, and hottest of all my takes, is that these are the same issues I was writing about almost 4 years ago, and they haven't been solved despite being extremely high ROI.
This last point is the one I think is most important, and it's certainly the most frustrating to me. I looked back at the very beginning of my blog post history, and two posts were relevant. The first is this post. The writeup itself is actually more about cert in general⁴, as it's a response to an article someone wrote on a now-defunct site, which was itself a response to one of our threads on the also-defunct old Alexa forums. Digging into that writeup on the Internet Wayback Machine, though, is telling. My exact, 4-year-old quote FTA:
"So, this is sort of a last ditch effort, but my team and I are about ready to give up on developing for Alexa, because of the terrible certification process that is unreasonably preventing our code from being released. Has anyone found a way to actively get in contact with the certification team?
Over the break, our two skills were rejected for the second time. The problem is that the only changes between our first rejection and second rejection were changes to implement the requirements of the first rejection. They rejected us for doing exactly what they said – and there seems to be absolutely no way that we can have a rational conversation with the certification team because they live behind some impenetrable iron curtain rather than being available for us to work with."

Additionally, check out this bit by the article's author, Lawrence Krubner, right at the end:
"However, the Amazon system is so broken that it potentially offers a fix for itself. When the Certification Team rejects your app, you don’t need to change the app. You do not need to respond to their requests. You do not need to make any of the changes that they demand. Since a new person reviews each submission, and since there is no limit on submissions, one way to get through the certification process is to simply roll the dice and spam the certification team. Submit an app 10 times, or 20 times, or 30 times. At some point you will probably get lucky, and someone will approve your app. And since re-submission only requires clicking a button, re-submitting is much easier than actually responding to any of the criticism that the Certification team gave you."

We were legit ready to walk over this. The problem wasn't that they wanted changes - we were ready to negotiate any specific points. The problem was that they gave us literally no path to resolving our problems. And they seemed completely unrepentant about it. Sound familiar?

Now, obviously we didn't quit at that point (well, two of my early collaborators had already quit by then, and my other compatriot would quit later that year, but that's not the point...), and a lot of that has to do with the fact that after that post got traction on Hacker News, they reached out to us to get on a call. We wrote about that after the fact as well, and by that point we had calmed down a bit based on the conversation we had with them. The post itself was mostly meant as reconciliation for putting them on blast. What we didn't really talk about was the message we gave the team. We had four points to make, and three of them are still relevant to this post, now, four years later: 1. Cert is a black box; 2. Cert is a dice roll; 3. We were almost sure they weren't reading our notes, because they failed us for things we had described in the notes. (The 4th issue was specifically about cert rule 4.1 being bad, and we've actually fought them about that one more than anything else over the years, but recently it seems to have finally stopped being a concern.)

It's one thing for your process to have issues you don't know about, but another entirely to have issues you are aware of that you are choosing not to address.

³: This is not theoretical! Over the years I've gotten involved with several different developers who were on their way out the door because of arbitrary cert feedback. I helped some of them stay, but others I had to watch leave for no good reason.
⁴: In addition, this post actually talks about another topic that is entirely too relevant today: Amazon making backwards-incompatible changes without notifying developers. This just happened last month and wrecked a bunch of skills' UX, as noted in this post.

Alexa, ask WrapUpBot to write a conclusion for this extremely long post...

Here's the thing. Those were heady, hyperbolic times, and in retrospect it was a nascent program on a platform nobody really understood yet, so there were bound to be hiccups. We were probably too critical of them at the time. But on the flip side, in the four years since then I've been trying constantly, on a roughly monthly basis, to escalate some of these issues. I've written many emails to be escalated, I've come in to their user research labs at Doppler and talked about these problems while they frantically took notes, I've had in-person meetings with folks in the Alexa development org to talk about these issues, and I even did a presentation for the COPS org about what it's like to be a third-party developer for the platform and the struggles we face.

And fixing some of these problems is non-trivial (e.g. you can't expect humans to be perfectly consistent every time they do something, so of course cert will be a bit of a dice roll). But some of them are not hard. Some of them are really, really easy, and there's no reason they should not be solved today.

If there are testing notes, always read them. Boom, one gripe down.

If you have problems with a submission that the developer can fix, then instead of closing the submission as rejected with a bunch of things to fix, offer the developer a way to respond to your feedback, saying either "I fixed x by doing y" or "I believe x is appropriate in this context". Even if a different reviewer picks it up, they can at least see what the previous reviewer said and how the developer responded to that feedback. And this wouldn't require designing any novel flows or anything; it's a basic feature of any ticketing system. So, not quite as easy, but still, two down.

Finally, don't ever provide feedback that doesn't have repro steps or another way for the user to resolve the issue. This is something that drives me crazy on an additional level as someone who has a background in quality assurance - writing up issues without repro is the number one way to make your developers hate you.

These are plump, low-hanging fruit that have gone overripe on the vine waiting to be picked these last few years.

Let me know what you guys think.

P.S.: Voicebot's podcast with Tom Hewitson this week had a ton of overlap with what we've talked about here, so if this topic interested you, the podcast is probably a great bit of extra credit. Tom has been publicly beating the drum lately on this and a bunch of adjacent issues, and I feel like that needs to be acknowledged.


1 Comment
Jo Jaquinta
12/8/2019 02:09:30 pm

I think I agree with almost everything you say here. A miracle! :-) The business practice we've evolved is to do everything possible to avoid certification at all costs. We create skills with the most open audio model we possibly can, and then modify the code behind it as we evolve the application. At most we re-certify once a quarter. But, often, we've been able to go a year between re-certifications, even with major feature roll outs. It shouldn't be like it. But if they won't change, we have to.
