- I'm gonna tell you what happened with my latest submission, trying not to editorialize or insert too much of my own opinion on the process.
- Then I'll talk about the mostly positive advances that we've seen in skill certification over the last 4 years.
- Finally, I'll fully explore that editorializing I refrained from in #1.
Enhanced Fallback and Unenhanced Feedback
- InsultiBot does not contain profanity. I've actually toyed with the idea of adding it as a custom feature, but Amazon's policies on that aren't flexible in a way that would have allowed me to do it creatively.
- InsultiBot does not contain explicit material. Unlike most other fortune cookie skills, I create all of my content, so I know what's in there, and I never write compliments or insults that are violent or sexually explicit, or that might expressly push a person toward self-harm.
- All of InsultiBot's content is replicated between the card and the spoken text (CompliBot does have one home card easter egg, though), so it's impossible for a piece of content to violate the rules for written content but not the rules for spoken content; for this type of material, the two sets of rules are the same.
- The new features I added were en-US only. The only change to the UK, Canada, and Australia locales was that I added a few additional utterances to the StopIntent (see the sketch after this list). Here, though, I got a blanket failure with the same message across all four locales.
- Most importantly, this failure DOES NOT SAY WHAT THE OFFENDING PIECE OF CONTENT IS. It just says that the skill violates the rules. It provides no recourse at all for resolving the problem.
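To give a sense of how small the non-US change was, here's a rough sketch of the kind of interaction model fragment involved, written as a TypeScript object for readability. The sample utterances below are hypothetical placeholders, not the actual ones I shipped.

```typescript
// Illustrative sketch only: the shape of an intent entry in an Alexa
// interaction model, with a few extra sample utterances added to the
// built-in AMAZON.StopIntent for the non-US English locales.
interface IntentDefinition {
  name: string;
  samples?: string[];
}

const stopIntent: IntentDefinition = {
  name: "AMAZON.StopIntent",
  samples: [
    // Hypothetical additions; built-in intents can be extended with
    // locale-specific sample utterances like these.
    "that's enough",
    "i'm done with you",
    "leave me alone",
  ],
};
```

That's the entire scope of what changed in en-GB, en-CA, and en-AU, which is what makes a content-policy rejection across all four locales so hard to act on.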
Beyond the poor messaging, there are a couple other bits of context that made this frustrating. The first is that I have litigated the value of and approach to InsultiBot's content with the cert team many times in the past. Theoretically, this reviewer had access to my past submissions and notes from other reviewers, or, more importantly, notes from the team's previous management, to whom I've been able to escalate similar issues before.
Second, as alluded to above, I always put my testing notes behind a click-tracked link, and this reviewer did not read them. Peep the stats (which I'm grabbing on November 23):
¹: Those of you who are paying close attention may have noted that, since I didn't change anything at all, the bitly shortlink above has actually gone through TWO rounds of cert without being clicked a single time.
Alexa, ask CompliBot to compliment the cert team
- Review speed: When we first submitted CompliBot and InsultiBot, it took us over a month to get to release. That involved a call with Dave Isbitski, members of the developer advocacy team, and the aforementioned head of the skill certification team to get us across the finish line. Today, a single pass happens on the order of a couple days, and you can usually get from code complete to released in about a week of back-and-forth for a new skill or major enhancement.
- Flexibility: Starting with that first cert pass, we regularly found ourselves fighting over UX issues where we believed our flow was preferable to the guidelines. Even after we came to a compromise on that initial call, for years we'd have to fight to keep our existing prompt/reprompt UX every single time we submitted. At least today the team seems to allow you the flexibility to assert that you know what you're doing when it comes to guidelines (but not policy, obviously). I believe the euphemism is "giving you enough rope..."
- Preemptive checks: Today, the cert process provides a bunch of ways to avoid the "sit and wait for rejection" game by preemptively running all of the easily automated checks (see the sketch after this list). Waiting a week to hear "your invocation name doesn't match your example phrases; rejected" was the worst, but new developers today need never know that pain.
- Repro steps: Despite this article being about a very blatant counterexample, it bears noting that the team has become much better about providing actual reproduction steps when describing UX failures rather than policy failures. Looking back at some of my previous InsultiBot submissions that were rejected, the team gave well-written, meaningful steps that ended up becoming standard regression tests for me around things like getting my repeat intent into a broken state.
- Just certify: And as noted, they very recently added something we've been waiting on for years - the ability to go through review independent of the process for taking an update live.
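For anyone who hasn't used those preemptive checks, here's a rough sketch of kicking off a skill validation against the development stage via SMAPI. The endpoint path and payload shape are from my memory of the Skill Validation API and may not be exact; in practice you'd normally let the ASK CLI (something like `ask smapi submit-skill-validation`) or the developer console run this for you.

```typescript
// Rough sketch (not verified against current SMAPI docs): submit a skill
// validation for the development stage and a single locale. Assumes you
// already have an LWA access token with the appropriate SMAPI scopes.
const SMAPI_BASE = "https://api.amazonalexa.com";

async function submitValidation(skillId: string, accessToken: string) {
  const res = await fetch(
    `${SMAPI_BASE}/v1/skills/${skillId}/stages/development/validations`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${accessToken}`,
        "Content-Type": "application/json",
      },
      // Locales to validate; en-US only here, mirroring my submission.
      body: JSON.stringify({ locales: ["en-US"] }),
    }
  );
  if (!res.ok) {
    throw new Error(`Validation submission failed: ${res.status}`);
  }
  // The response should include an id you can poll for the results.
  return res.json();
}
```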
²: As an aside, I had the chance to talk to one of these policy teams once, and it was the most frustrating conversation I've ever had with an Amazon group. Their positions are indefensible, but they know that they set the rules however they want and that nobody can argue with them.
Alexa, ask InsultiBot to insult the cert process...
- This whole interaction is super dumb, most obviously in that they gave me feedback that would have been literally impossible for me to resolve. Because they chose not to specify the offending content, there's really nothing I could do to address their feedback, short of packing up, saying I had a good run with InsultiBot, and delisting it from the store.
- InsultiBot is a skill with no mature content that is nonetheless marked as mature and can only be enabled by theoretical adults, even though none of its content would be out of place in a typical middle school classroom. Meanwhile, these rules are not applied anywhere near consistently - a bunch of the InsultiBot copycats aren't even tagged as mature.
- WHY AM I STILL BEING ASKED TO WRITE TESTING NOTES THAT AREN'T BEING READ?
- Copying and pasting the same error across all four locales is incredibly lazy and very clearly shows that the reviewer was not actually paying attention to what my skill was doing in each locale. The hypothetical lesson here - you can probably get away with sneaking things in by submitting one thing in en-US, where they are testing, and something altogether more flagrant in the other locales, where they are not.
- It's absolutely absurd that reviews are still standalone, isolated events where you have no ability to have a threaded conversation with the reviewer, and the reviewer has no context about your previous submissions. This is not a recipe for success.
- Certification is a dice roll. This is a well-known meme on Alexa Slack (alexaslack.com), where the random nature of cert is symbolized by a dice Slackmoji. If you disagree with the feedback you received, the number one thing you should always do is resubmit untouched. This is bad for literally everyone. It wastes our time, it wastes the Alexa cert team's time, and it causes people who don't know about this trick (as an aside, I feel like a Facebook ad here: "With this one neat trick, Amazon hates him...") to just straight-up quit the platform because they think they did something wrong and unresolvable³.
- If you are not a managed dev with an official Amazon bizdev contact, you probably aren't going to be successful today. This wasn't always the case, but it seems to get truer the longer the platform lives. You have to assume that there is no scenario in which a managed dev gets a blanket rejection with a throwaway message like this. It just wouldn't happen. This isn't just about cert, of course; the bigger factor is probably promotion and visibility in the skill store. But it's gotten to the point where I'm considering whether I (a person who is lucky to have a ton of connections at Amazon) should even bother building stuff if I don't have someone on the inside whose job it is to work on my behalf. It feels like the window for true "indie" devs is more or less closed.
- The biggest, and hottest, of all my takes is that these are the same issues I was writing about almost 4 years ago, and they haven't been solved despite the fixes being extremely high ROI.
"So, this is sort of a last ditch effort, but my team and I are about ready to give up on developing for Alexa, because of the terrible certification process that is unreasonably preventing our code from being released. Has anyone found a way to actively get in contact with the certification team?
Over the break, our two skills were rejected for the second time. The problem is that the only changes between our first rejection and second rejection were changes to implement the requirements of the first rejection. They rejected us for doing exactly what they said – and there seems to be absolutely no way that we can have a rational conversation with the certification team because they live behind some impenetrable iron curtain rather than being available for us to work with."
Additionally, check out this bit by the article's author, Lawrence Krubner, right at the end:
"However, the Amazon system is so broken that it potentially offers a fix for itself. When the Certification Team rejects your app, you don’t need to change the app. You do not need to respond to their requests. You do not need to make any of the changes that they demand. Since a new person reviews each submission, and since there is no limit on submissions, one way to get through the certification process is to simply roll the dice and spam the certification team. Submit an app 10 times, or 20 times, or 30 times. At some point you will probably get lucky, and someone will approve your app. And since re-submission only requires clicking a button, re-submitting is much easier than actually responding to any of the criticism that the Certification team gave you."
We were legit ready to walk over this. The problem wasn't that they wanted changes - we were ready to negotiate any specific point. The problem was that they gave us literally no path to resolution. And they seemed completely unrepentant about it. Sound familiar?
Now, obviously we didn't quit at that point (well, two of my early collaborators had already quit by then, and my other compatriot would quit later that year, but that's not the point...), and a lot of that has to do with the fact that after that post got traction on Hacker News, they reached out to us to get on a call. We wrote about that after the fact as well, and by then we had calmed down a bit based on the conversation we had with them. The post itself was mostly meant as reconciliation for having put them on blast. What we didn't really talk about was the message we gave the team. We had four points to make, and three of them are still relevant to this post, now, four years later: 1. Cert is a black box; 2. Cert is a dice roll; 3. We were almost sure they weren't reading our notes, because they failed us for things we described in the notes. (The 4th issue was specifically about cert rule 4.1 being bad; we've actually fought them about that one more than anything else over the years, but recently it seems to have finally stopped being a concern.)
It's one thing for your process to have issues you don't know about, but another entirely to have issues you are aware of that you are choosing not to address.
³: This is not theoretical! Over the years I've gotten involved with several different developers who were on their way out the door because of arbitrary cert feedback. I helped some of them stay, but others I had to watch leave for no good reason.
⁴: In addition, this post actually talks about another topic that is entirely too relevant today - Amazon making backwards-incompatible changes without notifying developers. This just happened last month and wrecked a bunch of skills' UX, as noted in this post.
Alexa, ask WrapUpBot to write a conclusion for this extremely long post...
And fixing some of these problems is non-trivial (e.g. you can't expect humans to be perfectly consistent every time they do something, so of course cert will be a bit of a dice roll). But some of them are not hard. Some of them are really, really easy, and there's no reason they should not be solved today.
If there are testing notes, always read them. Boom, one gripe down.
If you have problems with a submission that the developer can fix, then instead of closing the submission as rejected with a bunch of things to fix, offer the developer a way to respond to your feedback, saying either "I fixed x by doing y" or "I believe x is appropriate in this context". Even if a different reviewer picks it up, they can at least see what the previous reviewer said and how the developer responded to that feedback. And this wouldn't require designing any novel flows or anything - it's a basic feature of any ticketing system. So, not quite as easy, but still, two down.
Finally, don't ever provide feedback that doesn't include repro steps or some other way for the developer to resolve the issue. This is something that drives me crazy on an additional level as someone with a background in quality assurance - writing up issues without repro steps is the number one way to make your developers hate you.
These are plump, low-hanging fruit that have gone overripe waiting on the vine to be picked these last few years.
Let me know what you guys think.
P.S.: In Voicebot's podcast with Tom Hewitson this week, the discussion had a ton of overlap with what we've talked about here, so if this topic interested you, the podcast is probably a great bit of extra credit. Tom has been publicly beating the drum lately on this and a bunch of adjacent issues, and I feel like that needs to be acknowledged.